A PEG Parser Generator for the Base System
This project involves writing a parser-generator for Parsing Expression Grammars, suitable for incorporation into the BSD base system.
Last Significant Update: 2024-10-09
Status: Draft
Comments to:
Currently most BSD operating systems ship with BYacc, a parser-generator for LALR(1) grammars. When compared to LALR grammars, PEG grammars can be more readable and easier to write. It seems worth exploring whether we should have a PEG parser generator in the BSD toolkit.
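To make the readability claim concrete, here is a minimal sketch in C99. It shows a small arithmetic PEG (in the leading comment) next to the kind of recursive-descent code a generator might emit for it. The grammar, the function names (`parse_expr`, `peg_eval`, and so on) and the choice to evaluate while parsing are all assumptions made for illustration; none of them comes from BYacc, peg/leg or any other existing tool.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative PEG (hypothetical, not taken from an existing tool):
 *
 *   Start  <- Expr !.
 *   Expr   <- Term ('+' Term)*
 *   Term   <- Factor ('*' Factor)*
 *   Factor <- Number / '(' Expr ')'
 *   Number <- [0-9]+
 */

static bool parse_expr(const char **pos, long *out);

/* Number <- [0-9]+   (overflow is not checked in this sketch) */
static bool
parse_number(const char **pos, long *out)
{
	const char *p = *pos;
	long value = 0;

	if (!isdigit((unsigned char)*p))
		return false;
	while (isdigit((unsigned char)*p))
		value = value * 10 + (*p++ - '0');
	*pos = p;
	*out = value;
	return true;
}

/* Factor <- Number / '(' Expr ')' */
static bool
parse_factor(const char **pos, long *out)
{
	const char *start = *pos;

	/* Ordered choice: the second alternative is tried only if the first fails. */
	if (parse_number(pos, out))
		return true;
	if (**pos == '(') {
		(*pos)++;
		if (parse_expr(pos, out) && **pos == ')') {
			(*pos)++;
			return true;
		}
	}
	*pos = start;		/* backtrack on failure */
	return false;
}

/* Term <- Factor ('*' Factor)* */
static bool
parse_term(const char **pos, long *out)
{
	long rhs;

	if (!parse_factor(pos, out))
		return false;
	while (**pos == '*') {
		const char *save = *pos;

		(*pos)++;
		if (!parse_factor(pos, &rhs)) {
			*pos = save;	/* the repetition itself never fails */
			break;
		}
		*out *= rhs;
	}
	return true;
}

/* Expr <- Term ('+' Term)* */
static bool
parse_expr(const char **pos, long *out)
{
	long rhs;

	if (!parse_term(pos, out))
		return false;
	while (**pos == '+') {
		const char *save = *pos;

		(*pos)++;
		if (!parse_term(pos, &rhs)) {
			*pos = save;
			break;
		}
		*out += rhs;
	}
	return true;
}

/* Start <- Expr !.   (the whole input must be consumed) */
bool
peg_eval(const char *text, long *out)
{
	const char *pos = text;

	assert(text != NULL && out != NULL);
	return parse_expr(&pos, out) && *pos == '\0';
}
```

With these definitions, `peg_eval("2+3*4", &v)` succeeds and sets `v` to 14, while `peg_eval("2+", &v)` fails because the trailing `+` is left unconsumed. Each grammar rule maps onto one function, which is a large part of why PEGs tend to be easy to read and debug.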
The requirement that the parser-generator is suitable for incorporation into the BSD base system means that:
- The parser generator should be usable during the bootstrapping phase of the OS build, for example during the NetBSD cross-build process.
- The parser generator should be written in a programming language that is available during the bootstrapping phase of the build: C99 (ISO/IEC 9899:1999) would be a good choice.
- The source code for the parser generator itself needs to be BSD-licensed.
Additionally, the parser generator should support good error recovery. Please see Medeiros, Alvez & Mascarenhas, 2020 for recent research in this area.
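As a rough illustration of what "error recovery" means for a generated parser, the sketch below reports a failure at the farthest position reached, then skips ahead to a synchronization token and carries on parsing. This is a deliberately simplified skip-to-`;` strategy and is not the algorithm from the cited paper. The grammar, `struct error_log`, `MAX_ERRORS` and `parse_list` are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define MAX_ERRORS 8

/* Hypothetical error log: byte offsets at which a failure was reported. */
struct error_log {
	size_t count;			/* total number of errors seen */
	size_t offsets[MAX_ERRORS];	/* offsets of the first MAX_ERRORS errors */
};

/*
 * Illustrative grammar:   List <- (Number ';')* !.
 *
 * Returns the number of well-formed items.  A malformed item is logged
 * and skipped instead of aborting the whole parse.
 */
size_t
parse_list(const char *text, struct error_log *log)
{
	const char *p = text;
	size_t items = 0;

	assert(text != NULL && log != NULL);
	log->count = 0;
	while (*p != '\0') {
		const char *q = p;

		/* Number <- [0-9]+ */
		while (isdigit((unsigned char)*q))
			q++;
		if (q > p && *q == ';') {
			items++;
			p = q + 1;
			continue;
		}

		/* Failure: report it at the farthest position reached. */
		if (log->count < MAX_ERRORS)
			log->offsets[log->count] = (size_t)(q - text);
		log->count++;

		/* Recovery: skip to just past the next ';' (or to end of input). */
		while (*q != '\0' && *q != ';')
			q++;
		if (*q == ';')
			q++;
		p = q;
	}
	return items;
}
```

For the input `"1;2x;3;"` this returns 2 and logs a single error at offset 3 (the `x`); parsing resumes after the following `;`, so the trailing `3;` is still recognized. A real generator would need to derive such synchronization points from the grammar rather than hard-coding them.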
Related Work
Existing BSD-licensed PEG parser generators:
- peg/leg: recursive-descent parser generators for C. Well-documented and BSD-licensed. The generators themselves are specified using their own grammar format, which means that they need to be bootstrapped from code generated by a prior run of these generators.
- An ANSI-C BSD-licensed parser library. The PEG grammar to be used needs to be constructed using C library calls; i.e., this toolkit does not seem to generate standalone parsers.
- A C++17 PEG parser generator supporting memoization, left-recursion and context-dependent grammars. Grammars are, however, specified directly in C++; this toolkit does not seem to generate standalone parsers.
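To illustrate the distinction drawn above between a parser library and a standalone parser generator, here is a minimal sketch of the library style, in which the grammar exists only as data structures built in C and interpreted at run time. The types and functions (`struct pe`, `pe_match`, `greeting_matches`) are hypothetical and do not reflect the API of any of the tools listed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parsing-expression node; not the API of any listed tool. */
enum pe_kind { PE_LITERAL, PE_SEQUENCE, PE_CHOICE };

struct pe {
	enum pe_kind kind;
	const char *literal;		/* used by PE_LITERAL */
	const struct pe *left, *right;	/* used by PE_SEQUENCE and PE_CHOICE */
};

/* Returns a pointer just past the matched text, or NULL on failure. */
static const char *
pe_match(const struct pe *e, const char *s)
{
	const char *mid;
	size_t n;

	assert(e != NULL && s != NULL);
	switch (e->kind) {
	case PE_LITERAL:
		n = strlen(e->literal);
		return strncmp(s, e->literal, n) == 0 ? s + n : NULL;
	case PE_SEQUENCE:
		mid = pe_match(e->left, s);
		return mid != NULL ? pe_match(e->right, mid) : NULL;
	case PE_CHOICE:
		/* Ordered choice: the right side is tried only if the left fails. */
		mid = pe_match(e->left, s);
		return mid != NULL ? mid : pe_match(e->right, s);
	}
	return NULL;
}

/* The grammar is assembled in C:  Greeting <- ("hello" / "hi") " world" */
static const struct pe lit_hello = { PE_LITERAL, "hello", NULL, NULL };
static const struct pe lit_hi = { PE_LITERAL, "hi", NULL, NULL };
static const struct pe lit_world = { PE_LITERAL, " world", NULL, NULL };
static const struct pe salutation = { PE_CHOICE, NULL, &lit_hello, &lit_hi };
static const struct pe greeting = { PE_SEQUENCE, NULL, &salutation, &lit_world };

/* True if the whole string matches Greeting. */
bool
greeting_matches(const char *s)
{
	const char *end = pe_match(&greeting, s);

	return end != NULL && *end == '\0';
}
```

Here `greeting_matches("hi world")` returns true and `greeting_matches("hey world")` returns false. The matching engine must be linked into every program that uses a grammar, whereas a generator in the style of BYacc emits plain C source that can be compiled on its own during a bootstrap build.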
Resources
- The Packrat Parsing and Parsing Expression Grammars Page, Bryan Ford. Resources related to PEG parsing.
- Automatic syntax error reporting and recovery in parsing expression grammars, Sérgio Queiroz de Medeiros, Gilney de Azevedo Alvez Junior, Fabio Mascarenhas, Science of Computer Programming, Volume 187, February 2020.