This project involves writing a parser-generator for Parsing Expression Grammars, suitable for incorporation into the BSD base system.

Last Significant Update:

2024-10-09

Status:

Draft

Comments to:

elftoolchain@jkoshy.net

Most BSD operating systems currently ship with BYacc, a parser-generator for LALR(1) grammars.  Compared to LALR(1) grammars, PEGs can be more readable and easier to write.  It seems worth exploring whether the BSD toolkit should include a PEG parser-generator.
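As a rough illustration of why PEGs map naturally onto readable code, the sketch below hand-writes in C99 the kind of recursive-descent parser that a PEG parser-generator might emit.  The grammar, the function names (peg_parse_input, parse_sum, parse_atom, parse_number) and the calling convention are all assumptions made for this example; they are not taken from any existing tool.  Each rule becomes one function, ordered choice ("/") becomes "try the first alternative, then the second", and a failed rule leaves the input position untouched.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical grammar, for illustration only:
 *
 *   Input  <- Sum !.
 *   Sum    <- Atom ('+' Atom)*
 *   Atom   <- Number / '(' Sum ')'
 *   Number <- [0-9]+
 *
 * Convention assumed here: each rule function returns true on a match
 * and advances *p past the matched text; on failure *p is unchanged
 * and *out is unspecified.  Integer overflow is not checked.
 */

static bool parse_sum(const char **p, long *out);

/* Number <- [0-9]+ */
static bool parse_number(const char **p, long *out)
{
    const char *s = *p;
    long v = 0;

    if (!isdigit((unsigned char)*s))
        return false;
    while (isdigit((unsigned char)*s))
        v = v * 10 + (*s++ - '0');
    *p = s;
    *out = v;
    return true;
}

/* Atom <- Number / '(' Sum ')' */
static bool parse_atom(const char **p, long *out)
{
    const char *s = *p;

    /* Ordered choice: the first alternative wins if it matches. */
    if (parse_number(p, out))
        return true;

    /* Second alternative, tried from the same starting position. */
    if (*s == '(') {
        s++;
        if (parse_sum(&s, out) && *s == ')') {
            *p = s + 1;
            return true;
        }
    }
    return false;
}

/* Sum <- Atom ('+' Atom)* */
static bool parse_sum(const char **p, long *out)
{
    const char *s = *p;
    long total, v;

    if (!parse_atom(&s, &total))
        return false;
    for (;;) {
        const char *save = s;   /* backtrack point for one repetition */

        if (*s != '+')
            break;
        s++;
        if (!parse_atom(&s, &v)) {
            s = save;           /* undo the consumed '+' */
            break;
        }
        total += v;
    }
    *p = s;
    *out = total;
    return true;
}

/* Input <- Sum !.   (the whole string must be consumed) */
bool peg_parse_input(const char *text, long *out)
{
    const char *s = text;
    long v;

    if (!parse_sum(&s, &v) || *s != '\0')
        return false;
    *out = v;
    return true;
}
```

With these definitions, peg_parse_input("1+(2+3)+40", &v) would return true and store 46 in v, while inputs such as "1+" or "(1+2" would be rejected.  A real generator would also need memoization or other safeguards against pathological backtracking, which this sketch omits.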

The requirement that the parser-generator be suitable for incorporation into the BSD base system means that:

  1. The parser-generator should be usable during the bootstrapping phase of the OS build — for example, during the NetBSD cross-build process.

  2. The parser-generator should be written in a programming language that is available during the bootstrapping phase of the build: ISO C99 would be a good choice.

  3. The source code for the parser generator itself needs to be BSD-licensed.

Additionally, the parser generator should support good error recovery.  Please see Medeiros, Alvez & Mascarenhas, 2020 for recent research in this area.

Existing BSD-licensed PEG parser generators.

  • peg/leg — recursive-descent parser generators for C

    A well-documented, BSD-licensed parser generator.

    These parser generators are specified using their own grammar, which means that they need to be bootstrapped from code generated by a prior run of these generators.

  • PeppaPEG

    An ANSI-C BSD-licensed parser library.  The PEG grammar to be used needs to be constructed using C library calls; i.e., this toolkit does not seem to generate standalone parsers.

  • TheLartians/PEGParser

    A C++17 PEG parser generator supporting memoization, left-recursion and context-dependent grammars.  However, grammars are specified directly in C++; this toolkit does not seem to generate standalone parsers.

Resources