Need help with dart-petitparser?
Click the “chat” button below for chat support from the developer who created it, or find similar developers for support.

About the developer

petitparser
278 Stars 45 Forks MIT License 974 Commits 7 Opened issues

Description

Dynamic parser combinators in Dart.

Services available

!
?

Need anything else?

Contributors list

PetitParser for Dart

Pub Package Build Status GitHub Issues GitHub Forks GitHub Stars GitHub License

Grammars for programming languages are traditionally specified statically. They are hard to compose and reuse due to ambiguities that inevitably arise. PetitParser combines ideas from scannnerless parsing, parser combinators, parsing expression grammars (PEG) and packrat parsers to model grammars and parsers as objects that can be reconfigured dynamically.

This library is open source, stable and well tested. Development happens on GitHub. Feel free to report issues or create a pull-request there. General questions are best asked on StackOverflow.

The package is hosted on dart packages. Up-to-date API documentation is created with every release.

Tutorial

Below are step-by-step instructions of how to write your first parser. More elaborate examples (JSON parser, LISP parser and evaluator, Prolog parser and evaluator, etc.) are included in the example repository.

Installation

Follow the installation instructions on dart packages.

Import the package into your Dart code using:

import 'package:petitparser/petitparser.dart';

Writing a Simple Grammar

Writing grammars with PetitParser is as simple as writing Dart code. For example, the following code creates a parser that can read identifiers (a letter followed by zero or more letter or digits):

final id = letter() & (letter() | digit()).star();

If you inspect the object

id
in the debugger, you'll notice that the code above builds a tree of parser objects:
  • SequenceParser: This parser accepts a sequence of parsers.
    • CharacterParser: This parser accepts a single letter.
    • PossessiveRepeatingParser: This parser accepts zero or more times another parser.
    • ChoiceParser: This parser accepts a single word character.
      • CharacterParser: This parser accepts a single letter.
      • CharacterParser: This parser accepts a single digit.

The operators

&
and
|
are overloaded and create a sequence and a choice parser respectively. In some contexts it might be more convenient to use chained function calls, or the extension methods on lists. Both of the following parsers accept the same inputs as the parser above:
final id1 = letter().seq(letter().or(digit()).star());
final id2 = [letter(), [letter(), digit()].toChoiceParser().star()].toSequenceParser();

Note that the inferred type of the 3 parsers is not equivalent: Due to github.com/dart-lang/language/issues/1557 the inferred type of sequence and choice parsers created with operators or chained function calls is

Parser
. The last variation created from lists has more specific type information.

Parsing Some Input

To actually consume an input string we use the method

Parser.parse
:
final result1 = id.parse('yeah');
final result2 = id.parse('f12');

The method

Parser.parse
returns a
Result
, which is either an instance of
Success
or
Failure
. In both examples we are successful and can retrieve the resulting value using
Success.value
:
print(result1.value);                   // ['y', ['e', 'a', 'h']]
print(result2.value);                   // ['f', ['1', '2']]

While it seems odd to get these nested arrays with characters as a return value, this is the default decomposition of the input into a parse-tree. We'll see in a while how that can be customized.

If we try to parse something invalid we get an instance of

Failure
and we can retrieve a descriptive error message using
Failure.message
:
final result3 = id.parse('123');
print(result3.message);                 // 'letter expected'
print(result3.position);                // 0

Trying to retrieve result by calling

Failure.value
would throw the exception
ParserError
.
Context.isSuccess
and
Context.isFailure
can be used to decide if the parsing was successful.

If you are only interested if a given string is valid you can use the helper method

Parser.accept
:
print(id.accept('foo'));                // true
print(id.accept('123'));                // false

Different Kinds of Parsers

PetitParser provides a large set of ready-made parser that you can compose to consume and transform arbitrarily complex languages. Terminal parsers are the simplest. We've already seen a few of those:

  • any()
    parses any character.
  • char('a')
    (or
    'a'.toParser()
    ) parses the character a.
  • digit()
    parses any digit from 0 to 9.
  • letter()
    parses any letter from a to z and A to Z.
  • pattern('a-f')
    (or
    'a-f'.toParser(isPattern: true)
    ) parsers any character between a and f.
  • patternIgnoreCase('a-f')
    (or
    'a-f'.toParser(isPattern: true, caseInsensitive: true)
    ) parsers and character between a and f, or A and F.
  • string('abc')
    (or
    'abc'.toParser()
    ) parses the string abc.
  • stringIgnoreCase('abc')
    (or
    'abc'.toParser(caseInsensitive: true)
    ) parses the strings Abc, aBC, ...
  • word()
    parses any letter or digit.

So instead of using the letter and digit predicate, we could have written our identifier parser like this:

final id = letter() & word().star();

The next set of parsers are used to combine other parsers together:

  • p1 & p2
    ,
    p1.seq(p2)
    , or
    [p1, p2].toSequenceParser()
    parse p1 followed by p2 (sequence).
  • p1 | p2
    ,
    p1.or(p2)
    , or
    [p1, p2].toChoiceParser()
    parse p1, if that doesn't work parse p2 (ordered choice).
  • p.star()
    parses p zero or more times.
  • p.plus()
    parses p one or more times.
  • p.optional()
    parses p, if possible.
  • p.and()
    parses p, but does not consume its input.
  • p.not()
    parses p and succeed when p fails, but does not consume its input.
  • p.end()
    parses p and succeed at the end of the input.

The last type of parsers are actions or transformations we can use as follows:

  • p.map((value) => ...)
    performs the transformation using the provided callback.
  • p.pick(n)
    returns the n-th element of the list p returns.
  • p.flatten()
    creates a string from the consumed input of p.
  • p.token()
    creates a token from the result of p.
  • p.trim()
    trims whitespaces before and after p.
  • p.cast()
    casts the result of p to the type
    T
    .

To return a string of the parsed identifier, we can modify our parser like this:

final id = (letter() & word().star()).flatten();

To conveniently find all matches in a given input string you can use

Parser.matchesSkipping
:
final matches = id.matchesSkipping('foo 123 bar4');
print(matches);                         // ['foo', 'bar4']

These are the basic elements to build parsers. There are a few more well documented and tested factory methods in the

Parser
class. If you want browse their documentation and tests.

Writing a More Complicated Grammar

Now we are able to write a more complicated grammar for evaluating simple arithmetic expressions. Within a file we start with the grammar for a number (actually an integer):

final number = digit().plus().flatten().trim().map(int.parse);

Then we define the productions for addition and multiplication in order of precedence. Note that we instantiate the productions with undefined parsers upfront, because they recursively refer to each other. Later on we can resolve this recursion by setting their reference:

final term = undefined();
final prod = undefined();
final prim = undefined();

final add = (prod & char('+').trim() & term) .map((values) => values[0] + values[2]); term.set(add | prod);

final mul = (prim & char('*').trim() & prod) .map((values) => values[0] * values[2]); prod.set(mul | prim);

final parens = (char('(').trim() & term & char(')').trim()) .map((values) => values[1]); final number = digit().plus().flatten().trim().map(int.parse); prim.set(parens | number);

To make sure our parser consumes all input we wrap it with the

end()
parser into the start production:
final parser = term.end();

That's it, now we can test our parser and evaluator:

parser.parse('1 + 2 * 3');              // 7
parser.parse('(1 + 2) * 3');            // 9

Using Grammar Definitions

Defining and reusing complex grammars can be cumbersome, particularly if the grammar is large and recursive (such as the example above). The class

GrammarDefinition
provides the building block to conveniently define and build complex grammars with possibly hundreds of productions.

To create a new grammar definition subclass

GrammarDefinition
. In our case we call the class
ExpressionDefinition
. For every production create a new method returning the primitive parser defining it. The method called
start
is supposed to return the start production of the grammar. To refer to a production defined in the same definition use
ref0
with the function reference as the argument. The 0 at the end of
ref0
means that the production reference isn't parametrized (zero argument production method).
class ExpressionDefinition extends GrammarDefinition {
  Parser start() => ref0(term).end();

Parser term() => ref0(add) | ref0(prod); Parser add() => ref0(prod) & char('+').trim() & ref0(term);

Parser prod() => ref0(mul) | ref0(prim); Parser mul() => ref0(prim) & char('*').trim() & ref0(prod);

Parser prim() => ref0(parens) | ref0(number); Parser parens() => char('(').trim() & ref0(term) & char(')').trim();

Parser number() => digit().plus().flatten().trim(); }

To create a parser with all the references correctly resolved call

build()
.
final definition = new ExpressionDefinition();
final parser = definition.build();
parser.parse('1 + 2 * 3');              // ['1', '+', ['2', '+', '3']]

Again, since this is plain Dart, common code refactorings such as renaming a production updates all references correctly. Also code navigation and code completion works as expected.

To attach custom production actions you might want to further subclass your grammar definition and override overriding the necessary productions defined in the superclass:

class EvaluatorDefinition extends ExpressionDefinition {
  Parser add() => super.add().map((values) => values[0] + values[2]);
  Parser mul() => super.mul().map((values) => values[0] * values[2]);
  Parser parens() => super.parens().castList().pick(1);
  Parser number() => super.number().map((value) => int.parse(value));
}

Similarly, build the evaluator parser like so:

final definition = new EvaluatorDefinition();
final parser = definition.build();
parser.parse('1 + 2 * 3');              // 7

To use just a part of the parser you can specify the start production when building. For example, to reuse the number parser one would write:

final definition = new EvaluatorDefinition();
final parser = definition.build(start: definition.number);
parser.parse('42');                     // 42

This is just the surface of what

GrammarDefinition
can do, check out the documentation and the examples using it.

Using the Expression Builder

Writing such expression parsers is pretty common and can be quite tricky to get right. To simplify things, PetitParser comes with a builder that can help you to define such grammars easily. It supports the definition of operator precedence; and prefix, postfix, left- and right-associative operators.

The following code creates the empty expression builder:

final builder = ExpressionBuilder();

Then we define the operator-groups in descending precedence. The highest precedence are the literal numbers themselves. This time we accept floating-point numbers, not just integers. In the same group we add support for the parenthesis:

builder.group()
  ..primitive(digit()
      .plus()
      .seq(char('.').seq(digit().plus()).optional())
      .flatten()
      .trim()
      .map((a) => num.tryParse(a)))
  ..wrapper(char('(').trim(), char(')').trim(), (String l, num a, String r) => a);

Then come the normal arithmetic operators. Note, that the action blocks receive both, the terms and the parsed operator in the order they appear in the parsed input:

// negation is a prefix operator
builder.group()
  ..prefix(char('-').trim(), (String op, num a) => -a);

// power is right-associative builder.group() ..right(char('^').trim(), (num a, String op, num b) => math.pow(a, b));

// multiplication and addition are left-associative builder.group() ..left(char('*').trim(), (num a, String op, num b) => a * b) ..left(char('/').trim(), (num a, String op, num b) => a / b); builder.group() ..left(char('+').trim(), (num a, String op, num b) => a + b) ..left(char('-').trim(), (num a, String op, num b) => a - b);

Finally, we can build the parser:

final parser = builder.build().end();

After executing the above code we get an efficient parser that correctly evaluates expressions like:

parser.parse('-8');                     // -8
parser.parse('1+2*3');                  // 7
parser.parse('1*2+3');                  // 5
parser.parse('8/4/2');                  // 1
parser.parse('2^2^3');                  // 256

Check out the documentation for more examples.

Misc

petitparser.github.io contains up-to-date information about PetitParser.

Examples

The package comes with a large collection of example grammars and language experiments ready to explore:

  • Dart contains an experimental Dart grammar.
  • JSON contains a complete JSON grammar and parser.
  • Lisp contains a complete LISP grammar, parser and evaluator.
  • Prolog contains a basic Prolog grammar, parser and evaluator.
  • Smalltalk contains a complete Smalltalk grammar.
  • Uri contains a simple URI parser.

Furthermore, there are numerous open source projects using PetitParser:

  • apollovm, a simple VM that can parse, run and generate basic Dart and Java8 code.
  • equations is an equation solving library.
  • expression_language is a library for parsing and evaluating expressions.
  • expressions is a library to parse and evaluate simple expressions.
  • intl_translation provides internationalization and localization support to Dart.
  • json_path is an implementation of JSONPath expressions.
  • pem encodes and decodes textual cryptographic keys.
  • puppeteer is a library to automate the Chrome browser.
  • query implements search queries with support for boolean groups, field scopes, ranges, etc.
  • xml is a lightweight library for parsing, traversing, and querying XML documents.

History

PetitParser was originally implemented in Smalltalk. Later on, as a mean to learn these languages, I reimplemented PetitParser in Java and Dart. The implementations are very similar in their API and the supported features. If possible, the implementations adopt best practises of the target language.

Implementations

License

The MIT License, see LICENSE.

We use cookies. If you continue to browse the site, you agree to the use of cookies. For more information on our use of cookies please see our Privacy Policy.