parser

by elm-tools

elm-tools / parser

Simple Parser + Nice Error Messages

125 Stars 10 Forks Last release: Not found BSD 3-Clause "New" or "Revised" License 33 Commits 5 Releases

Available items

No Items, yet!

The developer of this repository has not created any items for sale yet. Need a bug fixed? Help with integration? A different license? Create a request here:

Parser + Nice Error Messages

Goals:

  • Make writing parsers as simple and fun as possible.
  • Produce excellent error messages.
  • Go pretty fast.

This is achieved with a couple concepts that I have not seen in any other parser libraries: parser pipelines, tracking context, and delayed commits.

Parser Pipelines

To parse a 2D point like

( 3, 4 )
, you might create a
point
parser like this:
import Parser exposing (Parser, (|.), (|=), succeed, symbol, float, ignore, zeroOrMore)


type alias Point = { x : Float , y : Float }

point : Parser Point point = succeed Point |. symbol "(" |. spaces |= float |. spaces |. symbol "," |. spaces |= float |. spaces |. symbol ")"

spaces : Parser () spaces = ignore zeroOrMore (\c -> c == ' ')

All the interesting stuff is happening in

point
. It uses two operators:
  • (|.)
    means “parse this, but ignore the result”
  • (|=)
    means “parse this, and keep the result”

So the

Point
function only gets the result of the two
float
parsers.

The theory is that

|=
introduces more “visual noise” than
|.
, making it pretty easy to pick out which lines in the pipeline are important.

I recommend having one line per operator in your parser pipeline. If you need multiple lines for some reason, use a

let
or make a helper function.

Tracking Context

Most parsers tell you the row and column of the problem:

Something went wrong at (4:17)

That may be true, but it is not how humans think. It is how text editors think! It would be better to say:

I found a problem with this list:

[ 1, 23zm5, 3 ]
     ^

I wanted an integer, like 6 or 90219.

Notice that the error messages says

this list
. That is context! That is the language my brain speaks, not rows and columns.

This parser package lets you annotate context with the

inContext
function. You can let the parser know “I am trying to parse a

"list"
right now” so if an error happens anywhere in that context, you get the hand annotation!

Note: This technique is used by the parser in the Elm compiler to give more helpful error messages.

Delayed Commits

To make fast parsers with precise error messages, this package lets you control when a parser commits to a certain path.

For example, you are trying to parse the following list:

[ 1, 23zm5, 3 ]

Ideally, you want the error at the

z
, but the libraries I have seen make this difficult to achieve efficiently. You often end up with an error at
[
because “something went wrong”.

This package introduces

delayedCommit
to resolve this.

Say we want to create

intList
, a parser for comma separated lists of integers like
[1, 2, 3]
. We would say something like this:
import Parser exposing (..)


{-| We start by ignoring the opening square brace and some spaces. We only really care about the numbers, so we parse an int and then use intListHelp to start chomping other list entries. -} intList : Parser (List Int) intList = succeed identity |. symbol "[" |. spaces |= andThen (\n -> intListHelp [n]) int |. spaces |. symbol "]"

{-| intListHelp checks if there is a nextInt. If so, it continues trying to find more list items. If not, it gives back the list of integers we have accumulated so far. -} intListHelp : List Int -> Parser (List Int) intListHelp revInts = oneOf [ nextInt |> andThen (\n -> intListHelp (n :: revInts)) , succeed (List.reverse revInts) ]

Now we get to the tricky part! How do we define

nextInt
? Here are two approaches, but only the second one actually works!
-- BAD
badNextInt : Parser Int
badNextInt =
  succeed identity
    |. spaces
    |. symbol ","
    |. spaces
    |= int

-- GOOD nextInt : Parser Int nextInt = delayedCommit spaces

The

badNextInt
looks pretty normal, but it will not work. It commits as soon as the first
spaces
parser succeeds. It fails in the following situation:
[ 1, 2, 3 ]
          ^

When we get to the closing

]
we have already successfully parsed some spaces. That means we are commited to
badNextInt
and need a comma. That fails, so the whole parse fails!

With

nextInt
, the
delayedCommit
function is saying to parse
spaces
but only commit if progress is made after that. So we are only commited to this parser if we see a comma.



Comparison with Prior Work

We use cookies. If you continue to browse the site, you agree to the use of cookies. For more information on our use of cookies please see our Privacy Policy.