1 Parsing Basics

Megaparsack is a library for manipulating parsers, which are, very simply, functions that operate on streams of tokens. This is very broad: the tokens in question can simply be characters in a string, they can be tokens produced as the result of a lexer, they can be syntax objects, or they can even be completely arbitrary data.

What’s special about parsers is that they can be sequenced—that is, multiple parsers can be chained together to make a larger parser. For example, to make a parser that parses the string "ab", you might compose two parsers that parse the characters #\a and #\b individually.

1.1 Getting started with parsers

To get started, require the megaparsack and megaparsack/text libraries.

> (require megaparsack megaparsack/text)

This will import the basic parser functions, as well as some built-in parsers for parsing textual data. Now, you can use the parse-string function along with basic parsers to parse values from strings. Let’s start by parsing an integer:

> (parse-string integer/p "42")

(success 42)

Since the parser was successful, it returns a success value. The parse-string function returns an either value, which represents either success or failure. For example, take a look at what happens when a parse fails:

> (parse-string integer/p "not an integer")

(failure (message (srcloc 'string 1 0 1 1) #\n '("integer")))

When a parse fails, it returns a failure value that encodes some information about what caused the parser to fail. You can convert that information to a human-readable error message by applying the parse-error->string function to the contents of the failure, here using map-failure from data/either:

> (map-failure parse-error->string (parse-string integer/p "not an integer"))

(failure "string:1:0: parse error\n  unexpected: n\n  expected: integer")
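Because parse-string returns an either value, a caller can also branch on the result directly. The following is a minimal sketch, assuming the either function from data/either, which takes a failure handler, a success handler, and the either value. The name describe-parse is a hypothetical helper used only for illustration and is not part of megaparsack.

```racket
#lang racket
(require data/either megaparsack megaparsack/text)

;; describe-parse is a hypothetical helper, not part of megaparsack.
;; It turns either outcome of parsing an integer into a plain string.
(define (describe-parse str)
  (either
   ;; failure case: render the parse error as readable text
   (λ (err) (string-append "error: " (parse-error->string err)))
   ;; success case: report the parsed integer
   (λ (n) (format "parsed ~a" n))
   (parse-string integer/p str)))

(describe-parse "42")
(describe-parse "not an integer")
```

This keeps both outcomes as ordinary values, which can be convenient when a failed parse is an expected situation rather than an exceptional one.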

You can also assert that a parse will succeed and just get the result out by using the parse-result! function, which raises an exn:fail:read:megaparsack exception when the parse fails.

> (parse-result! (parse-string integer/p "42"))

42

> (parse-result! (parse-string integer/p "not an integer"))

string:1:0: parse error

  unexpected: n

  expected: integer

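You may notice that the error message includes some useful information. Specifically, megaparsack will attempt to provide the following information to the user whenever a parse fails: the source location of the error, the token that was unexpected and caused the parse to fail, and the set of values that were expected in its place.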

In the above case, the parser reports that it expected an integer, but it encountered the character n, which obviously isn’t a valid piece of an integer.
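Since parse-result! signals failure with an exn:fail:read:megaparsack exception, the failure can be handled with Racket's ordinary with-handlers form. The sketch below assumes the exn:fail:read:megaparsack? predicate that accompanies that exception type. The name parse-integer-or-false is a hypothetical helper, not part of the library.

```racket
#lang racket
(require megaparsack megaparsack/text)

;; parse-integer-or-false is a hypothetical helper, not part of megaparsack.
;; It returns the parsed integer, or #f when parse-result! raises a parse error.
(define (parse-integer-or-false str)
  (with-handlers ([exn:fail:read:megaparsack? (λ (e) #f)])
    (parse-result! (parse-string integer/p str))))

(parse-integer-or-false "42")
(parse-integer-or-false "not an integer")
```

Catching only the megaparsack exception type lets unrelated errors propagate as usual.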

1.2 Parsing textual data

The integer/p parser, as would be expected, parses a single integer. This isn’t very useful on its own, since most of the time you will want to parse something much more complicated than that, but it is a useful building block for creating larger parsers. Let’s look at some other “building block” parsers that work with strings.

The letter/p, digit/p, and space/p parsers parse a single letter, digit, or whitespace character, respectively. Note that these parsers succeed even when only part of the input is consumed. This is important when combining parsers together, but if you want to ensure a parser parses the entire input, you can use eof/p.

Each of these parsers returns the character it consumed:

> (parse-string letter/p "hello")

(success #\h)

> (parse-string digit/p "123")

(success #\1)

> (parse-string space/p " ")

(success #\space)
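The remark about eof/p can be illustrated with a small sketch. It assumes the do and pure forms from data/monad and data/applicative to run eof/p after integer/p. The name whole-integer/p is hypothetical and chosen only for this example.

```racket
#lang racket
(require data/monad data/applicative megaparsack megaparsack/text)

;; whole-integer/p is a hypothetical name: an integer followed by end of input.
(define whole-integer/p
  (do [n <- integer/p]
      eof/p          ; fails unless all of the input has been consumed
      (pure n)))     ; keep the integer as the overall result

(parse-string integer/p "42abc")        ; succeeds on the leading "42"
(parse-string whole-integer/p "42abc")  ; fails: input remains after the integer
(parse-string whole-integer/p "42")     ; succeeds
```

Leaving eof/p out of the building blocks and adding it only at the top level keeps the smaller parsers reusable inside larger ones.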

The char/p function creates a parser that parses a single, specific character:

> (parse-string (char/p #\a) "abc")

(success #\a)

> (parse-result! (parse-string (char/p #\a) "xyz"))

string:1:0: parse error

  unexpected: x

  expected: 'a'

It may not be obvious why the char/p parser is useful. After all, it just seems to return the character it was given. Indeed, in these contrived examples, it’s not very useful at all! However, it becomes extremely important when combining multiple parsers together.

1.3 Sequencing parsers

All parsers are monads, so it’s possible to use chain and do from data/monad to combine multiple parsers together to create a bigger parser. For example, let’s create a parser that parses the letters a and b in sequence:

> (require data/monad)
> (define ab/p
    (do (char/p #\a)
        (char/p #\b)))

Now we can use our new parser just like any other:

> (parse-string ab/p "ab")

(success #\b)

> (parse-result! (parse-string ab/p "ac"))

string:1:1: parse error

  unexpected: c

  expected: 'b'

The parser succeeds when we supply it with the string "ab", but it fails when it doesn’t match, and we automatically get a pretty good error message.
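The same sequencing can be written with chain instead of do. The sketch below assumes chain takes a function first and a parser second: it runs the parser, then passes its result to the function, which returns the next parser to run. The name ab-chain/p is hypothetical.

```racket
#lang racket
(require data/monad megaparsack megaparsack/text)

;; ab-chain/p is a hypothetical name for a chain-based counterpart to ab/p.
;; The result of (char/p #\a) is ignored; only the sequencing matters here.
(define ab-chain/p
  (chain (λ (_) (char/p #\b))
         (char/p #\a)))

(parse-string ab-chain/p "ab")
(parse-string ab-chain/p "ac")
```

For more than two steps, nested chain calls become hard to read, which is the main reason to prefer do.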

One thing to note is that the result of the parser is not particularly meaningful: it’s just #\b. That’s because the last parser in the do block was (char/p #\b), so the result of ab/p is just the result of its final parser. If we wanted to, we could change the result to be whatever we wanted (but only on a successful parse) by returning our own value, wrapped with pure from data/applicative, at the end of the do block:

> (require data/applicative)
> (define ab*/p
    (do (char/p #\a)
        (char/p #\b)
        (pure "success!")))

We need the pure wrapper in order to properly “lift” our arbitrary value into the context of a parser. Now we can run our new parser and get back our custom value when it succeeds:

> (parse-string ab*/p "ab")

(success "success!")

This parser is a little silly, but we can use these concepts to implement parsers that might actually be useful. For example, you might need to parse two integers, separated by a comma, then add them together. Using the monadic parser interface, this is extremely simple:

> (define add-two-ints/p
    (do [x <- integer/p]
        (char/p #\,)
        [y <- integer/p]
        (pure (+ x y))))

This definition is a little bit more complicated because we are using the results of the two integer parsers in our sequence, so we use the [a <- b] syntax to “pull out” the result of each parser and bind it to a variable. Then we can add the two results together at the end. Actually using this parser works just as intended:

> (parse-string add-two-ints/p "7,12")

(success 19)

Using this technique, it’s possible to build up fairly complex parsers from small, self-contained units.
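As one more illustration of building a larger parser from small units, the sketch below combines only the parsers shown in this section. The name point/p and the "(x,y)" input shape are assumptions made for this example, not something the library defines.

```racket
#lang racket
(require data/monad data/applicative megaparsack megaparsack/text)

;; point/p is a hypothetical parser for input shaped like "(3,4)".
(define point/p
  (do (char/p #\()          ; opening parenthesis, result discarded
      [x <- integer/p]      ; first coordinate
      (char/p #\,)          ; separator, result discarded
      [y <- integer/p]      ; second coordinate
      (char/p #\))          ; closing parenthesis, result discarded
      (pure (list x y))))   ; combine both coordinates into a list

(parse-string point/p "(3,4)")
```

On the input "(3,4)" this should produce a success value wrapping the list of the two integers, and any input that deviates from that shape should produce a failure pointing at the first offending character.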