To create a parser with multiple possibilities, use the or/p combinator. It accepts any number of parsers, and it tries them one at a time until one of them matches. For example, let’s consider a parser that parses either the string "true" or "false", then returns the value as a Racket boolean:
> (define true/p (do (string/p "true") (pure #t)))
> (define false/p (do (string/p "false") (pure #f)))
> (define boolean/p (or/p true/p false/p))
By using or/p, we’ve created a choice point, where the parser will try each path before giving up. If none of the paths match, the parser will still fail, but if any of the paths match, it will consider the parse successful and return. To demonstrate that this works, we can use our new parser on some strings:
> (parse-string boolean/p "true")
> (parse-string boolean/p "false")
The or/p combinator also automatically cooperates with error handling to provide helpful error messages when parsing fails:
> (parse-result! (parse-string boolean/p "not a boolean"))
string:1:0: parse error
expected: false or true
Note that the error includes all the possible values that would have been considered valid at the point that the parser failed.
Remember that the or/p combinator is not magic: it does not attempt to predict which parse will be valid, and it does not even try to look ahead to see which parse will be the longest. This can cause problems when two different parses could both succeed—or/p will just pick the first one:
> (define overlapping/p (or/p (string/p "hello") (string/p "hello, world!"))) > (parse-string overlapping/p "hello, world!")
So, if or/p does not perform any lookahead, how exactly does it choose between parsers? It might seem like it tries each parser completely, then backtracks when any of them fail, but this is not entirely true—consider the parser above, fixed so the longest match is first:
You might expect that, if the first match fails, it will try the second one, but in practice, this doesn’t actually work:
> (parse-string overlapping/p "hello, world!")
(success "hello, world!")
> (parse-result! (parse-string overlapping/p "hello"))
string:1:4: parse error
unexpected: end of input
expected: , world!
What gives? Take a close look at the error message: it is expecting the rest of hello, world!, but obviously we only gave it hello. Why isn’t the parser backtracking? Well, megaparsack actually does not backtrack by default. Instead, it implements a single-character lookahead: it tries to parse the first token from each branch, and if it succeeds, it commits to that path.
This means that, since part of the hello, world parse was successful, the parser has already committed to that branch and will not try any of the other options. This turns out to provide far superior error reporting because it reports to the user precisely where the error occurred, not somewhere much earlier in the parse. However, this obviously causes problems in this case where the parse is ambiguous, or more generally, the choice cannot be determined by a single character of lookahead.
To solve this by allowing the parser to backtrack, use the try/p combinator, which converts a parser into one that backtracks upon failure. We can use this to solve our issue with our parser:
> (define backtracking-overlapping/p (or/p (try/p (string/p "hello, world!")) (string/p "hello"))) > (parse-string backtracking-overlapping/p "hello, world!")
(success "hello, world!")
> (parse-string backtracking-overlapping/p "hello")
All that try/p does is disable the “committing” behavior mentioned earlier: instead of committing to a particular path once any of the parse succeeds, any error that occurs within the parser provided to try/p is non-fatal, and the parser will backtrack and try the next alternative.
In this case, since the parse is truly ambiguous based on the first character, try/p is the correct approach. Note that the error messages are still helpful upon failure:
> (parse-result! (parse-string backtracking-overlapping/p "not hello"))
string:1:0: parse error
expected: hello or hello, world!
However, be deliberate about where you put try/p because it is very easy to end up with a parser that provides completely useless error messages because all errors simply backtrack instead of failing fast and reporting the real problem. For an example of this, consider a parser that parses an integer or a boolean, depending on a label provided first:
> (define the-labeled-integer/p (do (string/p "the integer: ") integer/p))
> (define the-labeled-boolean/p (do (string/p "the boolean: ") boolean/p))
It might be tempting to use try/p here because we know that the integer case might fail. Therefore, you might write the parser like this:
This parser seems innocuous enough, right? It even works successfully:
> (parse-string try-labeled/p "the integer: 42")
> (parse-string try-labeled/p "the boolean: false")
But there is a lurking problem with this parser, and that’s its error messages. Consider a mismatch, when we provide the the integer: label but do not actually provide an integer:
> (parse-result! (parse-string try-labeled/p "the integer: false"))
string:1:3: parse error
Oops. What happened? Well, the parser tried to parse an integer, but it failed, so it backtracked. It then tried to parse a boolean, and it parsed the the, but then it failed, too, so it reported an error message. To a user, though, that error message is totally useless. The actual issue is that they should have provided an integer, but instead provided a boolean. Unfortunately, the overzealous backtracking has eliminated that information.
This is tricky, because we can’t just drop the try/p—since both cases share the, the parse is ambiguous without a little bit of lookahead. In order to fix this, what we really want to do is factor out the common the, which will allow us to eliminate the try/p altogether:
> (define labeled-integer/p (do (string/p "integer: ") integer/p))
> (define labeled-boolean/p (do (string/p "boolean: ") boolean/p))
> (define labeled/p (do (string/p "the ") (or/p labeled-integer/p labeled-boolean/p)))
Since we’ve removed all of the uses of try/p, now the parser can provide much more precise error messages when we provide invalid input.
> (parse-result! (parse-string labeled/p "the integer: false"))
string:1:13: parse error