Racket

Parsec implementation in Racket🔗ℹ

Parsec implementation in Racket. See [parsec].

1 Parsers🔗ℹ

A parser is a function that consumes an input-port? and returns a parse result. A parse result is most commonly a char? or a list of chars, but is ultimately determined by each specific parser.

Two factors are considered when combining parsers:

whether a parser consumes input, i.e., whether data was read from the input port,
or whether a parser succeeds or fails.

These two factors are not mutually exclusive and some combinators only use one of the factors while others use both to determine the parse result.

Specifically, Consumed and Empty struct results indicate input-consumption and no-input-consumption, respectively, while the Ok and Error structs indicate success and failure, respectively. See Parse Result Structs for more details about information contained in the parse result. In general, users should use combinators below to compose parsers and parse results, rather than directly handle the structs.

2 Basic parsing functions🔗ℹ

procedure
(parse-result p input) → any/c
p : parser?
input : (or/c string? path? input-port?)

Parses input input with parser p and returns the parse result.

The input can be either a string, a filepath, or an input port. A string or path input is converted to an input port before parsing begins.

Raises the exn:fail:parsack exception on parser failure.

Examples:

> (parse-result $letter "abc")
#\a
> (parse-result $letter "123")
parse ERROR: at 1:1:1
unexpected: "1"
expected: "letter"
> (parse-result (many $letter) "abc123")
'(#\a #\b #\c)
> (parse-result (many $letter) "123")
'()

procedure
(parse p input) → (or/c Consumed? Empty?)
p : parser?
input : (or/c string? path? input-port?)

Parses input input with parser p and returns a Consumed or Empty struct that contains the result.

The input can be either a string, a filepath, or an input port. A string or path input is converted to an input port before parsing begins.

Raises the exn:fail:parsack exception on parser failure.

Examples:

> (parse $letter "abc")
(Consumed (Ok #\a))
> (parse $letter "123")
parse ERROR: at 1:1:1
unexpected: "1"
expected: "letter"
> (parse (many $letter) "abc123")
(Consumed (Ok '(#\a #\b #\c)))
> (parse (many $letter) "123")
(Empty (Ok '()))

3 Basic parsing forms🔗ℹ

procedure
(return x) → parser?
x : any/c

Creates a parser that consumes no input and always succeeds, returning x.

Example:

> (parse (return "b") "a")
(Empty (Ok "b"))

procedure
(>>= p f) → parser?
p : parser?
f : (-> any/c parser?)

Monadic bind operator for parsers. Creates a parser that first parses with p.

If p fails, the error result is returned.

Otherwise, the parse result is passed to f, and continues parsing with the parser created from applying f.

Example:

> (parse-result
   (>>= (char #$)
        (λ (skip1)
          (>>= $letter
               (λ (x)
                 (>>= (char #$)
                      (λ (skip2)
                        (return x)))))))
   "(a)")
#\a

procedure
(>> p q) → parser?
p : parser?
q : parser?

Equivalent to (>>= p (λ (x) q)) where x is not in q.

Creates a parser that first parses with p, and if successful, ignores the result, then parses with q.

Example:

> (parse-result
   (>> (char #$)
       (>>= $letter
            (λ (x)
              (>> (char #$)
                  (return x)))))
   "(a)")
#\a

syntax
(parser-compose bind-or-skip ...+)

bind-or-skip = (x <- parser)
| parser

parser = parser?

x = identifier?

Composes parsers. Syntactic wrapper for a chain of >>= operators.

Example:

> (parse-result
   (parser-compose (char #\[)
                   (x <- $letter)
                   (y <- $letter)
                   (char #\])
                   (return (list x y)))
   "[ab]")
'(#\a #\b)

syntax
(parser-seq skippable ... maybe-combine)

skippable = (~ parser)
| parser

parser = parser?

maybe-combine =
| #:combine-with combine

combine = (-> any/c any/c)

Combine parsers in "arrow" style instead of monadically. Applies all the given parsers, combines the results with the given combine function, and then returns it. The result of any parsers wrapped with ~ is not given to combine.

The default combine function is list so (parser-seq p q) is syntactically equivalent to (parser-compose (x <- p) (y <- q) (return (list x y))).

Use parser-seq instead of parser-compose when you don’t need the result of any parser other than to return it. Use parser-compose when you need the result of one parse to build subsequent parsers, for example when parsing matching html open-close tags, or if you need to do additional processing on the parse result before returning.

Example:

> (parse-result
   (parser-seq (~ (char #\[))
               $letter
               $letter
               (~ (char #\])))
   "[ab]")
'(#\a #\b)

syntax
(parser-cons p q)

p = parser?

q = parser?

Syntactically equivalent to (parser-seq p q #:combine-with cons)

Example:

> (parse-result
(parser-cons $letter (many $letter))
"abcde")
'(#\a #\b #\c #\d #\e)

syntax
(parser-one p ...)

p = (~> parser)
| parser

parser = parser?

Combines parsers but only return the result of the parser wrapped with ~>. Only one parser may be wrapped with ~>.

For example, (parser-one p1 (~> p2) p3) is syntactically equivalent to (parser-seq (~ p1) p2 (~ p3) #:combine-with (λ (x) x)), which is equivalent to (parser-compose p1 (x <- p2) p3 (return x)).

Example:

> (parse-result
(parser-one (char #$) (~> $letter) (char #$))
"(a)")
#\a

4 Basic Combinators🔗ℹ

procedure
(<or> p q ...) → parser?
p : parser?
q : parser?

Creates a parser that tries the given parsers in order, returning with the result of the first parser that consumes input. Errors if not given at least one parser.

Example:

> (parse-result (<or> $letter $digit) "1")
#\1

NOTES:

<or> continues to try subsequent parsers so long as each of the previous parsers consumes no input, even if one of the previous parsers returns successfully. Thus <or> implements "longest match" (see [parsec] for more details).
Example:
> (parse-result (<or> (return null) $digit) "1")
#\1
See also <any>, a related parser that immediately returns when it encounters a successful parse, even if the parse consumed no input.
Example:
> (parse-result (<any> (return null) $digit) "1")
'()
But if no parsers consume input, then <or> backtracks to return the result of the first success.
Example:
> (parse-result (<or> (return "a") (return "b") (return "c")) "1")
"a"
If one of the given parsers consumes input, and then errors, <or> returns immediately with the error.
Example:
> (parse-result
   (<or> (string "ab")
         (string "ac"))
   "ac")
parse ERROR: at 1:2:2
unexpected: "c"
  expected: "b"
Use try to reset the input on a partial parse.
Example:
> (parse-result
   (<or> (try (string "ab"))
         (string "ac"))
   "ac")
'(#\a #\c)

procedure
(choice ps) → parser?
ps : (listof parser?)

Same as (apply <or> ps).

procedure
(<any> p q ...) → parser?
p : parser?
q : parser?

Creates a parser that tries the given parsers in order, returning with the result of the first successful parse, even if no input is consumed. Errors if not given at least one parser. See <or> for a related, alternative parser.

Example:

> (parse-result (<any> $letter $digit) "1")
#\1

NOTES:

<any> immediately returns when it encounters a successful parse, even if the parse consumed no input.
Example:
> (parse-result (<any> (return null) $digit) "1")
'()
See also <or>, a related parser that continues to try subsequent parsers so long as each of the previous parsers consumes no input, even if one of the previous parsers returns successfully.
Example:
> (parse-result (<or> (return null) $digit) "1")
#\1

procedure
(many p [#:till end #:or orcomb]) → parser?
  p : parser?
  end : parser? = (return null)
  orcomb : (-> parser? ... (or/c Consumed? Empty?)) = <or>

Creates a parser that repeatedly parses with p zero or more times.

Stops when given end parser parses "successfully" where "success" is determined by the given orcomb combinator. By default orcomb is <or> and end must consume input to be successful. If orcomb is <any>, then many terminates as soon as end returns non-error.

procedure
(many1 p) → parser?
p : parser?

Creates a parser that repeatedly parses with p one or more times.

procedure
(manyTill p end [#:or orcomb]) → parser?
  p : parser?
  end : parser?
  orcomb : (-> parser? ... (or/c Consumed? Empty?)) = <or>

Creates a parser that repeatedly parses with p zero or more times, where parser end is tried after each p and the parsing ends when end succeeds.

Equivalent to (many p #:till end #:or orcomb).

procedure
(many1Till p end [#:or orcomb]) → parser?
  p : parser?
  end : parser?
  orcomb : (-> parser? ... (or/c Consumed? Empty?)) = <or>

Creates a parser that repeatedly parses with p one or more times, where parser end is tried after each p and the parsing ends when end succeeds.

procedure
(manyUntil p end) → parser?
p : parser?
end : parser?

Creates a parser that repeatedly parses with p zero or more times, where parser end is tried after each p and the parsing ends when end returns non-error, even if end consumes no input.

Equivalent to (manyTill p end #:or <any>).

procedure
(many1Until p end) → parser?
p : parser?
end : parser?

Creates a parser that repeatedly parses with p one or more times, where parser end is tried after each p and the parsing ends when end returns non-error, even if end consumes no input.

Equivalent to (many1Till p end #:or <any>).

4.1 A Basic CSV parser🔗ℹ

Here is an implementation of a basic parser for comma-separated values (CSV).

A cell consists of any characters except a comma or newline.

> (define $oneCell (many (noneOf ",\n")))

A series of cells are separated by commas. To parse cells, we use two mutually referential parsers.

> (define $cells (parser-cons $oneCell $remainingCells))
> (define $remainingCells (<or> (>> (char #\,) $cells)
(return null)))

A line is a series of cells followed by a newline.

> (define $line (parser-one (~> $cells) $eol))

A CSV string is a series of lines.

> (define $csv (many $line))
> (parse-result $csv "cell1,cell2\ncell3,cell4\n")
'(((#\c #\e #\l #\l #\1) (#\c #\e #\l #\l #\2))
((#\c #\e #\l #\l #\3) (#\c #\e #\l #\l #\4)))

5 Other combinators🔗ℹ

procedure
(skipMany p) → parser?
p : parser?

Creates a parser that repeatedly parses with p zero or more times, but does not return the result.

procedure
(skipMany1 p) → parser?
p : parser?

Creates a parser that repeatedly parses with p one or more times, but does not return the result.

procedure
(sepBy p sep) → parser?
p : parser?
sep : parser?

Creates a parser that repeatedly parses with p zero or more times, where each parse of p is separated with a parse of sep. Only the results of p are returned.

procedure
(sepBy1 p sep) → parser?
p : parser?
sep : parser?

Creates a parser that repeatedly parses with p one or more times, where each parse of p is separated with a parse of sep. Only the results of p are returned.

procedure
(endBy p end) → parser?
p : parser?
end : parser?

Like sepBy except for an extra end at the end.

procedure
(between open close p) → parser?
  open : parser?
  close : parser?
  p : parser?

Creates a parser that parses with p only if it’s surround by open and close. Only the result of p is returned.

procedure
(lookAhead p) → parser?
p : parser?

Creates a parser that parses with p and returns its result, but consumes no input.

procedure
(<!> p [q]) → parser?
p : parser?
q : parser? = $anyChar

Similar to noneOf but more general. Creates a parser that errors if p successfully parses input, otherwise parses with q. q defaults to $anyChar if unspecified.

procedure
(notFollowedBy p) → parser?
p : parser?

Creates a parser that succeeds and return nothing if p fails, and fails if p succeeds. Does not consume input.

6 Character parsing🔗ℹ

procedure
(satisfy p?) → parser?
p? : (-> any/c boolean?)

Creates a parser that consumes and returns one character if it satisfies predicate p?.

procedure
(char c) → parser?
c : char?

Creates a parser that parses and returns char c, case-sensitive.

procedure
(charAnyCase c) → parser?
c : char?

Creates a parser that parses and returns char c, case-insensitive.

procedure
(noneOf str) → parser?
str : string?

Creates a parser that consumes and returns one character if the character does not appear in str.

procedure
(oneOf str) → parser?
str : string?

Creates a parser that consumes and returns one character if the character appears in str.

procedure
(oneOfStrings str ...) → parser?
str : string?

Creates a parser that consumes and returns any of the strs, case-sensitive. Note that the parse result is (listof char?) not string?.

procedure
(oneOfStringsAnyCase str ...) → parser?
str : string?

Creates a parser that consumes and returns any of the strs, case-insensitive. Note that the parse result is (listof char?) not string?.

procedure
(string str) → parser?
str : string?

Creates a parser that parses and returns str as a list of chars.

7 Constant parsers🔗ℹ

This library uses the $ prefix for identifiers that represent parsers (as opposed to combinators).

value
$letter : parser?

Parses an alphabetic char.

value
$digit : parser?

Parses an numeric char.

value
$alphaNum : parser?

Parses a letter or digit.

value
$hexDigit : parser?

Parses a hex char.

value
$space : parser?

Parses a space.

value
$spaces : parser?

Parses zero or more spaces.

value
$anyChar : parser?

Parses any char.

value
$newline : parser?

Parses newline char. This is the singular #\n character. See also $eol.

value
$tab : parser?

Parses tab char.

value
$eol : parser?

Parses end of line. $eol succeeds on "\n", "\r", "\r\n", or "\n\r". See also $newline.

value
$eof : parser?

Parses end of file.

value
$identifier : parser?

Parsers string containing only numbers, letters, or underscore.

8 User State🔗ℹ

procedure
(setState key val) → parser?
key : symbol?
val : any/c

Creates a parser that sets user state and whose result is the prior value of key (or #f is no value was set).

procedure
(getState key) → parser?
key : symbol?

Creates a parser whose result is the value for the state of key (or #f is there no value was set.)

syntax
(withState ([key value] ...) parser)

A form that creates a parser analog of parameterize, by using parser-compose and setState. The original value of each key is saved, the new value is set during the evaluation of parser, then the original value is restored.

9 Error handling combinators🔗ℹ

procedure
(try p) → parser?
p : parser?

Lookahead function. Creates a parser that tries to parse with p but does not consume input if p fails.

Examples:

> ((string "ab") (open-input-string "ac"))
(Consumed #<Error>)
> ((try (string "ab")) (open-input-string "ac"))
(Empty #<Error>)

procedure
(<?> p expected) → parser?
p : parser?
expected : string?

Creates a parser that tries to parser with p and returns error msg expected if it fails.

value
$err : parser?

Parser that always returns error.

procedure
(err expected) → parser?
expected : string?

Like $err but allows custom expected msg.

10 Bytestring parsing🔗ℹ

procedure
(byte b) → parser?
b : byte?

Creates a parser that parses and returns byte b.

procedure
(bytestring bstr) → parser?
bstr : bytes?

Creates a parser that parses and returns bstr as a list of bytes.

11 Parse Result Structs🔗ℹ

A parser is a function that consumes an input-port? and returns either a Consumed, or an Empty struct.

In general, users should use the above combinators to connect parsers and parse results, rather than manipulate these structs directly.

struct
(struct Consumed (reply))
reply : (or/c Ok? Error?)

This is the result of parsing if some input is consumed.

Examples:

> (parse $letter "abc")
(Consumed (Ok #\a))
> (parse (many $letter) "abc123")
(Consumed (Ok '(#\a #\b #\c)))
> ((string "ab") (open-input-string "ac"))
(Consumed #<Error>)

struct
(struct Empty (reply))
reply : (or/c Ok? Error?)

This is the result of parsing if no input is consumed.

Example:

> (parse (many $letter) "123")
(Empty (Ok '()))

struct
(struct Ok (parsed))
parsed : any/c

Contains the result of parsing parser.

The parse result can be any value and depends on the specific parser that produces the this struct.

Examples:

> (parse $letter "abc")
(Consumed (Ok #\a))
> (parse (many $letter) "123")
(Empty (Ok '()))

struct
(struct Error ())

Indicates parse error.

Example:

> ($letter (open-input-string "123"))
(Empty #<Error>)

struct
(struct exn:fail:parsack exn:fail ())

parse-result and parse throw this exception on parse failure.

Bibliography🔗ℹ

[parsec]

Daan Leijen and Erik Meijer, “Parsec: A practical parser library,” Electronic Notes in Theoretical Computer Science, 2001.

1	Parsers
2	Basic parsing functions
3	Basic parsing forms
4	Basic Combinators
5	Other combinators
6	Character parsing
7	Constant parsers
8	User State
9	Error handling combinators
10	Bytestring parsing
11	Parse Result Structs
	Bibliography