Parsec implementation in Racket
1 Parsers
2 Basic parsing functions
parse-result
parse
3 Basic parsing forms
return
>>=
>>
parser-compose
parser-seq
parser-cons
parser-one
4 Basic Combinators
<or>
choice
<any>
many
many1
many  Till
many1Till
many  Until
many1Until
4.1 A Basic CSV parser
5 Other combinators
skip  Many
skip  Many1
sep  By
sep  By1
end  By
between
look  Ahead
<!>
not  Followed  By
6 Character parsing
satisfy
char
char  Any  Case
none  Of
one  Of
one  Of  Strings
one  Of  Strings  Any  Case
string
7 Constant parsers
$letter
$digit
$alpha  Num
$hex  Digit
$space
$spaces
$any  Char
$newline
$tab
$eol
$eof
$identifier
8 User State
set  State
get  State
with  State
9 Error handling combinators
try
<?>
$err
err
10 Bytestring parsing
byte
bytestring
11 Parse Result Structs
Consumed
Empty
Ok
Error
exn:  fail:  parsack
Bibliography
7.1

Parsec implementation in Racket

Stephen Chang <[email protected]>

 (require parsack) package: parsack

Parsec implementation in Racket. See [parsec].

1 Parsers

A parser is a function that consumes an input-port? and returns a parse result. A parse result is most commonly a char? or a list of chars, but is ultimately determined by each specific parser.

Two factors are considered when combining parsers:
  • whether a parser consumes input, i.e., whether data was read from the input port,

  • or whether a parser succeeds or fails.

These two factors are not mutually exclusive and some combinators only use one of the factors while others use both to determine the parse result.

Specifically, Consumed and Empty struct results indicate input-consumption and no-input-consumption, respectively, while the Ok and Error structs indicate success and failure, respectively. See Parse Result Structs for more details about information contained in the parse result. In general, users should use combinators below to compose parsers and parse results, rather than directly handle the structs.

2 Basic parsing functions

procedure

(parse-result p input)  any/c

  p : parser?
  input : (or/c string? path? input-port?)
Parses input input with parser p and returns the parse result.

The input can be either a string, a filepath, or an input port. A string or path input is converted to an input port before parsing begins.

Raises the exn:fail:parsack exception on parser failure.

Examples:
> (parse-result $letter "abc")

#\a

> (parse-result $letter "123")

parse ERROR: at 1:1:1

unexpected: "1"

  expected: "letter"

> (parse-result (many $letter) "abc123")

'(#\a #\b #\c)

> (parse-result (many $letter) "123")

'()

procedure

(parse p input)  (or/c Consumed? Empty?)

  p : parser?
  input : (or/c string? path? input-port?)
Parses input input with parser p and returns a Consumed or Empty struct that contains the result.

The input can be either a string, a filepath, or an input port. A string or path input is converted to an input port before parsing begins.

Raises the exn:fail:parsack exception on parser failure.

Examples:
> (parse $letter "abc")

(Consumed (Ok #\a))

> (parse $letter "123")

parse ERROR: at 1:1:1

unexpected: "1"

  expected: "letter"

> (parse (many $letter) "abc123")

(Consumed (Ok '(#\a #\b #\c)))

> (parse (many $letter) "123")

(Empty (Ok '()))

3 Basic parsing forms

procedure

(return x)  parser?

  x : any/c
Creates a parser that consumes no input and always succeeds, returning x.

Example:
> (parse (return "b") "a")

(Empty (Ok "b"))

procedure

(>>= p f)  parser?

  p : parser?
  f : (-> any/c parser?)
Monadic bind operator for parsers. Creates a parser that first parses with p.

If p fails, the error result is returned.

Otherwise, the parse result is passed to f, and continues parsing with the parser created from applying f.

Example:
> (parse-result
   (>>= (char #\()
        (λ (skip1)
          (>>= $letter
               (λ (x)
                 (>>= (char #\))
                      (λ (skip2)
                        (return x)))))))
   "(a)")

#\a

procedure

(>> p q)  parser?

  p : parser?
  q : parser?
Equivalent to (>>= p (λ (x) q)) where x is not in q.

Creates a parser that first parses with p, and if successful, ignores the result, then parses with q.

Example:
> (parse-result
   (>> (char #\()
       (>>= $letter
            (λ (x)
              (>> (char #\))
                  (return x)))))
   "(a)")

#\a

syntax

(parser-compose bind-or-skip ...+)

 
bind-or-skip = (x <- parser)
  | parser
     
parser = parser?
     
x = identifier?
Composes parsers. Syntactic wrapper for a chain of >>= operators.

Example:
> (parse-result
   (parser-compose (char #\[)
                   (x <- $letter)
                   (y <- $letter)
                   (char #\])
                   (return (list x y)))
   "[ab]")

'(#\a #\b)

syntax

(parser-seq skippable ... maybe-combine)

 
skippable = (~ parser)
  | parser
     
parser = parser?
     
maybe-combine = 
  | #:combine-with combine
     
combine = (-> any/c any/c)
Combine parsers in "arrow" style instead of monadically. Applies all the given parsers, combines the results with the given combine function, and then returns it. The result of any parsers wrapped with ~ is not given to combine.

The default combine function is list so (parser-seq p q) is syntactically equivalent to (parser-compose (x <- p) (y <- q) (return (list x y))).

Use parser-seq instead of parser-compose when you don’t need the result of any parser other than to return it. Use parser-compose when you need the result of one parse to build subsequent parsers, for example when parsing matching html open-close tags, or if you need to do additional processing on the parse result before returning.

Example:
> (parse-result
   (parser-seq (~ (char #\[))
               $letter
               $letter
               (~ (char #\])))
   "[ab]")

'(#\a #\b)

syntax

(parser-cons p q)

 
p = parser?
     
q = parser?
Syntactically equivalent to (parser-seq p q #:combine-with cons)

Example:
> (parse-result
   (parser-cons $letter (many $letter))
   "abcde")

'(#\a #\b #\c #\d #\e)

syntax

(parser-one p ...)

 
p = (~> parser)
  | parser
     
parser = parser?
Combines parsers but only return the result of the parser wrapped with ~>. Only one parser may be wrapped with ~>.

For example, (parser-one p1 (~> p2) p3) is syntactically equivalent to (parser-seq (~ p1) p2 (~ p3) #:combine-with (λ (x) x)), which is equivalent to (parser-compose p1 (x <- p2) p3 (return x)).

Example:
> (parse-result
   (parser-one (char #\() (~> $letter) (char #\)))
   "(a)")

#\a

4 Basic Combinators

procedure

(<or> p q ...)  parser?

  p : parser?
  q : parser?
Creates a parser that tries the given parsers in order, returning with the result of the first parser that consumes input. Errors if not given at least one parser.

Example:
> (parse-result (<or> $letter $digit) "1")

#\1

NOTES:
  • <or> continues to try subsequent parsers so long as each of the previous parsers consumes no input, even if one of the previous parsers returns successfully. Thus <or> implements "longest match" (see [parsec] for more details).

    Example:
    > (parse-result (<or> (return null) $digit) "1")

    #\1

  • See also <any>, a related parser that immediately returns when it encounters a successful parse, even if the parse consumed no input.

    Example:
    > (parse-result (<any> (return null) $digit) "1")

    '()

  • But if no parsers consume input, then <or> backtracks to return the result of the first success.

    Example:
    > (parse-result (<or> (return "a") (return "b") (return "c")) "1")

    "a"

  • If one of the given parsers consumes input, and then errors, <or> returns immediately with the error.

    Example:
    > (parse-result
       (<or> (string "ab")
             (string "ac"))
       "ac")

    parse ERROR: at 1:2:2

    unexpected: "c"

      expected: "b"

    Use try to reset the input on a partial parse.

    Example:
    > (parse-result
       (<or> (try (string "ab"))
             (string "ac"))
       "ac")

    '(#\a #\c)

procedure

(choice ps)  parser?

  ps : (listof parser?)
Same as (apply <or> ps).

procedure

(<any> p q ...)  parser?

  p : parser?
  q : parser?
Creates a parser that tries the given parsers in order, returning with the result of the first successful parse, even if no input is consumed. Errors if not given at least one parser. See <or> for a related, alternative parser.

Example:
> (parse-result (<any> $letter $digit) "1")

#\1

NOTES:
  • <any> immediately returns when it encounters a successful parse, even if the parse consumed no input.

    Example:
    > (parse-result (<any> (return null) $digit) "1")

    '()

  • See also <or>, a related parser that continues to try subsequent parsers so long as each of the previous parsers consumes no input, even if one of the previous parsers returns successfully.

    Example:
    > (parse-result (<or> (return null) $digit) "1")

    #\1

procedure

(many p [#:till end #:or orcomb])  parser?

  p : parser?
  end : parser? = (return null)
  orcomb : (-> parser? ... (or/c Consumed? Empty?)) = <or>
Creates a parser that repeatedly parses with p zero or more times.

Stops when given end parser parses "successfully" where "success" is determined by the given orcomb combinator. By default orcomb is <or> and end must consume input to be successful. If orcomb is <any>, then many terminates as soon as end returns non-error.

procedure

(many1 p)  parser?

  p : parser?
Creates a parser that repeatedly parses with p one or more times.

procedure

(manyTill p end [#:or orcomb])  parser?

  p : parser?
  end : parser?
  orcomb : (-> parser? ... (or/c Consumed? Empty?)) = <or>
Creates a parser that repeatedly parses with p zero or more times, where parser end is tried after each p and the parsing ends when end succeeds.

Equivalent to (many p #:till end #:or orcomb).

procedure

(many1Till p end [#:or orcomb])  parser?

  p : parser?
  end : parser?
  orcomb : (-> parser? ... (or/c Consumed? Empty?)) = <or>
Creates a parser that repeatedly parses with p one or more times, where parser end is tried after each p and the parsing ends when end succeeds.

procedure

(manyUntil p end)  parser?

  p : parser?
  end : parser?
Creates a parser that repeatedly parses with p zero or more times, where parser end is tried after each p and the parsing ends when end returns non-error, even if end consumes no input.

Equivalent to (manyTill p end #:or <any>).

procedure

(many1Until p end)  parser?

  p : parser?
  end : parser?
Creates a parser that repeatedly parses with p one or more times, where parser end is tried after each p and the parsing ends when end returns non-error, even if end consumes no input.

Equivalent to (many1Till p end #:or <any>).

4.1 A Basic CSV parser

Here is an implementation of a basic parser for comma-separated values (CSV).

A cell consists of any characters except a comma or newline.
> (define $oneCell (many (noneOf ",\n")))

A series of cells are separated by commas. To parse cells, we use two mutually referential parsers.

> (define $cells (parser-cons $oneCell $remainingCells))
> (define $remainingCells (<or> (>> (char #\,) $cells)
                                (return null)))

A line is a series of cells followed by a newline.
> (define $line (parser-one (~> $cells) $eol))

A CSV string is a series of lines.
> (define $csv (many $line))
> (parse-result $csv "cell1,cell2\ncell3,cell4\n")

'(((#\c #\e #\l #\l #\1) (#\c #\e #\l #\l #\2))

  ((#\c #\e #\l #\l #\3) (#\c #\e #\l #\l #\4)))

5 Other combinators

procedure

(skipMany p)  parser?

  p : parser?
Creates a parser that repeatedly parses with p zero or more times, but does not return the result.

procedure

(skipMany1 p)  parser?

  p : parser?
Creates a parser that repeatedly parses with p one or more times, but does not return the result.

procedure

(sepBy p sep)  parser?

  p : parser?
  sep : parser?
Creates a parser that repeatedly parses with p zero or more times, where each parse of p is separated with a parse of sep. Only the results of p are returned.

procedure

(sepBy1 p sep)  parser?

  p : parser?
  sep : parser?
Creates a parser that repeatedly parses with p one or more times, where each parse of p is separated with a parse of sep. Only the results of p are returned.

procedure

(endBy p end)  parser?

  p : parser?
  end : parser?
Like sepBy except for an extra end at the end.

procedure

(between open close p)  parser?

  open : parser?
  close : parser?
  p : parser?
Creates a parser that parses with p only if it’s surround by open and close. Only the result of p is returned.

procedure

(lookAhead p)  parser?

  p : parser?
Creates a parser that parses with p and returns its result, but consumes no input.

procedure

(<!> p [q])  parser?

  p : parser?
  q : parser? = $anyChar
Similar to noneOf but more general. Creates a parser that errors if p successfully parses input, otherwise parses with q. q defaults to $anyChar if unspecified.

procedure

(notFollowedBy p)  parser?

  p : parser?
Creates a parser that succeeds and return nothing if p fails, and fails if p succeeds. Does not consume input.

6 Character parsing

procedure

(satisfy p?)  parser?

  p? : (-> any/c boolean?)
Creates a parser that consumes and returns one character if it satisfies predicate p?.

procedure

(char c)  parser?

  c : char?
Creates a parser that parses and returns char c, case-sensitive.

procedure

(charAnyCase c)  parser?

  c : char?
Creates a parser that parses and returns char c, case-insensitive.

procedure

(noneOf str)  parser?

  str : string?
Creates a parser that consumes and returns one character if the character does not appear in str.

procedure

(oneOf str)  parser?

  str : string?
Creates a parser that consumes and returns one character if the character appears in str.

procedure

(oneOfStrings str ...)  parser?

  str : string?
Creates a parser that consumes and returns any of the strs, case-sensitive. Note that the parse result is (listof char?) not string?.

procedure

(oneOfStringsAnyCase str ...)  parser?

  str : string?
Creates a parser that consumes and returns any of the strs, case-insensitive. Note that the parse result is (listof char?) not string?.

procedure

(string str)  parser?

  str : string?
Creates a parser that parses and returns str as a list of chars.

7 Constant parsers

This library uses the $ prefix for identifiers that represent parsers (as opposed to combinators).

value

$letter : parser?

Parses an alphabetic char.

value

$digit : parser?

Parses an numeric char.

value

$alphaNum : parser?

Parses a letter or digit.

value

$hexDigit : parser?

Parses a hex char.

value

$space : parser?

Parses a space.

value

$spaces : parser?

Parses zero or more spaces.

value

$anyChar : parser?

Parses any char.

value

$newline : parser?

Parses newline char. This is the singular #\n character. See also $eol.

value

$tab : parser?

Parses tab char.

value

$eol : parser?

Parses end of line. $eol succeeds on "\n", "\r", "\r\n", or "\n\r". See also $newline.

value

$eof : parser?

Parses end of file.

value

$identifier : parser?

Parsers string containing only numbers, letters, or underscore.

8 User State

procedure

(setState key val)  parser?

  key : symbol?
  val : any/c
Creates a parser that sets user state and whose result is the prior value of key (or #f is no value was set).

procedure

(getState key)  parser?

  key : symbol?
Creates a parser whose result is the value for the state of key (or #f is there no value was set.)

syntax

(withState ([key value] ...) parser)

A form that creates a parser analog of parameterize, by using parser-compose and setState. The original value of each key is saved, the new value is set during the evaluation of parser, then the original value is restored.

9 Error handling combinators

procedure

(try p)  parser?

  p : parser?
Lookahead function. Creates a parser that tries to parse with p but does not consume input if p fails.

Examples:
> ((string "ab") (open-input-string "ac"))

(Consumed #<Error>)

> ((try (string "ab")) (open-input-string "ac"))

(Empty #<Error>)

procedure

(<?> p expected)  parser?

  p : parser?
  expected : string?
Creates a parser that tries to parser with p and returns error msg expected if it fails.

value

$err : parser?

Parser that always returns error.

procedure

(err expected)  parser?

  expected : string?
Like $err but allows custom expected msg.

10 Bytestring parsing

procedure

(byte b)  parser?

  b : byte?
Creates a parser that parses and returns byte b.

procedure

(bytestring bstr)  parser?

  bstr : bytes?
Creates a parser that parses and returns bstr as a list of bytes.

11 Parse Result Structs

A parser is a function that consumes an input-port? and returns either a Consumed, or an Empty struct.

In general, users should use the above combinators to connect parsers and parse results, rather than manipulate these structs directly.

struct

(struct Consumed (reply))

  reply : (or/c Ok? Error?)
This is the result of parsing if some input is consumed.

Examples:
> (parse $letter "abc")

(Consumed (Ok #\a))

> (parse (many $letter) "abc123")

(Consumed (Ok '(#\a #\b #\c)))

> ((string "ab") (open-input-string "ac"))

(Consumed #<Error>)

struct

(struct Empty (reply))

  reply : (or/c Ok? Error?)
This is the result of parsing if no input is consumed.

Example:
> (parse (many $letter) "123")

(Empty (Ok '()))

struct

(struct Ok (parsed))

  parsed : any/c
Contains the result of parsing parser.

The parse result can be any value and depends on the specific parser that produces the this struct.

Examples:
> (parse $letter "abc")

(Consumed (Ok #\a))

> (parse (many $letter) "123")

(Empty (Ok '()))

struct

(struct Error ())

Indicates parse error.

Example:
> ($letter (open-input-string "123"))

(Empty #<Error>)

parse-result and parse throw this exception on parse failure.

Bibliography

[parsec] Daan Leijen and Erik Meijer, “Parsec: A practical parser library,” Electronic Notes in Theoretical Computer Science, 2001.