5 API Reference
A parser is a value that represents a method of turning a syntax object or sequence of syntax objects into an arbitrary Racket value. Parsers can be created using various primitives, then sequenced together using parser combinators to create larger parsers.
Parsers are functors, applicative functors, and monads, which allows them to be mapped over and sequenced together using the corresponding generic interfaces.
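As a rough illustration of the functor and monad interfaces mentioned above, the sketch below maps a function over a parser and sequences two parsers with do. It assumes the generic map from data/functor (imported here under the hypothetical local name fmap to avoid clashing with the list map) and do and pure from data/monad and data/applicative; the parser names are made up for the example.

```racket
#lang racket/base
(require data/applicative                  ; pure
         data/monad                        ; do and <-
         (only-in data/functor [map fmap]) ; generic map, renamed locally
         megaparsack
         megaparsack/text)

;; Functor: transform a parser's result without changing what it consumes.
(define upcased-letter/p (fmap char-upcase letter/p))

;; Monad: run two parsers in sequence and combine their results.
(define letter-then-digit/p
  (do [l <- letter/p]
      [d <- digit/p]
      (pure (string l d))))

(define functor-result (parse-result! (parse-string upcased-letter/p "q")))
(define monad-result (parse-result! (parse-string letter-then-digit/p "a1")))
```

Here functor-result is the uppercased character and monad-result is the two parsed characters joined into a string.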
5.1 Primitives
If both in-ctc and out-ctc are chaperone contracts, then the result will also be a chaperone contract.
procedure
parser : parser?
boxes : (listof syntax-box?)
procedure
(parse-error->string message) → string?
message : message?
procedure
(parse-result! result) → any/c
result : (either/c message? any/c)
struct
(struct exn:fail:read:megaparsack exn:fail:read (unexpected expected) #:transparent)
unexpected : any/c
expected : (listof string?)
struct
(struct syntax-box (datum srcloc) #:transparent)
datum : any/c
srcloc : srcloc?
The datum can be anything at all, but usually it is either a character or some token produced as the result of lexing. It is unlikely that you will need to create syntax-box values yourself; rather, use higher-level functions like parse-string that create these values for you.
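For the uncommon case where tokens are built by hand, the following sketch feeds a list of syntax-box values straight to parse. The token symbols, the 'manual source name, and the srcloc numbers are all invented for illustration; one-of/p is used because it compares each datum with equal? by default.

```racket
#lang racket/base
(require megaparsack)

;; Two hand-built tokens. The srcloc fields are source, line, column,
;; position, and span; the values here are purely illustrative.
(define boxes
  (list (syntax-box 'add (srcloc 'manual 1 0 1 1))
        (syntax-box 'sub (srcloc 'manual 1 1 2 1))))

;; Matches either token datum.
(define op/p (one-of/p '(add sub)))

(define ops (parse-result! (parse (many/p op/p) boxes)))
```

In practice parse-string does this wrapping for every character of the input, which is why constructing boxes manually is rarely needed.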
struct
(struct message (srcloc unexpected expected) #:transparent)
srcloc : srcloc?
unexpected : any/c
expected : (listof string?)
Changed in version 1.5 of package megaparsack-lib: Changed to always return the first successful result, rather than continuing to try parsers until one consumes input. The new behavior is more predictable and more consistent with existing Parsec implementations, though the old behavior was more consistent with the presentation in the original paper.
procedure
(noncommittal/p parser) → parser?
parser : parser?
Creates a new parser like parser, except that it is not considered to have consumed input for the purposes of backtracking if parser succeeds. This allows a future failure to backtrack to an earlier choice point (assuming the failure does not itself consume input).
Note that unlike lookahead/p, noncommittal/p only affects backtracking; the consumed input is still removed from the input stream. If true lookahead is desired, use lookahead/p, instead.
> (parse-result! (parse-string (or/p (do (char/p #\a) (char/p #\b)) (do (char/p #\a) (char/p #\c))) "ac"))
string:1:1: parse error
unexpected: c
expected: 'b'
> (parse-result! (parse-string (or/p (do (noncommittal/p (char/p #\a)) (char/p #\b)) (do (char/p #\a) (char/p #\c))) "ac"))
#\c
(do (noncommittal/p parser-a) parser-b).
Note that noncommittal/p does not affect whether parser is considered to have consumed input if it fails, which is to say that try/p and noncommittal/p are orthogonal and should be combined if both behaviors are desired.
Added in version 1.7 of package megaparsack-lib.
procedure
(lookahead/p parser) → parser?
parser : parser?
For example, lookahead/p can be used to implement a parser that only succeeds at the end of a line, but does not consume the newline character itself:
> (define end-of-line/p (lookahead/p (char/p #\newline)))
This can be used to parse, for example, line comments that span to the end of the current line, while still allowing a later parser to consume the newline character:
> (define rest-of-line/p (or/p (do end-of-line/p (pure "")) (do [c <- any-char/p] [cs <- rest-of-line/p] (pure (string-append (string c) cs)))))
> (define line-comment/p (do (try/p (string/p "# ")) rest-of-line/p))
> (parse-string (many/p (do [line <- line-comment/p] (char/p #\newline) (pure line))) (string-append "# hello\n" "# world\n"))
(success '("hello" "world"))
Note that if parser fails, lookahead/p has no effect; if it consumed input before failing, it will not try other alternatives in an enclosing or/p. Wrap parser with try/p if this behavior is undesirable.
Added in version 1.5 of package megaparsack-lib.
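The interaction between a failing lookahead/p and an enclosing or/p can be sketched as follows. The helper try-parse and the parser names are hypothetical; the helper simply turns a raised exn:fail:read:megaparsack into the symbol 'parse-error so both outcomes can be compared as values.

```racket
#lang racket/base
(require megaparsack megaparsack/text)

;; Hypothetical helper: report a parse failure as a symbol instead of raising.
(define (try-parse parser str)
  (with-handlers ([exn:fail:read:megaparsack? (λ (e) 'parse-error)])
    (parse-result! (parse-string parser str))))

;; (string/p "ab") consumes the #\a before failing on #\c, so or/p
;; never tries the second alternative.
(define committed/p
  (or/p (lookahead/p (string/p "ab"))
        (string/p "ac")))

;; Wrapping the inner parser with try/p lets the failure backtrack.
(define backtracking/p
  (or/p (lookahead/p (try/p (string/p "ab")))
        (string/p "ac")))

(define committed-result (try-parse committed/p "ac"))
(define backtracking-result (try-parse backtracking/p "ac"))
```

With the input "ac", the first parser reports an error while the second falls through to (string/p "ac") and succeeds.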
The syntax/p combinator makes source location wrapping opt-in, which is desirable since it is often useful to return values from combinators that are intermediate values not intended to be wrapped in syntax (for example, many/p returns a list of results, not a syntax list).
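A minimal sketch of the opt-in behavior, using integer/p from megaparsack/text: without syntax/p the result is a plain number, and with it the result is a syntax object whose datum is that number. The variable names are illustrative only.

```racket
#lang racket/base
(require megaparsack megaparsack/text)

;; Without syntax/p, the parser yields the bare value.
(define plain (parse-result! (parse-string integer/p "42")))

;; With syntax/p, the same value comes back wrapped in a syntax object
;; that carries source location information.
(define wrapped (parse-result! (parse-string (syntax/p integer/p) "42")))
(define wrapped-datum (syntax->datum wrapped))
```

Intermediate combinators such as many/p can therefore be left unwrapped, with syntax/p applied only at the points where a located result is wanted.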
procedure
(syntax-box/p parser) → (parser/c any/c syntax-box?)
parser : parser?
> (define delayed/p (delay/p (begin (println 'evaluated) (char/p #\a))))
> (parse-result! (parse-string delayed/p "a"))
'evaluated
#\a
> (parse-result! (parse-string delayed/p "a"))
#\a
delay/p can be used to delay evaluation in situations where a (possibly mutually) recursive parser would otherwise require evaluating a parser before its definition. For example:
> (define one/p (or/p (char/p #\.) (delay/p two/p)))
> (define two/p (list/p (char/p #\a) one/p))
> (parse-result! (parse-string one/p "aa."))
'(#\a (#\a #\.))
Without the use of delay/p, the reference to two/p would be evaluated too soon (since or/p is an ordinary function, unlike or).
Note that the delay/p expression itself may be evaluated multiple times, in which case the parser-expr may be as well (since each evaluation of delay/p creates a separate parser). This can easily arise from uses of do, since do is syntactic sugar for nested uses of lambda, though it might not be syntactically obvious that the delay/p expression appears under one such lambda. For example:
> (define sneaky-evaluation/p (do (char/p #\a) (delay/p (begin (println 'evaluated) (char/p #\b)))))
> (parse-result! (parse-string sneaky-evaluation/p "ab"))
'evaluated
#\b
> (parse-result! (parse-string sneaky-evaluation/p "ab"))
'evaluated
#\b
In other words, delay/p doesn’t perform any magical memoization or caching, so it can’t be used to prevent a parser from being evaluated multiple times, only to delay its evaluation to a later point in time.
Added in version 1.6 of package megaparsack-lib.
> (parse-result! (parse-string (one-of/p '(#\a #\b)) "a"))
#\a
> (parse-result! (parse-string (one-of/p '(#\a #\b)) "b"))
#\b
> (parse-result! (parse-string (one-of/p '(#\a #\b)) "c"))
string:1:0: parse error
unexpected: c
expected: a or b
Added in version 1.2 of package megaparsack-lib.
procedure
(guard/p parser pred? [expected make-unexpected]) → parser?
parser : parser?
pred? : (any/c . -> . any/c)
expected : (or/c string? #f) = #f
make-unexpected : (any/c . -> . any/c) = identity
If the parser fails and expected is a string, then expected is used to add expected information to the parser error. Additionally, the make-unexpected function is applied to the result of parser to produce the unexpected field of the parse error.
> (define small-integer/p (guard/p integer/p (λ (x) (<= x 100)) "integer in range [0,100]"))
> (parse-result! (parse-string small-integer/p "42"))
42
> (parse-result! (parse-string small-integer/p "300"))
string:1:0: parse error
unexpected: 300
expected: integer in range [0,100]
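The optional make-unexpected argument can be sketched in the same way. In the example below, the parser name, the expected string, and the wording produced by the make-unexpected function are all made up; the exact rendering of the resulting error message is not shown because it depends on how the library formats the unexpected field.

```racket
#lang racket/base
(require megaparsack megaparsack/text)

;; Accept only even integers. On rejection, describe the offending value
;; with a custom string instead of the raw parsed number.
(define even-integer/p
  (guard/p integer/p
           even?
           "even integer"
           (λ (n) (format "odd integer ~a" n))))

(define accepted (parse-result! (parse-string even-integer/p "42")))

;; A rejected value surfaces as an exn:fail:read:megaparsack exception.
(define rejected
  (with-handlers ([exn:fail:read:megaparsack? (λ (e) 'rejected)])
    (parse-result! (parse-string even-integer/p "7"))))
```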
> (define dotted-let-digit-let/p (list/p letter/p digit/p letter/p #:sep (char/p #\.)))
> (parse-result! (parse-string dotted-let-digit-let/p "a.1.b"))
'(#\a #\1 #\b)
> (parse-result! (parse-string dotted-let-digit-let/p "a1c"))
string:1:1: parse error
unexpected: 1
expected: '.'
> (parse-result! (parse-string dotted-let-digit-let/p "a.1"))
string:1:2: parse error
unexpected: end of input
expected: '.'
Using a separator parser that consumes no input (such as the default separator, void/p) is equivalent to not using a separator at all.
> (define let-digit-let/p (list/p letter/p digit/p letter/p))
> (parse-result! (parse-string let-digit-let/p "a1b"))
'(#\a #\1 #\b)
5.1.1 Repetition
procedure
(many/p parser [#:sep sep #:min min-count #:max max-count]) → (parser/c any/c list?)
parser : parser?
sep : parser? = void/p
min-count : exact-nonnegative-integer? = 0
max-count : (or/c exact-nonnegative-integer? +inf.0) = +inf.0
> (define letters/p (many/p letter/p))
> (parse-result! (parse-string letters/p "abc"))
'(#\a #\b #\c)
> (define dotted-letters/p (many/p letter/p #:sep (char/p #\.) #:min 2 #:max 4))
> (parse-result! (parse-string dotted-letters/p "a.b.c"))
'(#\a #\b #\c)
> (parse-result! (parse-string dotted-letters/p "abc"))
string:1:1: parse error
unexpected: b
expected: '.'
> (parse-result! (parse-string dotted-letters/p "a"))
string:1:0: parse error
unexpected: end of input
expected: '.'
> (parse-result! (parse-string dotted-letters/p "a.b.c.d.e"))
'(#\a #\b #\c #\d)
Added in version 1.1 of package megaparsack-lib.
procedure
(many+/p parser [#:sep sep #:max max-count]) → (parser/c any/c list?)
parser : parser?
sep : parser? = void/p
max-count : (or/c exact-nonnegative-integer? +inf.0) = +inf.0
Changed in version 1.1 of package megaparsack-lib: Added support for #:sep and #:max keyword arguments for consistency with many/p.
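A brief sketch of many+/p with both keyword arguments, assuming it behaves like many/p with a minimum count of one. The parser name and the try-parse helper are hypothetical.

```racket
#lang racket/base
(require megaparsack megaparsack/text)

;; Hypothetical helper: report a parse failure as a symbol instead of raising.
(define (try-parse parser str)
  (with-handlers ([exn:fail:read:megaparsack? (λ (e) 'parse-error)])
    (parse-result! (parse-string parser str))))

;; One to three digits, separated by commas.
(define comma-digits/p (many+/p digit/p #:sep (char/p #\,) #:max 3))

(define two-digits (try-parse comma-digits/p "1,2"))
(define no-digits (try-parse comma-digits/p ""))
(define capped (try-parse comma-digits/p "1,2,3,4"))
```

The empty input fails because at least one digit is required, and the last input stops after three digits, mirroring the #:max behavior shown for many/p.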
procedure
n : exact-nonnegative-integer?
parser : parser?
procedure
(many-until/p parser #:end end [#:sep sep #:min min-count]) → (parser/c (list/c list? any/c))
parser : parser?
end : parser?
sep : parser? = void/p
min-count : exact-nonnegative-integer? = 0
> (define letters-then-punctuation (many-until/p letter/p #:end (char-in/p ".!?,;")))
> (parse-result! (parse-string letters-then-punctuation "abc!"))
'((#\a #\b #\c) #\!)
> (parse-result! (parse-string letters-then-punctuation "abc,efg"))
'((#\a #\b #\c) #\,)
> (parse-result! (parse-string letters-then-punctuation "a1c;"))
string:1:1: parse error
unexpected: 1
expected: '!', ',', '.', ';', '?', or letter
> (parse-result! (parse-string letters-then-punctuation "?"))
'(() #\?)
To determine if the repetition should stop, end is attempted before each optional attempt of parser, i.e. each attempt after the ones required by min-count (if any). If end succeeds, the repetition is terminated immediately, regardless of whether or not further attempts of parser might succeed:
> (define digits-then-zero (many-until/p digit/p #:end (char/p #\0)))
> (parse-result! (parse-string digits-then-zero "1230"))
'((#\1 #\2 #\3) #\0)
> (parse-result! (parse-string digits-then-zero "12305670"))
'((#\1 #\2 #\3) #\0)
If end fails without consuming input, the repetition continues. However, note that if an attempt of end fails after consuming input, the failure is propagated to the entire repetition, as with or/p. This allows a partial success of end to “commit” to ending the repetition, even if further attempts of parser would succeed:
> (define telegram/p (many-until/p any-char/p #:end (string/p "STOP")))
> (parse-result! (parse-string telegram/p "HELLO STOP"))
'((#\H #\E #\L #\L #\O #\space) "STOP")
> (parse-result! (parse-string telegram/p "MESSAGE STOP"))
string:1:3: parse error
unexpected: S
expected: 'T'
This behavior can be useful in situations where end is complex (so it’s helpful to report a parse error that occurs after some prefix of it has succeeded), but in cases like the above, it is usually not desired. As with or/p, this committing behavior can be suppressed by wrapping end with try/p:
> (define fixed-telegram/p (many-until/p any-char/p #:end (try/p (string/p "STOP"))))
> (parse-result! (parse-string fixed-telegram/p "MESSAGE STOP"))
'((#\M #\E #\S #\S #\A #\G #\E #\space) "STOP")
Added in version 1.7 of package megaparsack-lib.
procedure
(many+-until/p parser #:end end [#:sep sep]) → (parser/c (list/c list? any/c))
parser : parser?
end : parser?
sep : parser? = void/p
Added in version 1.7 of package megaparsack-lib.
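The sketch below contrasts many+-until/p with the many-until/p examples earlier, on the assumption that it behaves like many-until/p with a minimum count of one. The parser name and the try-parse helper are hypothetical.

```racket
#lang racket/base
(require megaparsack megaparsack/text)

;; Hypothetical helper: report a parse failure as a symbol instead of raising.
(define (try-parse parser str)
  (with-handlers ([exn:fail:read:megaparsack? (λ (e) 'parse-error)])
    (parse-result! (parse-string parser str))))

;; At least one letter, then an exclamation mark.
(define shout/p (many+-until/p letter/p #:end (char/p #\!)))

(define with-letters (try-parse shout/p "abc!"))
(define without-letters (try-parse shout/p "!"))
```

Unlike many-until/p, which accepts the terminator on its own, this variant rejects "!" because no letter precedes it.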
5.1.2 Parser Parameters
procedure
v : any/c
Like ordinary parameters, parser parameters are procedures that accept zero or one argument. Unlike ordinary parameters, the result of applying a parser parameter procedure is a parser, which must be monadically sequenced with other parsers to have any effect.
> (define param (make-parser-parameter #f))
> (parse-result! (parse-string (param) ""))
#f
> (parse-result! (parse-string (do (param #t) (param)) ""))
#t
Each call to parse is executed with a distinct parser parameterization, which means modifications to parser parameters are only visible during that particular parser execution. The v argument passed to make-parser-parameter is used as the created parser parameter’s initial value in each distinct parser parameterization.
Parser parameters are useful for tracking state needed by context-sensitive parsers, but they can also be used to provide values with dynamic extent using parameterize/p, just as ordinary parameters can be locally modified via parameterize.
Added in version 1.4 of package megaparsack-lib.
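As one possible illustration of tracking state with a parser parameter, the sketch below counts how many letters a parser has consumed. All of the names are invented for the example, and do and pure are assumed to come from data/monad and data/applicative.

```racket
#lang racket/base
(require data/applicative data/monad megaparsack megaparsack/text)

;; Running count of letters seen during the current parse; starts at 0.
(define letter-count (make-parser-parameter 0))

;; Parse one letter and bump the counter as a side effect of the parse.
(define counted-letter/p
  (do [c <- letter/p]
      [n <- (letter-count)]
      (letter-count (add1 n))
      (pure c)))

;; Parse as many letters as possible, then return the final count.
(define count-letters/p
  (do (many/p counted-letter/p)
      (letter-count)))

(define first-run (parse-result! (parse-string count-letters/p "abc")))
(define second-run (parse-result! (parse-string count-letters/p "xy")))
```

Because each parse runs with a fresh parameterization, the second run starts again from the initial value rather than continuing from the first run's count.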
procedure
(parser-parameter? v) → boolean?
v : any/c
Added in version 1.4 of package megaparsack-lib.
syntax
(parameterize/p ([param-expr val-expr] ...) parser-expr)
param-expr : parser-parameter?
parser-expr : parser?
> (define param (make-parser-parameter #f))
> (parse-result! (parse-string (do [a <- (param)] [b <- (parameterize/p ([param #t]) (param))] [c <- (param)] (pure (list a b c))) ""))
'(#f #t #f)
If any of the param-expr’s values are modified by parser-expr via a direct call to the parser parameter procedure, the value remains modified until control leaves the enclosing parameterize/p parser, after which the value is restored. (This behavior is precisely analogous to modifying an ordinary parameter within the body of a parameterize expression.)
> (define param (make-parser-parameter #f))
> (parse-result! (parse-string (do (param 1) [a <- (parameterize/p ([param 2]) (do (param 3) (param)))] [b <- (param)] (pure (list a b))) ""))
'(3 1)
Added in version 1.4 of package megaparsack-lib.
5.2 Parsing Text
(require megaparsack/text)
package: megaparsack-lib
procedure
(parse-string parser str [src-name]) → (either/c message? any/c)
parser : (parser/c char? any/c)
str : string?
src-name : any/c = 'string
procedure
(parse-syntax-string parser stx-str) → (either/c message? any/c)
parser : (parser/c char? any/c)
stx-str : (syntax/c string?)
procedure
(char-not/p c) → (parser/c char? char?)
c : char?
Added in version 1.3 of package megaparsack-lib.
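One plausible use of char-not/p is reading everything up to a closing delimiter. The sketch below parses a double-quoted string with no escape handling; the parser name is hypothetical, and do and pure are assumed to come from data/monad and data/applicative.

```racket
#lang racket/base
(require data/applicative data/monad megaparsack megaparsack/text)

;; A double-quoted string without escape sequences: every character
;; other than the closing quote is part of the body.
(define quoted/p
  (do (char/p #\")
      [cs <- (many/p (char-not/p #\"))]
      (char/p #\")
      (pure (list->string cs))))

(define body (parse-result! (parse-string quoted/p "\"hi there\"")))
```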
> (parse-result! (parse-string (char-between/p #\a #\z) "d"))
#\d
> (parse-result! (parse-string (char-between/p #\a #\z) "D"))
string:1:0: parse error
unexpected: D
expected: a character between 'a' and 'z'
Added in version 1.2 of package megaparsack-lib.
> (parse-result! (parse-string (char-in/p "aeiou") "i"))
#\i
> (parse-result! (parse-string (char-in/p "aeiou") "z"))
string:1:0: parse error
unexpected: z
expected: 'a', 'e', 'i', 'o', or 'u'
Added in version 1.2 of package megaparsack-lib.
procedure
(char-not-in/p alphabet) → (parser/c char? char?)
alphabet : string?
Added in version 1.3 of package megaparsack-lib.
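Similarly, char-not-in/p is convenient when several characters act as delimiters. The sketch below splits a line into comma-separated fields; the field and row parser names are made up, and do and pure are assumed to come from data/monad and data/applicative.

```racket
#lang racket/base
(require data/applicative data/monad megaparsack megaparsack/text)

;; A field is one or more characters that are neither a comma nor a newline.
(define field/p
  (do [cs <- (many+/p (char-not-in/p ",\n"))]
      (pure (list->string cs))))

;; A row is a comma-separated sequence of fields.
(define row/p (many/p field/p #:sep (char/p #\,)))

(define fields (parse-result! (parse-string row/p "ab,cd")))
```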
value
any-char/p : (parser/c char? char?)
Added in version 1.3 of package megaparsack-lib.
value
symbolic/p : (parser/c char? char?)
procedure
(string-ci/p str) → (parser/c char? string?)
str : string?
> (parse-result! (parse-string (string-ci/p "hello") "HeLlO"))
"HeLlO"
Added in version 1.3 of package megaparsack-lib.
Changed in version 1.8: Changed to return the parsed input string rather than always returning str.
5.3 Parsing with parser-tools/lex
(require megaparsack/parser-tools/lex)
package: megaparsack-parser-tools
Sometimes it is useful to run a lexing pass over an input stream before parsing, in which case megaparsack/text is not appropriate. The parser-tools package provides the parser-tools/lex library, which implements a lexer that produces tokens.
When using parser-tools/lex, use lexer-src-pos instead of lexer to enable the built-in source location tracking. This will produce a sequence of position-token elements, which can then be passed to parse-tokens and detected with token/p.
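The workflow described above might look roughly like the following sketch. The token group number-tokens, the lexer number-lexer, and the helper lex-numbers are all hypothetical names. The sketch also assumes that token/p takes the token name as a symbol and that the value returned by parse-tokens can be unwrapped with parse-result!, in the same way as the result of parse-string; treat both as assumptions to check against the entries for those functions.

```racket
#lang racket/base
(require megaparsack
         megaparsack/parser-tools/lex
         parser-tools/lex
         (prefix-in : parser-tools/lex-sre))

;; A single value-carrying token kind.
(define-tokens number-tokens [NUMBER])

;; lexer-src-pos wraps each result in a position-token.
(define number-lexer
  (lexer-src-pos
   [(:+ (:/ #\0 #\9)) (token-NUMBER (string->number lexeme))]
   ;; Skip whitespace by lexing again; return-without-pos avoids
   ;; wrapping the recursive result in a second position-token.
   [whitespace (return-without-pos (number-lexer input-port))]
   [(eof) eof]))

;; Hypothetical helper: lex a whole string into a list of position-tokens.
(define (lex-numbers str)
  (define in (open-input-string str))
  (port-count-lines! in) ; enable line and column tracking
  (let loop ()
    (define tok (number-lexer in))
    (if (eof-object? (position-token-token tok))
        '()
        (cons tok (loop)))))

;; Assumption: the result of parse-tokens is accepted by parse-result!.
(define numbers
  (parse-result!
   (parse-tokens (many/p (token/p 'NUMBER))
                 (lex-numbers "1 22 333"))))
```

Keeping whitespace handling in the lexer means the parser only ever sees meaningful tokens, which is the main reason to add a lexing pass in the first place.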
procedure
(parse-tokens parser tokens [source-name]) → syntax?
parser : parser?
tokens : (listof position-token?)
source-name : any/c = 'tokens
5.4 Deprecated Forms and Functions
NOTE: This function is deprecated; use many/p, instead.
procedure
(many/sep*/p parser sep) → parser?
parser : parser?
sep : parser?
NOTE: This function is deprecated; use (many/p parser #:sep sep), instead.
procedure
(many/sep+/p parser sep) → parser?
parser : parser?
sep : parser?
NOTE: This function is deprecated; use (many+/p parser #:sep sep), instead.