
Combinator Parser

Note: This package is provided for historical reasons. The code was previously part of the Racket distribution but was removed for lack of a maintainer. We recommend using either parser-tools/yacc or other combinator libraries such as Parsack.

This documentation provides directions on using the combinator parser library. It assumes familiarity with lexing and with combinator parsers. The library was originally developed by Kathy Gray.

(require combinator-parser/combinator-unit)
	package: combinator-parser

This library provides a unit implementing four higher-order functions that can be used to build a combinator parser, and the export and import signatures related to it. The functions contained in this unit automatically build error reporting mechanisms in the event that no parse is found. Unlike other combinator parsers, this system assumes that the input is already lexed into tokens using parser-tools/lex. This library relies on lazy.

value
combinator-parser-tools@ : unit?

This unit exports the signature combinator-parser^ and imports the signatures error-format-parameters^, language-format-parameters^, and language-dictionary^.
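As a rough illustration of how a unit's imports are satisfied before its exports become usable, the sketch below links two small units with racket/unit. Everything prefixed with demo- is hypothetical: the signatures and units are local stand-ins so the example runs without the package installed, and only the five field names are taken from the error-format-parameters^ description in this documentation. Linking the real combinator-parser-tools@ would presumably follow the same pattern with all three import signatures supplied.

```racket
#lang racket/base
(require racket/unit)

;; Hypothetical stand-in for error-format-parameters^ (same five names).
(define-signature demo-error-format-parameters^
  (src? input-type show-options max-depth max-choice-depth))

;; Hypothetical export signature, standing in for combinator-parser^.
(define-signature demo-reporter^ (describe-settings))

;; A unit that supplies the parameters and imports nothing.
(define-unit demo-error-format@
  (import)
  (export demo-error-format-parameters^)
  (define src? #t)
  (define input-type "file")
  (define show-options #f)
  (define max-depth 2)
  (define max-choice-depth 3))

;; A unit that needs the parameters, standing in for the tools unit.
(define-unit demo-tools@
  (import demo-error-format-parameters^)
  (export demo-reporter^)
  (define (describe-settings)
    (format "~a input, error depth ~a, at most ~a options"
            input-type max-depth max-choice-depth)))

;; Link the two units by signature, then bind the exports as definitions.
(define-compound-unit/infer demo-linked@
  (import)
  (export demo-reporter^)
  (link demo-error-format@ demo-tools@))

(define-values/invoke-unit/infer demo-linked@)

(describe-settings) ; "file input, error depth 2, at most 3 options"
```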

signature
combinator-parser^ : signature

This signature references functions to build combinators, a function to build a runnable parser from a combinator, a structure for recording errors, and macro definitions for specifying combinators:
procedure
(terminal predicate result name [spell-check case-check type-check])
→ (-> (list/c token?) parser-result)
  predicate : (-> token? boolean?)
  result : (-> token? beta)
  name : string?
  spell-check : (or/c #f (-> token? boolean?)) = #f
  case-check : (or/c #f (-> token? boolean?)) = #f
  type-check : (or/c #f (-> token? boolean?)) = #f
The returned function accepts one terminal from a token stream and produces an opaque value that interacts with other combinators.
procedure
(seq sequence result name) → (-> (list token) parser-result)
  sequence : (listof (-> (list token) parser-result))
  result : (-> (list alpha) beta)
  name : string?
The returned function accepts a term made up of a sequence of smaller terms, and produces an opaque value that interacts with other combinators.
The sequence argument lists the subterms. The result argument creates the AST node for this sequence; the list it receives has the same length as the sequence list.
The name argument is the human-language name for this term.
procedure
(choice options name) → (-> (list token) parser-result)
  options : (listof (-> (list token) parser-result))
  name : string?
The returned function selects between different terms and produces an opaque value that interacts with other combinators.
The options argument is the list of possible terms. The name argument is the human-language name for this term.
procedure
(repeat term) → (-> (list token) parser-result)
  term : (-> (list token) parser-result)
The returned function accepts zero or more instances of term and produces an opaque value that interacts with other combinators.
procedure
(parser term)
→ (-> (list token) (or/c string? editor) (or/c AST err?))
  term : (-> (list token) parser-result)
Returns a function that parses a list of tokens, producing either the result of calling all appropriate result functions or an err.
The location argument, the second argument of the returned function, is either the string representing the file name or the editor being read, typically retrieved from file-path.
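To make the roles of the predicate, sequence, options, and result arguments more concrete, here is a toy model of the four combinators. It is not the library's implementation: the toy- names are hypothetical, tokens are plain Racket values rather than parser-tools/lex tokens, there is no error reporting or laziness, toy-choice simply takes the first option that succeeds, and the representation of a parse result (a pair of value and remaining tokens, or #f) is an assumption made for the sketch.

```racket
#lang racket/base

;; A toy parser maps a token list to #f (failure) or
;; (cons value remaining-tokens) (success).

;; Accept one token satisfying predicate; build its value with result.
(define (toy-terminal predicate result)
  (lambda (tokens)
    (and (pair? tokens)
         (predicate (car tokens))
         (cons (result (car tokens)) (cdr tokens)))))

;; Run each subterm in order; result receives one value per subterm.
(define (toy-seq sequence result)
  (lambda (tokens)
    (let loop ([parsers sequence] [tokens tokens] [collected '()])
      (cond
        [(null? parsers) (cons (result (reverse collected)) tokens)]
        [((car parsers) tokens)
         => (lambda (r)
              (loop (cdr parsers) (cdr r) (cons (car r) collected)))]
        [else #f]))))

;; Try each option in turn and keep the first success.
(define (toy-choice options)
  (lambda (tokens)
    (for/or ([p (in-list options)]) (p tokens))))

;; Accept zero or more instances of term, collecting their values.
(define (toy-repeat term)
  (lambda (tokens)
    (let loop ([tokens tokens] [collected '()])
      (define r (term tokens))
      ;; Stop on failure, or if term succeeded without consuming input.
      (if (and r (not (eq? (cdr r) tokens)))
          (loop (cdr r) (cons (car r) collected))
          (cons (reverse collected) tokens)))))

;; Usage: parse repeated "number + number" groups.
(define num (toy-terminal number? values))
(define plus (toy-terminal (lambda (t) (eq? t '+)) values))
(define sum
  (toy-seq (list num plus num)
           (lambda (parts) (+ (car parts) (caddr parts)))))
(define sums (toy-repeat sum))

(sums '(1 + 2 3 + 4 x)) ; => '((3 7) x), i.e. values (3 7), leftover (x)
```

Note that the function passed to toy-seq receives a three-element list because the sequence has three subterms, which mirrors the length guarantee described for seq.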

struct
(struct err (msg src)
    #:extra-constructor-name make-err)
  msg : string?
  src : (list location integer? integer? integer? integer?)

The msg field contains the error message.

The src field contains the source location for the error and is suitable for calling raise-read-error.
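One way to act on an err value is to turn it into a read exception. The sketch below assumes that the four integers in src are in the order raise-read-error expects (line, column, position, span); the documentation does not spell this out. The err struct is redefined locally as a stand-in so the example runs without the package, and result-or-raise and the sample value are hypothetical.

```racket
#lang racket/base
(require syntax/readerr)

;; Stand-in mirroring the documented struct; real code would use the
;; err structure provided by the library instead.
(struct err (msg src))

;; Hypothetical helper: pass successful results through, raise on err.
(define (result-or-raise result)
  (if (err? result)
      ;; src is (list location n n n n); spreading it after msg lines up
      ;; with raise-read-error's (msg source line col pos span) arguments,
      ;; assuming the integers arrive in that order.
      (apply raise-read-error (err-msg result) (err-src result))
      result))

;; Made-up error value for demonstration.
(define sample
  (err "expected an expression" (list "demo.txt" 1 0 1 5)))

;; The raised exception satisfies exn:fail:read?.
(with-handlers ([exn:fail:read? (lambda (e) (exn-message e))])
  (result-or-raise sample))
```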

syntax
(define-simple-terminals name (simple-spec ...))

simple-spec = id
| (id string)
| (id proc)
| (id string proc)

Expands to a define-empty-tokens and one terminal definition per simple-spec.

The name identifier specifies a group of tokens.

The ids specify tokens or terminals with no value. If provided, proc should be a procedure from tokens to AST nodes. By default, the identity function is used. The token will be a symbol. If provided, string is the human-language name for the terminal; name is used by default.

syntax
(define-terminals name (terminal-spec ...))

terminal-spec = (id proc)
| (id string proc)

Like define-simple-terminals, except uses define-tokens.

If provided, proc should be a procedure from tokens to AST nodes. The token will be the token defined as name and will be a value token.

syntax
(sequence (name ...) proc string)

Generates a call to seq with the specified names in a list, proc as the result argument, and string as the name argument. The name can be omitted when nested in another sequence or choose.

If name is of the form (^ id), it identifies a parser production that can be used to identify this production in an error message. Otherwise, it behaves the same as above.

syntax
(choose (name ...) string)

Generates a call to choice using the given terms as the list of options and string as the name argument. The name can be omitted when nested in another sequence or choose.

syntax
(eta name)

Eta-expands name with a wrapping that properly mimics a parser term.

signature
error-format-parameters^ : signature

This signature requires five names:
src?: boolean? - whether the lexer includes source information
input-type: string? - used to identify the source of input
show-options: boolean? - presently ignored
max-depth: integer? - The depth of errors reported
max-choice-depth: integer? - The max number of options listed in an error

signature
language-format-parameters^ : signature

This signature requires two names:
class-type: string? - general term for language keywords
input->output-name: (-> token? string?) - translates tokens into strings

signature
language-dictionary^ : signature

This signature requires three names:
misspelled: (-> string? string? number?) - checks the spelling of the second arg against the first and returns the probability that the second is a misspelling of the first
misscap: (-> string? string? boolean?) - check the capitalization of the second arg against the first
missclass: (-> string? string? boolean?) - check if the second arg names a correct token kind
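For a sense of what functions with these contracts might look like, here is a minimal sketch of misspelled and misscap. The scoring scheme is an assumption: it treats the "probability" as one minus the normalized Levenshtein distance, which this documentation does not prescribe, and it reads misscap as "same letters, different capitalization". The edit-distance helper is hypothetical, and missclass is left out because its intended behavior depends on the language being parsed.

```racket
#lang racket/base

;; Hypothetical helper: Levenshtein edit distance, computed one row
;; at a time over a single reusable vector.
(define (edit-distance a b)
  (define m (string-length a))
  (define n (string-length b))
  ;; row[j] holds the distance between a prefix of a and the
  ;; first j characters of b.
  (define row (build-vector (add1 n) values))
  (for ([i (in-range 1 (add1 m))])
    (define diag (vector-ref row 0)) ; previous row, column j-1
    (vector-set! row 0 i)
    (for ([j (in-range 1 (add1 n))])
      (define above (vector-ref row j)) ; previous row, column j
      (define cost
        (if (char=? (string-ref a (sub1 i)) (string-ref b (sub1 j))) 0 1))
      (vector-set! row j (min (add1 above)
                              (add1 (vector-ref row (sub1 j)))
                              (+ diag cost)))
      (set! diag above)))
  (vector-ref row n))

;; Assumed scoring: similarity in [0, 1], higher means actual is
;; more plausibly a misspelling of expected.
(define (misspelled expected actual)
  (define longest (max (string-length expected) (string-length actual)))
  (if (zero? longest)
      0
      (- 1 (/ (edit-distance expected actual) longest))))

;; Assumed reading: #t when the strings differ only in capitalization.
(define (misscap expected actual)
  (and (string-ci=? expected actual)
       (not (string=? expected actual))))

(misspelled "class" "clas") ; 4/5
(misscap "class" "Class")   ; #t
```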