Lexers
This manual documents the public APIs in the lexers packages.
The library currently provides reusable lexers for multiple applications. Syntax coloring is the first intended application, but the lexer APIs are also designed to support other consumers.
1 Overview
The public language modules currently cover CSS, HTML, JavaScript, Markdown, Racket, Scribble, and WAT.
Each language module currently exposes two related kinds of API:
A projected token API intended for general consumers such as syntax coloring.
A derived-token API intended for richer language-specific inspection and testing.
The projected APIs are intentionally close to parser-tools/lex. They return bare symbols, token? values, and optional position-token? wrappers built from the actual parser-tools/lex structures, so existing parser-oriented tools can consume them more easily.
The current profile split is:
'coloring: keeps trivia, emits 'unknown for recoverable malformed input, and includes source positions by default.
'compiler: skips trivia by default, raises on malformed input, and includes source positions by default.
Across languages, the projected lexer constructors return one-argument port readers. Create the lexer once, call it repeatedly on the same input port, and stop when the result is an end-of-file token. The projected category symbols themselves, such as 'identifier, 'literal, and 'keyword, are intended to be the stable public API.
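The create-once, call-repeatedly protocol can be written as a small drain loop. This is a minimal sketch that assumes lexers-lib is installed; `lex-all` is a hypothetical helper, not part of the library, and it drops the final end-of-file token.

```racket
#lang racket/base
;; Sketch of the port-reader protocol (assumes lexers-lib is installed).
;; `lex-all` is a hypothetical helper, not part of the lexers API.
(require lexers/css
         lexers/token)

;; Create one lexer, feed it one port, and collect tokens until EOF.
(define (lex-all make-lexer source)
  (define lexer (make-lexer))            ; construct the lexer once
  (define in (open-input-string source))
  (port-count-lines! in)                 ; enable line/column tracking
  (let loop ([acc '()])
    (define tok (lexer in))              ; each call yields one token
    (if (lexer-token-eof? tok)
        (reverse acc)                    ; stop; the EOF token is dropped
        (loop (cons tok acc)))))

(define css-tokens (lex-all make-css-lexer "color: #fff;"))
(map lexer-token-name css-tokens)
```

Because the constructor is passed in, the same loop should work with the other projected constructors; keyword arguments can be supplied through a closure such as (lambda () (make-css-lexer #:profile 'compiler)).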
1.1 Token Helpers
The helper module lexers/token provides a small public API for inspecting wrapped or unwrapped projected token values without reaching directly into parser-tools/lex.
| (require lexers/token) | package: lexers-lib |
procedure
(lexer-token-name token) → symbol?
token : (or/c symbol? token? position-token?)
procedure
(lexer-token-value token) → any/c
token : (or/c symbol? token? position-token?)
procedure
(lexer-token-has-positions? token) → boolean?
token : (or/c symbol? token? position-token?)
procedure
(lexer-token-start token) → (or/c position? #f)
token : (or/c symbol? token? position-token?)
procedure
(lexer-token-end token) → (or/c position? #f)
token : (or/c symbol? token? position-token?)
procedure
(lexer-token-eof? token) → boolean?
token : (or/c symbol? token? position-token?)
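These helpers let one function handle wrapped and unwrapped tokens alike. The sketch below assumes lexers-lib is installed; `describe-token` is a hypothetical helper, and position-offset comes from parser-tools/lex.

```racket
#lang racket/base
;; Sketch: read name, value, and start offset from any projected token.
;; Assumes lexers-lib is installed; `describe-token` is hypothetical.
(require lexers/css
         lexers/token
         (only-in parser-tools/lex position-offset))

(define (describe-token tok)
  (list (lexer-token-name tok)
        (lexer-token-value tok)
        ;; unwrapped tokens carry no positions, so report #f instead
        (and (lexer-token-has-positions? tok)
             (position-offset (lexer-token-start tok)))))

(define wrapped
  (car (css-string->tokens "color: #fff;")))
(define bare
  (car (css-string->tokens "color: #fff;" #:source-positions #f)))

(describe-token wrapped)
(describe-token bare)
```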
1.2 Profiles
The public projected APIs currently support the same profile names:
'coloring
'compiler
The current defaults are:
| Profile | Trivia | Source Positions | Malformed Input |
| 'coloring | 'keep | #t | emit unknown tokens |
| 'compiler | 'skip | #t | raise an exception |
For the keyword arguments accepted by make-css-lexer, css-string->tokens, make-html-lexer, html-string->tokens, make-javascript-lexer, javascript-string->tokens, make-markdown-lexer, markdown-string->tokens, make-racket-lexer, racket-string->tokens, make-scribble-lexer, scribble-string->tokens, make-wat-lexer, and wat-string->tokens:
#:profile selects the named default bundle.
#:trivia 'profile-default means “use the trivia policy from the selected profile”.
#:source-positions 'profile-default means “use the source-position setting from the selected profile”.
An explicit #:trivia or #:source-positions value overrides the selected profile default.
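As an illustration of how a profile and explicit overrides combine, the sketch below keeps the 'compiler profile but asks for trivia and unwrapped tokens. It assumes lexers-lib is installed.

```racket
#lang racket/base
;; Sketch: #:profile selects defaults; explicit keywords override them.
;; Assumes lexers-lib is installed.
(require lexers/css
         lexers/token)

(define src "a { color: red; }")

;; 'compiler defaults: trivia skipped, source positions on.
(define strict-default
  (css-string->tokens src #:profile 'compiler))

;; Same profile, but keep trivia and return unwrapped tokens.
(define strict-with-trivia
  (css-string->tokens src
                      #:profile 'compiler
                      #:trivia 'keep
                      #:source-positions #f))

(map lexer-token-name strict-default)
(map lexer-token-name strict-with-trivia)
```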
2 CSS
| (require lexers/css) | package: lexers-lib |
The projected CSS API has two entry points:
make-css-lexer for streaming tokenization from an input port.
css-string->tokens for eager tokenization of an entire string.
procedure
(make-css-lexer [#:profile profile #:trivia trivia #:source-positions source-positions]) → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The result is a procedure of one argument, an input port. Each call reads the next token from the port and returns one projected token value.
When #:source-positions is true, each result is a position-token? whose payload is either a bare symbol such as 'eof or a token? carrying a projected category such as 'identifier, 'literal, 'comment, or 'unknown.
When #:source-positions is false, the result is either a bare symbol or a token? directly.
The intended use is to create the lexer once, then call it repeatedly on the same port until it returns an end-of-file token.
> (define lexer (make-css-lexer #:profile 'coloring))
> (define in (open-input-string "color: #fff;"))
> (port-count-lines! in)
> (list (lexer in) (lexer in) (lexer in) (lexer in))
(list
(position-token (token 'identifier "color") (position 1 1 0) (position 6 1 5))
(position-token (token 'delimiter ":") (position 6 1 5) (position 7 1 6))
(position-token (token 'whitespace " ") (position 7 1 6) (position 8 1 7))
(position-token (token 'literal "#fff") (position 8 1 7) (position 12 1 11)))
procedure
(css-string->tokens source [#:profile profile #:trivia trivia #:source-positions source-positions]) → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
This is a convenience wrapper over make-css-lexer. It opens a string port, enables line counting, repeatedly calls the port-based lexer until end-of-file, and returns the resulting token list.
2.1 CSS Returned Tokens
The projected CSS API returns values in the same general shape as parser-tools/lex:
The end of input is reported as 'eof, either directly or inside a position-token?.
Most ordinary results are token? values whose token-name is a projected category and whose token-value contains language-specific text or metadata.
When #:source-positions is true, each result is wrapped in a position-token?.
When #:source-positions is false, results are returned without that outer wrapper.
Common projected CSS categories include:
'whitespace
'comment
'identifier
'literal
'delimiter
'unknown
'eof
In 'coloring mode, whitespace and comments are kept, and recoverable malformed input is returned as 'unknown. In 'compiler mode, whitespace and comments are skipped by default, and malformed input raises an exception instead of producing an 'unknown token.
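One way to use the two profiles together is to try strict lexing first and fall back to tolerant lexing. This is a sketch that assumes lexers-lib is installed and that the raised exception satisfies exn:fail?; `lex-css/fallback` is a hypothetical helper.

```racket
#lang racket/base
;; Sketch: try 'compiler (raises on malformed input), fall back to
;; 'coloring (emits 'unknown tokens). Assumes lexers-lib is installed
;; and that the raised exception is an exn:fail; the helper is hypothetical.
(require lexers/css)

(define (lex-css/fallback src)
  (with-handlers ([exn:fail?
                   (lambda (e)
                     (cons 'tolerant
                           (css-string->tokens src #:profile 'coloring)))])
    (cons 'strict (css-string->tokens src #:profile 'compiler))))

(car (lex-css/fallback "a { color: red; }"))
```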
For the current CSS scaffold, token-value normally preserves the original source text of the emitted token. In particular:
For 'identifier, the value is the matched identifier text, such as "color" or "--brand-color".
For 'literal, the value is the matched literal text, such as "#fff", "12px", "url(foo.png)", or "rgb(".
For 'comment and 'whitespace, the value is the original comment or whitespace text when those categories are kept.
For 'delimiter, the value is the matched delimiter text, such as ":", ";", or "{".
For 'unknown, which is only produced in 'coloring mode, the value is the malformed input text that could not be accepted.
> (define inspect-lexer (make-css-lexer #:profile 'coloring))
> (define inspect-in (open-input-string "color: #fff;"))
> (port-count-lines! inspect-in)
> (define first-token (inspect-lexer inspect-in))
> (lexer-token-has-positions? first-token)
#t
> (lexer-token-name first-token)
'identifier
> (lexer-token-value first-token)
"color"
> (position-offset (lexer-token-start first-token))
1
> (position-offset (lexer-token-end first-token))
6
procedure
(make-css-derived-lexer) → (input-port? . -> . (or/c 'eof css-derived-token?))
The result is a procedure of one argument, an input port. Each call reads the next raw CSS token from the port, computes its CSS-specific derived classifications, and returns one derived token value. At end of input, it returns 'eof.
The intended use is the same as for make-css-lexer: create the lexer once, then call it repeatedly on the same port until it returns 'eof.
> (define derived-lexer (make-css-derived-lexer))
> (define derived-in (open-input-string "color: #fff;"))
> (port-count-lines! derived-in)
> (list (derived-lexer derived-in) (derived-lexer derived-in) (derived-lexer derived-in) (derived-lexer derived-in))
(list
(css-derived-token
(css-raw-token 'ident-token "color" (position 1 1 0) (position 6 1 5))
'(property-name-candidate selector-token))
(css-derived-token
(css-raw-token 'colon-token ":" (position 6 1 5) (position 7 1 6))
'())
(css-derived-token
(css-raw-token 'whitespace-token " " (position 7 1 6) (position 8 1 7))
'())
(css-derived-token
(css-raw-token 'hash-token "#fff" (position 8 1 7) (position 12 1 11))
'(color-literal selector-token)))
procedure
(css-string->derived-tokens source) → (listof css-derived-token?)
source : string?
This is a convenience wrapper over make-css-derived-lexer. It opens a string port, enables line counting, repeatedly calls the derived lexer until it returns 'eof, and returns the resulting list of derived tokens.
procedure
(css-derived-token? v) → boolean?
v : any/c
procedure
(css-derived-token-tags token) → (listof symbol?)
token : css-derived-token?
procedure
(css-derived-token-has-tag? token tag) → boolean?
token : css-derived-token?
tag : symbol?
procedure
(css-derived-token-text token) → string?
token : css-derived-token?
procedure
(css-derived-token-start token) → position?
token : css-derived-token?
procedure
(css-derived-token-end token) → position?
token : css-derived-token?
2.2 CSS Derived Tokens
A derived CSS token pairs one raw CSS token with a small list of CSS-specific classification tags. This layer is more precise than the projected consumer-facing categories and is meant for inspection, testing, and language-aware tools.
The current CSS scaffold may attach tags such as:
'at-rule-name
'color-literal
'color-function
'selector-token
'property-name
'declaration-value-token
'function-name
'gradient-function
'custom-property-name
'property-name-candidate
'string-literal
'numeric-literal
'length-dimension
'malformed-token
> (define derived-tokens (css-string->derived-tokens ".foo { color: red; background: rgb(1 2 3); }"))
> (map (lambda (token) (list (css-derived-token-text token) (css-derived-token-tags token) (css-derived-token-has-tag? token 'selector-token) (css-derived-token-has-tag? token 'property-name) (css-derived-token-has-tag? token 'declaration-value-token) (css-derived-token-has-tag? token 'color-literal) (css-derived-token-has-tag? token 'function-name) (css-derived-token-has-tag? token 'color-function) (css-derived-token-has-tag? token 'custom-property-name) (css-derived-token-has-tag? token 'string-literal) (css-derived-token-has-tag? token 'numeric-literal) (css-derived-token-has-tag? token 'length-dimension))) derived-tokens)
'(("." () #f #f #f #f #f #f #f #f #f #f)
("foo"
(property-name-candidate selector-token)
(selector-token)
#f
#f
#f
#f
#f
#f
#f
#f
#f)
(" " () #f #f #f #f #f #f #f #f #f #f)
("{" () #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f)
("color"
(property-name-candidate property-name)
#f
(property-name)
#f
#f
#f
#f
#f
#f
#f
#f)
(":" () #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f)
("red"
(property-name-candidate declaration-value-token)
#f
#f
(declaration-value-token)
#f
#f
#f
#f
#f
#f
#f)
(";" () #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f)
("background"
(property-name-candidate property-name)
#f
(property-name)
#f
#f
#f
#f
#f
#f
#f
#f)
(":" () #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f)
("rgb"
(function-name color-function declaration-value-token)
#f
#f
(declaration-value-token)
#f
(function-name color-function declaration-value-token)
(color-function declaration-value-token)
#f
#f
#f
#f)
("(" () #f #f #f #f #f #f #f #f #f #f)
("1"
(numeric-literal declaration-value-token)
#f
#f
(declaration-value-token)
#f
#f
#f
#f
#f
(numeric-literal declaration-value-token)
#f)
(" " () #f #f #f #f #f #f #f #f #f #f)
("2"
(numeric-literal declaration-value-token)
#f
#f
(declaration-value-token)
#f
#f
#f
#f
#f
(numeric-literal declaration-value-token)
#f)
(" " () #f #f #f #f #f #f #f #f #f #f)
("3"
(numeric-literal declaration-value-token)
#f
#f
(declaration-value-token)
#f
#f
#f
#f
#f
(numeric-literal declaration-value-token)
#f)
(")" () #f #f #f #f #f #f #f #f #f #f)
(";" () #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f)
("}" () #f #f #f #f #f #f #f #f #f #f))
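For most tools, a tag filter is more convenient than the full table above. This sketch assumes lexers-lib is installed; `texts-with-tag` is a hypothetical helper.

```racket
#lang racket/base
;; Sketch: collect the text of derived CSS tokens that carry a given tag.
;; Assumes lexers-lib is installed; `texts-with-tag` is hypothetical.
(require lexers/css)

(define (texts-with-tag tokens tag)
  (for/list ([t (in-list tokens)]
             #:when (css-derived-token-has-tag? t tag))
    (css-derived-token-text t)))

(define derived
  (css-string->derived-tokens
   ".foo { color: red; background: rgb(1 2 3); }"))

(texts-with-tag derived 'property-name)
(texts-with-tag derived 'numeric-literal)
```

Going by the output shown above, the first call should yield the property names "color" and "background", and the second the numbers inside rgb(...).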
value
css-profiles : immutable-hash?
3 HTML
| (require lexers/html) | package: lexers-lib |
The projected HTML API has two entry points:
make-html-lexer for streaming tokenization from an input port.
html-string->tokens for eager tokenization of an entire string.
procedure
(make-html-lexer [#:profile profile #:trivia trivia #:source-positions source-positions]) → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The result is a procedure of one argument, an input port. Each call reads the next token from the port and returns one projected token value.
The projected HTML token stream includes ordinary markup tokens and inline delegated tokens from embedded <style> and <script> bodies.
When #:source-positions is true, each result is a position-token?. When it is false, the result is either a bare symbol or a token? directly.
> (define lexer (make-html-lexer #:profile 'coloring))
> (define in (open-input-string "<section id=main>Hi</section>"))
> (port-count-lines! in)
> (list (lexer in) (lexer in) (lexer in) (lexer in))
(list
(position-token (token 'delimiter "<") (position 1 1 0) (position 2 1 1))
(position-token
(token 'identifier "section")
(position 2 1 1)
(position 9 1 8))
(position-token (token 'whitespace " ") (position 9 1 8) (position 10 1 9))
(position-token
(token 'identifier "id")
(position 10 1 9)
(position 12 1 11)))
procedure
(html-string->tokens source [#:profile profile #:trivia trivia #:source-positions source-positions]) → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
This is a convenience wrapper over make-html-lexer.
3.1 HTML Returned Tokens
Common projected HTML categories include:
'comment
'keyword
'identifier
'literal
'operator
'delimiter
'unknown
'eof
For the current HTML scaffold:
tag names and attribute names project as 'identifier
attribute values, text nodes, entities, and delegated CSS/JS literals project as 'literal
punctuation such as <, </, >, />, and embedded interpolation boundaries project as 'delimiter or 'operator
comments project as 'comment
doctype/declaration markup projects as 'keyword
> (define inspect-lexer (make-html-lexer #:profile 'coloring))
> (define inspect-in (open-input-string "<!doctype html><main id=\"app\">Hi & bye</main>"))
> (port-count-lines! inspect-in)
> (define first-token (inspect-lexer inspect-in))
> (lexer-token-has-positions? first-token)
#t
> (lexer-token-name first-token)
'keyword
> (lexer-token-value first-token)
"<!doctype html>"
> (position-offset (lexer-token-start first-token))
1
> (position-offset (lexer-token-end first-token))
16
procedure
→ (input-port? . -> . (or/c 'eof html-derived-token?))
procedure
(html-string->derived-tokens source) → (listof html-derived-token?)
source : string?
procedure
(html-derived-token? v) → boolean?
v : any/c
procedure
(html-derived-token-tags token) → (listof symbol?)
token : html-derived-token?
procedure
(html-derived-token-has-tag? token tag) → boolean?
token : html-derived-token?
tag : symbol?
procedure
(html-derived-token-text token) → string?
token : html-derived-token?
procedure
(html-derived-token-start token) → position?
token : html-derived-token?
procedure
(html-derived-token-end token) → position?
token : html-derived-token?
3.2 HTML Derived Tokens
The current HTML scaffold may attach tags such as:
'html-tag-name
'html-closing-tag-name
'html-attribute-name
'html-attribute-value
'html-text
'html-entity
'html-doctype
'comment
'embedded-css
'embedded-javascript
'malformed-token
Delegated CSS and JavaScript body tokens keep their reusable semantic tags and gain an additional language marker such as 'embedded-css or 'embedded-javascript.
> (define derived-tokens (html-string->derived-tokens "<!doctype html><section id=main class=\"card\">Hi & bye<style>.hero { color: #c33; }</style><script>const root = document.querySelector(\"#app\");</script></section>"))
> (map (lambda (token) (list (html-derived-token-text token) (html-derived-token-tags token) (html-derived-token-has-tag? token 'html-tag-name) (html-derived-token-has-tag? token 'html-attribute-name) (html-derived-token-has-tag? token 'html-attribute-value) (html-derived-token-has-tag? token 'html-text) (html-derived-token-has-tag? token 'html-entity) (html-derived-token-has-tag? token 'embedded-css) (html-derived-token-has-tag? token 'embedded-javascript))) derived-tokens)
'(("<!doctype html>" (keyword html-doctype) #f #f #f #f #f #f #f)
("<" (delimiter) #f #f #f #f #f #f #f)
("section" (identifier html-tag-name) (html-tag-name) #f #f #f #f #f #f)
(" " (whitespace) #f #f #f #f #f #f #f)
("id"
(identifier html-attribute-name)
#f
(html-attribute-name)
#f
#f
#f
#f
#f)
("=" (operator) #f #f #f #f #f #f #f)
("main"
(literal html-attribute-value)
#f
#f
(html-attribute-value)
#f
#f
#f
#f)
(" " (whitespace) #f #f #f #f #f #f #f)
("class"
(identifier html-attribute-name)
#f
(html-attribute-name)
#f
#f
#f
#f
#f)
("=" (operator) #f #f #f #f #f #f #f)
("\"card\""
(html-attribute-value literal)
#f
#f
(html-attribute-value literal)
#f
#f
#f
#f)
(">" (delimiter) #f #f #f #f #f #f #f)
("Hi " (literal html-text) #f #f #f (html-text) #f #f #f)
("&" (literal html-entity) #f #f #f #f (html-entity) #f #f)
(" bye" (literal html-text) #f #f #f (html-text) #f #f #f)
("<" (delimiter) #f #f #f #f #f #f #f)
("style" (identifier html-tag-name) (html-tag-name) #f #f #f #f #f #f)
(">" (delimiter) #f #f #f #f #f #f #f)
("." (embedded-css delimiter) #f #f #f #f #f (embedded-css delimiter) #f)
("hero"
(embedded-css identifier property-name-candidate selector-token)
#f
#f
#f
#f
#f
(embedded-css identifier property-name-candidate selector-token)
#f)
(" " (embedded-css whitespace) #f #f #f #f #f (embedded-css whitespace) #f)
("{" (embedded-css delimiter) #f #f #f #f #f (embedded-css delimiter) #f)
(" " (embedded-css whitespace) #f #f #f #f #f (embedded-css whitespace) #f)
("color"
(embedded-css identifier property-name-candidate property-name)
#f
#f
#f
#f
#f
(embedded-css identifier property-name-candidate property-name)
#f)
(":" (embedded-css delimiter) #f #f #f #f #f (embedded-css delimiter) #f)
(" " (embedded-css whitespace) #f #f #f #f #f (embedded-css whitespace) #f)
("#c33"
(embedded-css literal color-literal declaration-value-token)
#f
#f
#f
#f
#f
(embedded-css literal color-literal declaration-value-token)
#f)
(";" (embedded-css delimiter) #f #f #f #f #f (embedded-css delimiter) #f)
(" " (embedded-css whitespace) #f #f #f #f #f (embedded-css whitespace) #f)
("}" (embedded-css delimiter) #f #f #f #f #f (embedded-css delimiter) #f)
("</" (delimiter) #f #f #f #f #f #f #f)
("style" (identifier html-closing-tag-name) #f #f #f #f #f #f #f)
(">" (delimiter) #f #f #f #f #f #f #f)
("<" (delimiter) #f #f #f #f #f #f #f)
("script" (identifier html-tag-name) (html-tag-name) #f #f #f #f #f #f)
(">" (delimiter) #f #f #f #f #f #f #f)
("const"
(embedded-javascript keyword)
#f
#f
#f
#f
#f
#f
(embedded-javascript keyword))
(" "
(embedded-javascript whitespace)
#f
#f
#f
#f
#f
#f
(embedded-javascript whitespace))
("root"
(embedded-javascript identifier declaration-name)
#f
#f
#f
#f
#f
#f
(embedded-javascript identifier declaration-name))
(" "
(embedded-javascript whitespace)
#f
#f
#f
#f
#f
#f
(embedded-javascript whitespace))
("="
(embedded-javascript operator)
#f
#f
#f
#f
#f
#f
(embedded-javascript operator))
(" "
(embedded-javascript whitespace)
#f
#f
#f
#f
#f
#f
(embedded-javascript whitespace))
("document"
(embedded-javascript identifier)
#f
#f
#f
#f
#f
#f
(embedded-javascript identifier))
("."
(embedded-javascript delimiter)
#f
#f
#f
#f
#f
#f
(embedded-javascript delimiter))
("querySelector"
(embedded-javascript identifier method-name property-name)
#f
#f
#f
#f
#f
#f
(embedded-javascript identifier method-name property-name))
("("
(embedded-javascript delimiter)
#f
#f
#f
#f
#f
#f
(embedded-javascript delimiter))
("\"#app\""
(embedded-javascript literal string-literal)
#f
#f
#f
#f
#f
#f
(embedded-javascript literal string-literal))
(")"
(embedded-javascript delimiter)
#f
#f
#f
#f
#f
#f
(embedded-javascript delimiter))
(";"
(embedded-javascript delimiter)
#f
#f
#f
#f
#f
#f
(embedded-javascript delimiter))
("</" (delimiter) #f #f #f #f #f #f #f)
("script" (identifier html-closing-tag-name) #f #f #f #f #f #f #f)
(">" (delimiter) #f #f #f #f #f #f #f)
("</" (delimiter) #f #f #f #f #f #f #f)
("section" (identifier html-closing-tag-name) #f #f #f #f #f #f #f)
(">" (delimiter) #f #f #f #f #f #f #f))
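Because every delegated token carries an embedding marker, the text of each embedded language can be reassembled by filtering on that marker. This sketch assumes lexers-lib is installed; `embedded-source` is a hypothetical helper.

```racket
#lang racket/base
;; Sketch: rebuild the source of one embedded language from derived tokens.
;; Assumes lexers-lib is installed; `embedded-source` is hypothetical.
(require lexers/html)

(define (embedded-source tokens marker)
  (apply string-append
         (for/list ([t (in-list tokens)]
                    #:when (html-derived-token-has-tag? t marker))
           (html-derived-token-text t))))

(define page
  (html-string->derived-tokens
   "<style>.hero { color: #c33; }</style><script>const root = document.querySelector(\"#app\");</script>"))

(embedded-source page 'embedded-css)
(embedded-source page 'embedded-javascript)
```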
value
html-profiles : immutable-hash?
4 Markdown
| (require lexers/markdown) | package: lexers-lib |
The projected Markdown API has two entry points:
make-markdown-lexer for streaming tokenization from an input port.
markdown-string->tokens for eager tokenization of an entire string.
The first Markdown implementation is a handwritten, parser-lite, GitHub-flavored Markdown lexer. It is line-oriented and can delegate raw HTML and known fenced-code languages to the existing HTML, CSS, JavaScript, Racket, Scribble, and WAT lexers.
procedure
(make-markdown-lexer [#:profile profile #:trivia trivia #:source-positions source-positions]) → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The result is a procedure of one argument, an input port. Each call reads the next projected Markdown token from the port and returns one projected token value.
When #:source-positions is true, each result is a position-token?. When it is false, the result is either a bare symbol or a token? directly.
The intended use is to create the lexer once, then call it repeatedly on the same port until it returns an end-of-file token.
> (define lexer (make-markdown-lexer #:profile 'coloring))
> (define in (open-input-string "# Title\n\n```js\nconst x = 1;\n```\n"))
> (port-count-lines! in)
> (list (lexer in) (lexer in) (lexer in) (lexer in))
(list
(position-token (token 'delimiter "#") (position 1 1 0) (position 2 1 1))
(position-token (token 'whitespace " ") (position 2 1 1) (position 3 1 2))
(position-token (token 'literal "Title") (position 3 1 2) (position 8 1 7))
(position-token (token 'whitespace "\n") (position 8 1 7) (position 9 2 0)))
procedure
(markdown-string->tokens source [#:profile profile #:trivia trivia #:source-positions source-positions]) → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
This is a convenience wrapper over make-markdown-lexer.
4.1 Markdown Returned Tokens
Common projected Markdown categories include:
'whitespace
'identifier
'literal
'keyword
'operator
'delimiter
'comment
'unknown
'eof
For the current Markdown scaffold:
ordinary prose, inline code text, code-block text, and link or image payload text project mostly as 'literal
language names and delegated name-like tokens project as 'identifier or 'keyword, depending on the delegated lexer
structural markers such as heading markers, list markers, brackets, pipes, backticks, and fence delimiters project as 'delimiter
comments only appear through delegated embedded HTML
recoverable malformed constructs project as 'unknown in 'coloring mode and raise in 'compiler mode
For source continuity, the derived Markdown stream preserves the newline after a fenced-code info string as an explicit whitespace token before the code body. Incomplete fenced-code blocks are tokenized best-effort instead of raising an internal error.
> (define inspect-lexer (make-markdown-lexer #:profile 'coloring))
> (define inspect-in (open-input-string "# Title\n\nText with <span class=\"x\">hi</span>\n"))
> (port-count-lines! inspect-in)
> (define first-token (inspect-lexer inspect-in))
> (lexer-token-has-positions? first-token)
#t
> (lexer-token-name first-token)
'delimiter
> (lexer-token-value first-token)
"#"
> (position-offset (lexer-token-start first-token))
1
> (position-offset (lexer-token-end first-token))
2
procedure
→ (input-port? . -> . (or/c 'eof markdown-derived-token?))
procedure
(markdown-string->derived-tokens source) → (listof markdown-derived-token?)
source : string?
procedure
(markdown-derived-token? v) → boolean?
v : any/c
procedure
(markdown-derived-token-tags token) → (listof symbol?)
token : markdown-derived-token?
procedure
(markdown-derived-token-has-tag? token tag) → boolean?
token : markdown-derived-token?
tag : symbol?
procedure
(markdown-derived-token-text token) → string?
token : markdown-derived-token?
procedure
(markdown-derived-token-start token) → position?
token : markdown-derived-token?
procedure
(markdown-derived-token-end token) → position?
token : markdown-derived-token?
4.2 Markdown Derived Tokens
The current Markdown scaffold may attach tags such as:
'markdown-text
'markdown-heading-marker
'markdown-heading-text
'markdown-blockquote-marker
'markdown-list-marker
'markdown-task-marker
'markdown-thematic-break
'markdown-code-span
'markdown-code-fence
'markdown-code-block
'markdown-code-info-string
'markdown-emphasis-delimiter
'markdown-strong-delimiter
'markdown-strikethrough-delimiter
'markdown-link-text
'markdown-link-destination
'markdown-link-title
'markdown-image-marker
'markdown-autolink
'markdown-table-pipe
'markdown-table-alignment
'markdown-table-cell
'markdown-escape
'markdown-hard-line-break
'embedded-html
'embedded-css
'embedded-javascript
'embedded-racket
'embedded-scribble
'embedded-wat
'malformed-token
Delegated raw HTML and recognized fenced-code languages keep their reusable derived tags and gain Markdown embedding markers such as 'embedded-html, 'embedded-javascript, 'embedded-racket, or 'embedded-wat.
> (define derived-tokens (markdown-string->derived-tokens "# Title\n\n- [x] done\n\n```js\nconst x = 1;\n```\n\nText <span class=\"x\">hi</span>\n"))
> (map (lambda (token) (list (markdown-derived-token-text token) (markdown-derived-token-tags token))) derived-tokens)
'(("#" (delimiter markdown-heading-marker))
(" " (whitespace))
("Title" (literal markdown-heading-text))
("\n" (whitespace))
("\n" (whitespace))
("-" (delimiter markdown-list-marker))
(" " (whitespace))
("[x]" (delimiter markdown-task-marker))
(" " (whitespace))
("done" (literal markdown-text))
("\n" (whitespace))
("\n" (whitespace))
("```" (delimiter markdown-code-fence))
("js" (identifier markdown-code-info-string))
("\n" (whitespace))
("const" (keyword embedded-javascript markdown-code-block))
(" " (whitespace embedded-javascript markdown-code-block))
("x" (identifier declaration-name embedded-javascript markdown-code-block))
(" " (whitespace embedded-javascript markdown-code-block))
("=" (operator embedded-javascript markdown-code-block))
(" " (whitespace embedded-javascript markdown-code-block))
("1" (literal numeric-literal embedded-javascript markdown-code-block))
(";" (delimiter embedded-javascript markdown-code-block))
("\n" (whitespace embedded-javascript markdown-code-block))
("```" (delimiter markdown-code-fence))
("\n" (whitespace))
("\n" (whitespace))
("Text " (literal markdown-text))
("<" (delimiter embedded-html))
("span" (identifier html-tag-name embedded-html))
(" " (whitespace embedded-html))
("class" (identifier html-attribute-name embedded-html))
("=" (operator embedded-html))
("\"x\"" (literal html-attribute-value embedded-html))
(">" (delimiter embedded-html))
("hi" (literal html-text embedded-html))
("</" (delimiter embedded-html))
("span" (identifier html-closing-tag-name embedded-html))
(">" (delimiter embedded-html))
("\n" (whitespace)))
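The 'markdown-code-block tag, together with a language marker, is enough to recover the body of a fenced block. This sketch assumes lexers-lib is installed; `code-block-text` is a hypothetical helper.

```racket
#lang racket/base
;; Sketch: recover fenced-code bodies from derived Markdown tokens,
;; optionally restricted to one embedding marker.
;; Assumes lexers-lib is installed; `code-block-text` is hypothetical.
(require lexers/markdown)

(define (code-block-text tokens [lang-tag #f])
  (apply string-append
         (for/list ([t (in-list tokens)]
                    #:when (markdown-derived-token-has-tag? t 'markdown-code-block)
                    #:when (or (not lang-tag)
                               (markdown-derived-token-has-tag? t lang-tag)))
           (markdown-derived-token-text t))))

(define md-tokens
  (markdown-string->derived-tokens "# Title\n\n```js\nconst x = 1;\n```\n"))

(code-block-text md-tokens 'embedded-javascript)
```

The newline after the info string is not tagged 'markdown-code-block in the output above, so it is excluded, while the newline ending the body is kept.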
value
markdown-profiles : immutable-hash?
5 WAT
| (require lexers/wat) | package: lexers-lib |
The projected WAT API has two entry points:
make-wat-lexer for streaming tokenization from an input port.
wat-string->tokens for eager tokenization of an entire string.
The first WAT implementation is a handwritten lexer for the WebAssembly text format. It targets WAT only, not binary .wasm files.
procedure
(make-wat-lexer [#:profile profile #:trivia trivia #:source-positions source-positions]) → (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The result is a procedure of one argument, an input port. Each call reads the next projected WAT token from the port and returns one projected token value.
When #:source-positions is true, each result is a position-token?. When it is false, the result is either a bare symbol or a token? directly.
The intended use is to create the lexer once, then call it repeatedly on the same port until it returns an end-of-file token.
The streaming port readers emit tokens incrementally. They do not buffer the entire remaining input before producing the first token.
> (define lexer (make-wat-lexer #:profile 'coloring))
> (define in (open-input-string "(module (func (result i32) (i32.const 42)))"))
> (port-count-lines! in)
> (list (lexer in) (lexer in) (lexer in) (lexer in))
(list
(position-token (token 'delimiter "(") (position 1 1 0) (position 2 1 1))
(position-token (token 'keyword "module") (position 2 1 1) (position 8 1 7))
(position-token (token 'whitespace " ") (position 8 1 7) (position 9 1 8))
(position-token (token 'delimiter "(") (position 9 1 8) (position 10 1 9)))
procedure
(wat-string->tokens source [#:profile profile #:trivia trivia #:source-positions source-positions]) → (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
This is a convenience wrapper over make-wat-lexer.
5.1 WAT Returned Tokens
Common projected WAT categories include:
'whitespace
'comment
'identifier
'keyword
'literal
'delimiter
'unknown
'eof
For the current WAT scaffold:
form names, type names, and instruction names project as 'keyword
$-prefixed names and remaining word-like names project as 'identifier
strings and numeric literals project as 'literal
parentheses project as 'delimiter
comments project as 'comment
malformed input projects as 'unknown in 'coloring mode and raises in 'compiler mode
Projected and derived token text preserve the exact source slice, including whitespace and comments.
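Exact source slices make a simple round-trip check possible: concatenating the text of every derived token should reproduce the input. This sketch assumes lexers-lib is installed.

```racket
#lang racket/base
;; Sketch: derived WAT token text should cover the source exactly,
;; including comments and whitespace. Assumes lexers-lib is installed.
(require lexers/wat)

(define src ";; answer\n(module (func $answer (result i32) i32.const 42))")

(define texts
  (map wat-derived-token-text (wat-string->derived-tokens src)))

;; Concatenating every token's text should reproduce the input.
(string=? (apply string-append texts) src)
```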
> (define inspect-lexer (make-wat-lexer #:profile 'coloring))
> (define inspect-in (open-input-string ";; line comment\n(module (func (result i32) (i32.const 42)))"))
> (port-count-lines! inspect-in)
> (define first-token (inspect-lexer inspect-in))
> (lexer-token-has-positions? first-token)
#t
> (lexer-token-name first-token)
'comment
> (lexer-token-value first-token)
";; line comment"
> (position-offset (lexer-token-start first-token))
1
> (position-offset (lexer-token-end first-token))
16
procedure
→ (input-port? . -> . (or/c 'eof wat-derived-token?))
procedure
(wat-string->derived-tokens source) → (listof wat-derived-token?)
source : string?
procedure
(wat-derived-token? v) → boolean?
v : any/c
procedure
(wat-derived-token-tags token) → (listof symbol?)
token : wat-derived-token?
procedure
(wat-derived-token-has-tag? token tag) → boolean?
token : wat-derived-token?
tag : symbol?
procedure
(wat-derived-token-text token) → string?
token : wat-derived-token?
procedure
(wat-derived-token-start token) → position?
token : wat-derived-token?
procedure
(wat-derived-token-end token) → position?
token : wat-derived-token?
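The position accessors can map a derived token back into the original string. This sketch assumes lexers/wat as the module path and relies on the parser-tools/lex convention, visible in the examples in this section, that the first character has offset 1. source-slice is a hypothetical helper.

```racket
#lang racket/base
;; Sketch: map derived WAT tokens back to the source string.
;; Assumption: the WAT module is available as lexers/wat.
(require lexers/wat
         (only-in parser-tools/lex position-offset))

(define source "(module (func $answer (result i32) i32.const 42))")
(define derived (wat-string->derived-tokens source))

;; source-slice is a hypothetical helper. parser-tools offsets start at 1,
;; while substring indices start at 0, hence the sub1 calls.
(define (source-slice tok)
  (substring source
             (sub1 (position-offset (wat-derived-token-start tok)))
             (sub1 (position-offset (wat-derived-token-end tok)))))

;; #t when every token's text equals the slice its positions point at
(define all-slices-match?
  (andmap (lambda (tok)
            (string=? (source-slice tok) (wat-derived-token-text tok)))
          derived))

all-slices-match?
```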
5.2 WAT Derived Tokens
The current WAT scaffold may attach tags such as:
'wat-form
'wat-type
'wat-instruction
'wat-identifier
'wat-string-literal
'wat-numeric-literal
'comment
'whitespace
'malformed-token
> (define derived-tokens (wat-string->derived-tokens "(module (func $answer (result i32) i32.const 42))"))
> (map (lambda (token) (list (wat-derived-token-text token) (wat-derived-token-tags token))) derived-tokens)
'(("(" (delimiter))
("module" (keyword wat-form))
(" " (whitespace))
("(" (delimiter))
("func" (keyword wat-form))
(" " (whitespace))
("$answer" (identifier wat-identifier))
(" " (whitespace))
("(" (delimiter))
("result" (keyword wat-form))
(" " (whitespace))
("i32" (keyword wat-type))
(")" (delimiter))
(" " (whitespace))
("i32.const" (keyword wat-instruction))
(" " (whitespace))
("42" (literal wat-numeric-literal))
(")" (delimiter))
(")" (delimiter)))
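Each derived token can carry several tags, so selecting tokens by tag is often more direct than matching on projected categories. This minimal sketch uses the same input as the example above and assumes the module path lexers/wat. texts-with-tag is a hypothetical helper, and wat-derived-token-has-tag? is used only as a true/false test.

```racket
#lang racket/base
;; Sketch: select derived WAT tokens by tag.
;; Assumption: the WAT module is available as lexers/wat.
(require lexers/wat)

(define derived
  (wat-string->derived-tokens "(module (func $answer (result i32) i32.const 42))"))

;; texts-with-tag is a hypothetical helper, not a library export.
(define (texts-with-tag tag)
  (for/list ([tok (in-list derived)]
             #:when (wat-derived-token-has-tag? tok tag))
    (wat-derived-token-text tok)))

(define selected
  (list (texts-with-tag 'wat-form)          ; form names
        (texts-with-tag 'wat-instruction)   ; instruction names
        (texts-with-tag 'wat-identifier)))  ; $-prefixed names

selected
```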
value
wat-profiles : immutable-hash?
6 Racket
| (require lexers/racket) | package: lexers-lib |
The projected Racket API has two entry points:
make-racket-lexer for streaming tokenization from an input port.
racket-string->tokens for eager tokenization of an entire string.
This lexer is adapter-backed. It uses the lexer from syntax-color/racket-lexer as its raw engine and adapts that output into the public lexers projected and derived APIs.
When a source starts with "#lang at-exp", the adapter switches to the Scribble lexer family in Racket mode so that "@" forms are tokenized as Scribble escapes instead of ordinary symbol text.
procedure
(make-racket-lexer [#:profile profile #:trivia trivia #:source-positions source-positions])
→ (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The result is a procedure of one argument, an input port. Each call reads the next token from the port and returns one projected token value.
When #:source-positions is true, each result is a position-token?. When it is false, the result is either a bare symbol or a token? directly.
The intended use is to create the lexer once, then call it repeatedly on the same port until it returns an end-of-file token.
> (define lexer (make-racket-lexer #:profile 'coloring))
> (define in (open-input-string "#:x \"hi\""))
> (port-count-lines! in)
> (list (lexer in) (lexer in) (lexer in))
(list
(position-token (token 'literal "#:x") (position 1 1 0) (position 4 1 3))
(position-token (token 'whitespace " ") (position 4 1 3) (position 5 1 4))
(position-token (token 'literal "\"hi\"") (position 5 1 4) (position 9 1 8)))
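The repeated-call pattern above extends naturally to a loop that stops at the end-of-file token. This sketch uses the 'compiler profile, so whitespace and comments should be skipped by default. lex-all is a hypothetical helper, and the expected categories follow the mapping listed under Racket Returned Tokens.

```racket
#lang racket/base
;; Sketch: run make-racket-lexer over a whole string.
(require lexers/racket lexers/token)

;; lex-all is a hypothetical helper, not a library export.
(define (lex-all source #:profile [profile 'compiler])
  (define lexer (make-racket-lexer #:profile profile))
  (define in (open-input-string source))
  (port-count-lines! in)
  (let loop ([acc '()])
    (define tok (lexer in))
    (if (eq? (lexer-token-name tok) 'eof)  ; end of input, wrapped or bare
        (reverse acc)
        (loop (cons tok acc)))))

;; With 'compiler, the whitespace and the "; note" comment are skipped.
(define tokens (lex-all "(f 1) ; note"))
(map lexer-token-name tokens)
```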
procedure
(racket-string->tokens source [#:profile profile #:trivia trivia #:source-positions source-positions])
→ (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
This is a convenience wrapper over make-racket-lexer.
6.1 Racket Returned Tokens
Common projected Racket categories include:
'whitespace
'comment
'identifier
'literal
'delimiter
'unknown
'eof
For the current adapter:
comments and sexp comments project as 'comment
whitespace projects as 'whitespace
strings, constants, and hash-colon keywords project as 'literal
tokens of type symbol, other, and no-color project as 'identifier
parentheses project as 'delimiter
lexical errors project as 'unknown in 'coloring mode and raise in 'compiler mode
Projected and derived Racket token text preserve the exact consumed source slice, including multi-semicolon comment headers such as ;;;.
> (define inspect-lexer (make-racket-lexer #:profile 'coloring))
> (define inspect-in (open-input-string "#;(+ 1 2) #:x"))
> (port-count-lines! inspect-in)
> (define first-token (inspect-lexer inspect-in))
> (lexer-token-has-positions? first-token)
#t
> (lexer-token-name first-token)
'comment
> (lexer-token-value first-token)
"#;"
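With #:source-positions set to #f, results come back without the position-token? wrapper. This sketch shows the effect on the lexers/token helpers for the same input as the first example in this section.

```racket
#lang racket/base
;; Sketch: projected tokens without source positions.
(require lexers/racket lexers/token)

(define plain-lexer
  (make-racket-lexer #:profile 'coloring #:source-positions #f))
(define in (open-input-string "#:x \"hi\""))

(define first-token (plain-lexer in))  ; a bare token?, no position-token? wrapper
(define summary
  (list (lexer-token-has-positions? first-token)
        (lexer-token-name first-token)
        (lexer-token-value first-token)))

summary
```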
procedure
→ (input-port? . -> . (or/c 'eof racket-derived-token?))
procedure
(racket-string->derived-tokens source)
→ (listof racket-derived-token?)
source : string?
procedure
v : any/c
procedure
(racket-derived-token-tags token) → (listof symbol?)
token : racket-derived-token?
procedure
(racket-derived-token-has-tag? token tag) → boolean?
token : racket-derived-token?
tag : symbol?
procedure
(racket-derived-token-text token) → string?
token : racket-derived-token?
procedure
(racket-derived-token-start token) → position?
token : racket-derived-token?
procedure
(racket-derived-token-end token) → position?
token : racket-derived-token?
6.2 Racket Derived Tokens
The current Racket adapter may attach tags such as:
'racket-comment
'racket-sexp-comment
'racket-whitespace
'racket-constant
'racket-string
'racket-symbol
'racket-parenthesis
'racket-hash-colon-keyword
'racket-commented-out
'racket-datum
'racket-open
'racket-close
'racket-continue
'racket-usual-special-form
'racket-definition-form
'racket-binding-form
'racket-conditional-form
'racket-error
'scribble-text for "#lang at-exp" text regions
'scribble-command-char for the "@" character in "#lang at-exp" sources
'scribble-command for command names such as bold (as in @bold) in "#lang at-exp" sources
'scribble-body-delimiter
'scribble-optional-delimiter
'scribble-racket-escape
The “usual special form” tags are heuristic. They are meant to help ordinary Racket tooling recognize common built-in forms such as define, define-values, if, and let, but they are not guarantees about expanded meaning. In particular, a token whose text is "define" may still receive 'racket-usual-special-form even in a program where define has been rebound, because the lexer does not perform expansion or binding resolution.
> (define derived-tokens (racket-string->derived-tokens "#;(+ 1 2) #:x \"hi\""))
> (map (lambda (token) (list (racket-derived-token-text token) (racket-derived-token-tags token))) derived-tokens)
'(("#;" (comment racket-sexp-comment racket-continue))
("(" (delimiter racket-parenthesis racket-open comment racket-commented-out))
("+" (identifier racket-symbol racket-datum comment racket-commented-out))
(" "
(whitespace racket-whitespace racket-continue comment racket-commented-out))
("1" (literal racket-constant racket-datum comment racket-commented-out))
(" "
(whitespace racket-whitespace racket-continue comment racket-commented-out))
("2" (literal racket-constant racket-datum comment racket-commented-out))
(")"
(delimiter racket-parenthesis racket-close comment racket-commented-out))
(" " (whitespace racket-whitespace racket-continue))
("#:x" (literal racket-hash-colon-keyword racket-datum))
(" " (whitespace racket-whitespace racket-continue))
("\"hi\"" (literal racket-string racket-datum)))
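The tags in the output above make it straightforward to separate tokens inside a #; comment from live data. This minimal sketch uses only the documented accessors, and the expected lists are read off the example output.

```racket
#lang racket/base
;; Sketch: split the example input into commented-out tokens and live datums.
(require lexers/racket)

(define derived (racket-string->derived-tokens "#;(+ 1 2) #:x \"hi\""))

;; Tokens inside the #; comment carry 'racket-commented-out.
(define commented-out
  (for/list ([tok (in-list derived)]
             #:when (racket-derived-token-has-tag? tok 'racket-commented-out))
    (racket-derived-token-text tok)))

;; Datums that are not inside the comment.
(define live-datums
  (for/list ([tok (in-list derived)]
             #:when (and (racket-derived-token-has-tag? tok 'racket-datum)
                         (not (racket-derived-token-has-tag? tok 'racket-commented-out))))
    (racket-derived-token-text tok)))

(list commented-out live-datums)
```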
> (define at-exp-derived-tokens (racket-string->derived-tokens "#lang at-exp racket\n(define x @bold{hi})\n"))
> (map (lambda (token) (list (racket-derived-token-text token) (racket-derived-token-tags token))) at-exp-derived-tokens)
'(("#lang at-exp" (identifier racket-other racket-datum))
(" " (whitespace racket-whitespace racket-continue))
("racket" (identifier racket-symbol racket-datum))
("\n" (whitespace racket-whitespace racket-continue))
("(" (delimiter racket-parenthesis racket-open))
("define"
(identifier
racket-symbol
racket-datum
racket-usual-special-form
racket-definition-form))
(" " (whitespace racket-whitespace racket-continue))
("x" (identifier racket-symbol racket-datum))
(" " (whitespace racket-whitespace racket-continue))
("@" (delimiter racket-parenthesis racket-datum scribble-command-char))
("bold" (identifier racket-symbol racket-datum scribble-command))
("{" (delimiter racket-parenthesis racket-open scribble-body-delimiter))
("hi" (literal scribble-text racket-continue))
("}" (delimiter racket-parenthesis racket-close scribble-body-delimiter))
(")" (delimiter racket-parenthesis racket-close))
("\n" (whitespace racket-whitespace racket-continue)))
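Tags from the Racket and Scribble families can be queried the same way in an at-exp source. This sketch collects definition-form, command, and text tokens from the example input. As noted above, the form tags are lexical heuristics and say nothing about bindings. texts-with-tag is a hypothetical helper.

```racket
#lang racket/base
;; Sketch: query Racket and Scribble tags in a #lang at-exp source.
(require lexers/racket)

(define derived
  (racket-string->derived-tokens "#lang at-exp racket\n(define x @bold{hi})\n"))

;; texts-with-tag is a hypothetical helper, not a library export.
(define (texts-with-tag tag)
  (for/list ([tok (in-list derived)]
             #:when (racket-derived-token-has-tag? tok tag))
    (racket-derived-token-text tok)))

(define found
  (list (texts-with-tag 'racket-definition-form)  ; heuristic, not binding-aware
        (texts-with-tag 'scribble-command)
        (texts-with-tag 'scribble-text)))

found
```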
value
racket-profiles : immutable-hash?
7 Scribble
| (require lexers/scribble) | package: lexers-lib |
The projected Scribble API has two entry points:
make-scribble-lexer for streaming tokenization from an input port.
scribble-string->tokens for eager tokenization of an entire string.
This lexer is adapter-backed. It uses syntax-color/scribble-lexer as its raw engine and adapts that output into the public lexers projected and derived APIs.
This implementation uses Scribble’s inside (text) mode by default, via make-scribble-inside-lexer. Customizing the command character is intentionally deferred.
procedure
(make-scribble-lexer [#:profile profile #:trivia trivia #:source-positions source-positions])
→ (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
The result is a procedure of one argument, an input port. Each call reads the next token from the port and returns one projected token value.
When #:source-positions is true, each result is a position-token?. When it is false, the result is either a bare symbol or a token? directly.
The intended use is to create the lexer once, then call it repeatedly on the same port until it returns an end-of-file token.
> (define lexer (make-scribble-lexer #:profile 'coloring))
> (define in (open-input-string "@title{Hi}\nText"))
> (port-count-lines! in)
> (list (lexer in) (lexer in) (lexer in) (lexer in))
(list
(position-token (token 'delimiter "@") (position 1 1 0) (position 2 1 1))
(position-token (token 'identifier "title") (position 2 1 1) (position 7 1 6))
(position-token (token 'delimiter "{") (position 7 1 6) (position 8 1 7))
(position-token (token 'literal "Hi") (position 8 1 7) (position 10 1 9)))
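The eager wrapper accepts the same options as the streaming lexer. This sketch turns off source positions and pairs each category with its text. It drops a final 'eof result in case the wrapper includes one. That behavior is an assumption, not something documented here.

```racket
#lang racket/base
;; Sketch: eager Scribble tokenization without source positions.
(require lexers/scribble lexers/token)

(define toks (scribble-string->tokens "@title{Hi}" #:source-positions #f))

;; Pair each category with its text. Skipping 'eof is a precaution: whether
;; the eager wrapper keeps a final 'eof is assumed, not documented.
(define pairs
  (for/list ([tok (in-list toks)]
             #:unless (eq? (lexer-token-name tok) 'eof))
    (cons (lexer-token-name tok) (lexer-token-value tok))))

pairs
```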
procedure
(scribble-string->tokens source [#:profile profile #:trivia trivia #:source-positions source-positions])
→ (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
This is a convenience wrapper over make-scribble-lexer.
7.1 Scribble Returned Tokens
Common projected Scribble categories include:
'whitespace
'comment
'identifier
'literal
'delimiter
'unknown
'eof
For the current adapter:
text, strings, and constants project as 'literal
whitespace projects as 'whitespace
symbol and other tokens project as 'identifier
parentheses, the command character, and body or optional delimiters project as 'delimiter
lexical errors project as 'unknown in 'coloring mode and raise in 'compiler mode
For source fidelity, the Scribble adapter preserves the exact source slice for projected and derived token text, including whitespace spans that contain one or more newlines.
> (define inspect-lexer (make-scribble-lexer #:profile 'coloring))
> (define inspect-in (open-input-string "@title{Hi}"))
> (port-count-lines! inspect-in)
> (define first-token (inspect-lexer inspect-in))
> (lexer-token-has-positions? first-token)
#t
> (lexer-token-name first-token)
'delimiter
> (lexer-token-value first-token)
"@"
procedure
→ (input-port? . -> . (or/c 'eof scribble-derived-token?))
procedure
(scribble-string->derived-tokens source)
→ (listof scribble-derived-token?)
source : string?
procedure
v : any/c
procedure
(scribble-derived-token-tags token) → (listof symbol?)
token : scribble-derived-token?
procedure
(scribble-derived-token-has-tag? token tag) → boolean?
token : scribble-derived-token?
tag : symbol?
procedure
(scribble-derived-token-text token) → string?
token : scribble-derived-token?
procedure
(scribble-derived-token-start token) → position?
token : scribble-derived-token?
procedure
(scribble-derived-token-end token) → position?
token : scribble-derived-token?
7.2 Scribble Derived Tokens
The current Scribble adapter may attach tags such as:
'scribble-comment
'scribble-whitespace
'scribble-text
'scribble-string
'scribble-constant
'scribble-symbol
'scribble-parenthesis
'scribble-other
'scribble-error
'scribble-command
'scribble-command-char
'scribble-body-delimiter
'scribble-optional-delimiter
'scribble-racket-escape
These tags describe reusable Scribble structure, not presentation. In particular, 'scribble-command only means that a symbol-like token is being used as a command name after "@". It does not mean the lexer has inferred higher-level document semantics for commands such as title or itemlist.
> (define derived-tokens (scribble-string->derived-tokens "@title{Hi}\n@racket[(define x 1)]"))
> (map (lambda (token) (list (scribble-derived-token-text token) (scribble-derived-token-tags token))) derived-tokens)
'(("@" (delimiter scribble-parenthesis scribble-command-char))
("title" (identifier scribble-symbol scribble-command))
("{" (delimiter scribble-parenthesis scribble-body-delimiter))
("Hi" (literal scribble-text))
("}" (delimiter scribble-parenthesis scribble-body-delimiter))
("\n" (whitespace scribble-whitespace))
("@" (delimiter scribble-parenthesis scribble-command-char))
("racket" (identifier scribble-symbol scribble-command))
("[" (delimiter scribble-parenthesis scribble-optional-delimiter))
("(" (delimiter scribble-parenthesis scribble-racket-escape))
("define" (scribble-racket-escape))
(" " (whitespace scribble-whitespace scribble-racket-escape))
("x" (identifier scribble-symbol scribble-racket-escape))
(" " (whitespace scribble-whitespace scribble-racket-escape))
("1" (literal scribble-constant scribble-racket-escape))
(")" (delimiter scribble-parenthesis scribble-racket-escape))
("]"
(delimiter
scribble-parenthesis
scribble-optional-delimiter
scribble-racket-escape)))
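The structural tags can pull commands and delimiters out of a Scribble source without interpreting what the commands mean. This minimal sketch uses the example input above. texts-with-tag is a hypothetical helper.

```racket
#lang racket/base
;; Sketch: extract command names and delimiters from a Scribble source.
(require lexers/scribble)

(define derived
  (scribble-string->derived-tokens "@title{Hi}\n@racket[(define x 1)]"))

;; texts-with-tag is a hypothetical helper, not a library export.
(define (texts-with-tag tag)
  (for/list ([tok (in-list derived)]
             #:when (scribble-derived-token-has-tag? tok tag))
    (scribble-derived-token-text tok)))

(define structure
  (list (texts-with-tag 'scribble-command)             ; names after "@"
        (texts-with-tag 'scribble-optional-delimiter)  ; [ and ]
        (texts-with-tag 'scribble-body-delimiter)))    ; { and }

structure
```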
value
scribble-profiles : immutable-hash?
8 JavaScript
| (require lexers/javascript) | package: lexers-lib |
The projected JavaScript API has two entry points:
make-javascript-lexer for streaming tokenization from an input port.
javascript-string->tokens for eager tokenization of an entire string.
procedure
(make-javascript-lexer [#:profile profile #:trivia trivia #:source-positions source-positions #:jsx? jsx?])
→ (input-port? . -> . (or/c symbol? token? position-token?))
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
jsx? : boolean? = #f
The result is a procedure of one argument, an input port. Each call reads the next token from the port and returns one projected token value.
When #:source-positions is true, each result is a position-token? whose payload is either a bare symbol such as 'eof or a token? carrying a projected category such as 'keyword, 'identifier, 'literal, 'operator, 'comment, or 'unknown.
When #:source-positions is false, the result is either a bare symbol or a token? directly.
The intended use is to create the lexer once, then call it repeatedly on the same port until it returns an end-of-file token.
When #:jsx? is true, the lexer also accepts a limited subset of JSX syntax inside JavaScript expressions. The projected token categories stay the same, while the derived-token API exposes JSX-specific structure.
> (define lexer (make-javascript-lexer #:profile 'coloring))
> (define in (open-input-string "const x = 1;"))
> (port-count-lines! in)
> (list (lexer in) (lexer in) (lexer in) (lexer in))
(list
(position-token (token 'keyword "const") (position 1 1 0) (position 6 1 5))
(position-token (token 'whitespace " ") (position 6 1 5) (position 7 1 6))
(position-token (token 'identifier "x") (position 7 1 6) (position 8 1 7))
(position-token (token 'whitespace " ") (position 8 1 7) (position 9 1 8)))
procedure
(javascript-string->tokens source [#:profile profile #:trivia trivia #:source-positions source-positions #:jsx? jsx?])
→ (listof (or/c symbol? token? position-token?))
source : string?
profile : (or/c 'coloring 'compiler) = 'coloring
trivia : (or/c 'profile-default 'keep 'skip) = 'profile-default
source-positions : (or/c 'profile-default boolean?) = 'profile-default
jsx? : boolean? = #f
This is a convenience wrapper over make-javascript-lexer. It opens a string port, enables line counting, repeatedly calls the port-based lexer until end-of-file, and returns the resulting token list.
8.1 JavaScript Returned Tokens
The projected JavaScript API uses the same output shape as the other language modules:
The end of input is reported as 'eof, either directly or inside a position-token?.
Ordinary results are usually token? values whose token-name is a projected category and whose token-value contains language-specific text or metadata.
When #:source-positions is true, each result is wrapped in a position-token?.
When #:source-positions is false, results are returned without that outer wrapper.
Common projected JavaScript categories include:
'whitespace
'comment
'keyword
'identifier
'literal
'operator
'delimiter
'unknown
'eof
In 'coloring mode, whitespace and comments are kept, and recoverable malformed input is returned as 'unknown. In 'compiler mode, whitespace and comments are skipped by default, and malformed input raises an exception instead of producing an 'unknown token.
For the current JavaScript scaffold, token-value also preserves the original source text of the emitted token. In particular:
For 'keyword and 'identifier, the value is the matched identifier text, such as "const" or "name".
For 'literal, the value is the matched literal text, such as "1" or "\"hello\"".
For 'comment and 'whitespace, the value is the original comment or whitespace text when those categories are kept.
For 'operator and 'delimiter, the value is the matched character text, such as "=", ";", or "(".
For 'unknown in 'coloring mode, the value is the malformed input text that could not be accepted.
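Since token-value holds the source text, collecting identifiers comes down to filtering on the category. This sketch uses the 'compiler profile, which skips whitespace and comments by default, so the trailing line comment should not appear in the result.

```racket
#lang racket/base
;; Sketch: collect identifier text from projected JavaScript tokens.
(require lexers/javascript lexers/token)

;; 'compiler skips whitespace and comments by default, so "// sum" is dropped.
(define toks
  (javascript-string->tokens "const total = price + tax; // sum"
                             #:profile 'compiler))

(define identifiers
  (for/list ([tok (in-list toks)]
             #:when (eq? (lexer-token-name tok) 'identifier))
    (lexer-token-value tok)))

identifiers
```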
> (define inspect-lexer (make-javascript-lexer #:profile 'coloring))
> (define inspect-in (open-input-string "const x = 1;"))
> (port-count-lines! inspect-in)
> (define first-token (inspect-lexer inspect-in))
> (lexer-token-has-positions? first-token)
#t
> (lexer-token-name first-token)
'keyword
> (lexer-token-value first-token)
"const"
> (position-offset (lexer-token-start first-token))
1
> (position-offset (lexer-token-end first-token))
6
procedure
(make-javascript-derived-lexer [#:jsx? jsx?])
→ (input-port? . -> . (or/c 'eof javascript-derived-token?))
jsx? : boolean? = #f
The result is a procedure of one argument, an input port. Each call reads the next raw JavaScript token from the port, computes its JavaScript-specific derived classifications, and returns one derived token value. At end of input, it returns 'eof.
The intended use is the same as for make-javascript-lexer: create the lexer once, then call it repeatedly on the same port until it returns 'eof.
> (define derived-lexer (make-javascript-derived-lexer))
> (define derived-in (open-input-string "const x = 1;"))
> (port-count-lines! derived-in)
> (list (derived-lexer derived-in) (derived-lexer derived-in) (derived-lexer derived-in) (derived-lexer derived-in))
(list
(javascript-derived-token
(javascript-raw-token
'identifier-token
"const"
(position 1 1 0)
(position 6 1 5))
'(keyword))
(javascript-derived-token
(javascript-raw-token
'whitespace-token
" "
(position 6 1 5)
(position 7 1 6))
'())
(javascript-derived-token
(javascript-raw-token
'identifier-token
"x"
(position 7 1 6)
(position 8 1 7))
'(identifier declaration-name))
(javascript-derived-token
(javascript-raw-token
'whitespace-token
" "
(position 8 1 7)
(position 9 1 8))
'()))
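The derived lexer can also be driven in a loop that checks for the bare 'eof result. This sketch collects declaration names. declaration-names is a hypothetical helper, and the expected result is inferred from the tags shown in the examples in this section.

```racket
#lang racket/base
;; Sketch: stream derived JavaScript tokens and keep declaration names.
(require lexers/javascript)

;; declaration-names is a hypothetical helper, not a library export.
(define (declaration-names source)
  (define lexer (make-javascript-derived-lexer))
  (define in (open-input-string source))
  (port-count-lines! in)
  (let loop ([acc '()])
    (define tok (lexer in))
    (cond
      [(eq? tok 'eof) (reverse acc)]  ; the derived lexer returns a bare 'eof
      [(javascript-derived-token-has-tag? tok 'declaration-name)
       (loop (cons (javascript-derived-token-text tok) acc))]
      [else (loop acc)])))

(declaration-names "const x = 1;\nfunction wrap(name) { return name; }")
```

Here name is tagged 'parameter-name rather than 'declaration-name, so it is not collected.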
procedure
(javascript-string->derived-tokens source [#:jsx? jsx?])
→ (listof javascript-derived-token?)
source : string?
jsx? : boolean? = #f
This is a convenience wrapper over make-javascript-derived-lexer. It opens a string port, enables line counting, repeatedly calls the derived lexer until it returns 'eof, and returns the resulting list of derived tokens.
procedure
v : any/c
procedure
(javascript-derived-token-tags token) → (listof symbol?)
token : javascript-derived-token?
procedure
(javascript-derived-token-has-tag? token tag) → boolean?
token : javascript-derived-token?
tag : symbol?
procedure
(javascript-derived-token-text token) → string?
token : javascript-derived-token?
procedure
(javascript-derived-token-start token) → position?
token : javascript-derived-token?
procedure
(javascript-derived-token-end token) → position?
token : javascript-derived-token?
8.2 JavaScript Derived Tokens
A derived JavaScript token pairs one raw JavaScript token with a small list of JavaScript-specific classification tags. This layer is more precise than the projected consumer-facing categories and is meant for inspection, testing, and language-aware tools.
The current JavaScript scaffold may attach tags such as:
'keyword
'identifier
'declaration-name
'parameter-name
'object-key
'property-name
'method-name
'private-name
'static-keyword-usage
'string-literal
'numeric-literal
'regex-literal
'template-literal
'template-chunk
'template-interpolation-boundary
'jsx-tag-name
'jsx-closing-tag-name
'jsx-attribute-name
'jsx-text
'jsx-interpolation-boundary
'jsx-fragment-boundary
'comment
'malformed-token
> (define derived-tokens (javascript-string->derived-tokens "class Box { static create() { return this.value; } #secret = 1; }\nfunction wrap(name) { return name; }\nconst item = obj.run();\nconst data = { answer: 42 };\nconst greeting = `a ${name} b`;\nreturn /ab+c/i;"))
> (map (lambda (token) (list (javascript-derived-token-text token) (javascript-derived-token-tags token) (javascript-derived-token-has-tag? token 'keyword) (javascript-derived-token-has-tag? token 'identifier) (javascript-derived-token-has-tag? token 'declaration-name) (javascript-derived-token-has-tag? token 'parameter-name) (javascript-derived-token-has-tag? token 'object-key) (javascript-derived-token-has-tag? token 'property-name) (javascript-derived-token-has-tag? token 'method-name) (javascript-derived-token-has-tag? token 'private-name) (javascript-derived-token-has-tag? token 'static-keyword-usage) (javascript-derived-token-has-tag? token 'numeric-literal) (javascript-derived-token-has-tag? token 'regex-literal) (javascript-derived-token-has-tag? token 'template-literal) (javascript-derived-token-has-tag? token 'template-chunk) (javascript-derived-token-has-tag? token 'template-interpolation-boundary))) derived-tokens)
'(("class" (keyword) (keyword) #f #f #f #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("Box"
(identifier declaration-name)
#f
(identifier declaration-name)
(declaration-name)
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("{" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("static"
(keyword static-keyword-usage)
(keyword static-keyword-usage)
#f
#f
#f
#f
#f
#f
#f
(static-keyword-usage)
#f
#f
#f
#f
#f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("create" (identifier) #f (identifier) #f #f #f #f #f #f #f #f #f #f #f #f)
("(" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
(")" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("{" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("return" (keyword) (keyword) #f #f #f #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("this" (keyword) (keyword) #f #f #f #f #f #f #f #f #f #f #f #f #f)
("." () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("value"
(identifier property-name)
#f
(identifier property-name)
#f
#f
#f
(property-name)
#f
#f
#f
#f
#f
#f
#f
#f)
(";" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("}" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("#secret"
(private-name)
#f
#f
#f
#f
#f
#f
#f
(private-name)
#f
#f
#f
#f
#f
#f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("=" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("1"
(numeric-literal)
#f
#f
#f
#f
#f
#f
#f
#f
#f
(numeric-literal)
#f
#f
#f
#f)
(";" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("}" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("\n" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("function" (keyword) (keyword) #f #f #f #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("wrap"
(identifier declaration-name)
#f
(identifier declaration-name)
(declaration-name)
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f)
("(" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("name"
(identifier parameter-name)
#f
(identifier parameter-name)
#f
(parameter-name)
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f)
(")" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("{" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("return" (keyword) (keyword) #f #f #f #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("name" (identifier) #f (identifier) #f #f #f #f #f #f #f #f #f #f #f #f)
(";" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("}" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("\n" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("const" (keyword) (keyword) #f #f #f #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("item"
(identifier declaration-name)
#f
(identifier declaration-name)
(declaration-name)
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("=" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("obj" (identifier) #f (identifier) #f #f #f #f #f #f #f #f #f #f #f #f)
("." () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("run"
(identifier method-name property-name)
#f
(identifier method-name property-name)
#f
#f
#f
(property-name)
(method-name property-name)
#f
#f
#f
#f
#f
#f
#f)
("(" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
(")" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
(";" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("\n" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("const" (keyword) (keyword) #f #f #f #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("data"
(identifier declaration-name)
#f
(identifier declaration-name)
(declaration-name)
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("=" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("{" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("answer"
(identifier object-key)
#f
(identifier object-key)
#f
#f
(object-key)
#f
#f
#f
#f
#f
#f
#f
#f
#f)
(":" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("42"
(numeric-literal)
#f
#f
#f
#f
#f
#f
#f
#f
#f
(numeric-literal)
#f
#f
#f
#f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("}" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
(";" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("\n" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("const" (keyword) (keyword) #f #f #f #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("greeting"
(identifier declaration-name)
#f
(identifier declaration-name)
(declaration-name)
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("=" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("`"
(template-literal)
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f
(template-literal)
#f
#f)
("a "
(template-literal template-chunk)
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f
(template-literal template-chunk)
(template-chunk)
#f)
("${"
(template-literal template-interpolation-boundary)
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f
(template-literal template-interpolation-boundary)
#f
(template-interpolation-boundary))
("name" (identifier) #f (identifier) #f #f #f #f #f #f #f #f #f #f #f #f)
("}"
(template-literal template-interpolation-boundary)
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f
(template-literal template-interpolation-boundary)
#f
(template-interpolation-boundary))
(" b"
(template-literal template-chunk)
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f
(template-literal template-chunk)
(template-chunk)
#f)
("`"
(template-literal)
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f
(template-literal)
#f
#f)
(";" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("\n" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("return" (keyword) (keyword) #f #f #f #f #f #f #f #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f #f #f #f #f #f #f #f #f)
("/ab+c/i"
(regex-literal)
#f
#f
#f
#f
#f
#f
#f
#f
#f
#f
(regex-literal)
#f
#f
#f)
(";" () #f #f #f #f #f #f #f #f #f #f #f #f #f #f))
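Because a token can carry several tags at once, indexing by tag is a compact way to summarize output like the example above. This sketch reuses the same source string and records, for a few tags of interest, the texts that carry each tag in source order. index-by-tag is a hypothetical helper.

```racket
#lang racket/base
;; Sketch: index derived JavaScript tokens by a few tags of interest.
(require lexers/javascript)

(define tags-of-interest '(private-name property-name method-name object-key))

(define source
  (string-append
   "class Box { static create() { return this.value; } #secret = 1; }\n"
   "function wrap(name) { return name; }\n"
   "const item = obj.run();\n"
   "const data = { answer: 42 };\n"
   "const greeting = `a ${name} b`;\n"
   "return /ab+c/i;"))

;; index-by-tag is a hypothetical helper: tag -> list of token texts.
(define (index-by-tag src)
  (for*/fold ([index (hasheq)])
             ([tok (in-list (javascript-string->derived-tokens src))]
              [tag (in-list tags-of-interest)]
              #:when (javascript-derived-token-has-tag? tok tag))
    (hash-update index tag
                 (lambda (texts) (append texts (list (javascript-derived-token-text tok))))
                 '())))

(define index (index-by-tag source))
(define summary
  (for/list ([tag (in-list tags-of-interest)])
    (hash-ref index tag '())))

summary
```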
> (define jsx-derived-tokens (javascript-string->derived-tokens "const el = <Button kind=\"primary\">Hello {name}</Button>;\nconst frag = <>ok</>;" #:jsx? #t))
> (map (lambda (token) (list (javascript-derived-token-text token) (javascript-derived-token-tags token) (javascript-derived-token-has-tag? token 'jsx-tag-name) (javascript-derived-token-has-tag? token 'jsx-closing-tag-name) (javascript-derived-token-has-tag? token 'jsx-attribute-name) (javascript-derived-token-has-tag? token 'jsx-text) (javascript-derived-token-has-tag? token 'jsx-interpolation-boundary) (javascript-derived-token-has-tag? token 'jsx-fragment-boundary))) jsx-derived-tokens)
'(("const" (keyword) #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f)
("el" (identifier declaration-name) #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f)
("=" () #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f)
("<" () #f #f #f #f #f #f)
("Button" (identifier jsx-tag-name) (jsx-tag-name) #f #f #f #f #f)
(" " () #f #f #f #f #f #f)
("kind" (identifier jsx-attribute-name) #f #f (jsx-attribute-name) #f #f #f)
("=" () #f #f #f #f #f #f)
("\"primary\"" (string-literal) #f #f #f #f #f #f)
(">" () #f #f #f #f #f #f)
("Hello " (jsx-text) #f #f #f (jsx-text) #f #f)
("{"
(jsx-interpolation-boundary)
#f
#f
#f
#f
(jsx-interpolation-boundary)
#f)
("name" (identifier) #f #f #f #f #f #f)
("}"
(jsx-interpolation-boundary)
#f
#f
#f
#f
(jsx-interpolation-boundary)
#f)
("</" () #f #f #f #f #f #f)
("Button"
(identifier jsx-closing-tag-name)
#f
(jsx-closing-tag-name)
#f
#f
#f
#f)
(">" () #f #f #f #f #f #f)
(";" () #f #f #f #f #f #f)
("\n" () #f #f #f #f #f #f)
("const" (keyword) #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f)
("frag" (identifier declaration-name) #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f)
("=" () #f #f #f #f #f #f)
(" " () #f #f #f #f #f #f)
("<>" (jsx-fragment-boundary) #f #f #f #f #f (jsx-fragment-boundary))
("ok" (jsx-text) #f #f #f (jsx-text) #f #f)
("</>" (jsx-fragment-boundary) #f #f #f #f #f (jsx-fragment-boundary))
(";" () #f #f #f #f #f #f))
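The JSX tags can be queried the same way. This sketch reuses the JSX example input and collects, in source order, the texts for each JSX-specific tag. texts-with-tag is a hypothetical helper.

```racket
#lang racket/base
;; Sketch: collect JSX structure from derived tokens.
(require lexers/javascript)

(define derived
  (javascript-string->derived-tokens
   "const el = <Button kind=\"primary\">Hello {name}</Button>;\nconst frag = <>ok</>;"
   #:jsx? #t))

;; texts-with-tag is a hypothetical helper, not a library export.
(define (texts-with-tag tag)
  (for/list ([tok (in-list derived)]
             #:when (javascript-derived-token-has-tag? tok tag))
    (javascript-derived-token-text tok)))

(define jsx-summary
  (map texts-with-tag
       '(jsx-tag-name jsx-closing-tag-name jsx-attribute-name
         jsx-text jsx-fragment-boundary)))

jsx-summary
```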
value
javascript-profiles : immutable-hash?