2.3 Parsing via Enforestation and Expansion
Parsing of a module body starts with a syntax object that represents a sequence of shrubbery groups.
Parsing proceeds in a particular phase level, starting with phase level 0. Bindings from the syntax object’s lexical information drive the parsing process, and they cause new bindings to be introduced for the lexical information of sub-expressions. In some cases, a sub-form is expanded in a phase deeper (having a greater phase-level number) than the enclosing form.
Parsing also happens in a particular space. Parsing of a module always starts in the expression, definition, and declaration spaces for the module body, and then certain forms will trigger parsing in other spaces. For example, parsing a fun expression form will trigger parsing in the binding space for the function’s arguments (if any) and in the annotation space for the function’s result annotation (if declared).
In the parsing description that follows, we take the relevant phase level and space of parsing to be known to a lookup function. We also first consider the parsing of an individual group, but the process is analogous for dealing with a sequence of groups in a module body or other definition context.
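As a rough illustration of a lookup function that already knows its phase level and space, the following Python sketch closes over both. The table BINDINGS, the name posn, and the helper make_lookup are hypothetical and exist only for this example; they are not part of any real API.

```python
# Hypothetical binding table keyed by (name, space, phase level).
# The entries are placeholders, not real bindings.
BINDINGS = {
    ("posn", "expression", 0): "constructor for posn",
    ("posn", "annotation", 0): "annotation for posn",
    ("posn", "binding", 0): "binding pattern for posn",
}

def make_lookup(space, phase):
    """Return a lookup function specialized to one space and phase level."""
    def lookup(name):
        # The same name can resolve differently in each space,
        # or not at all at a different phase level.
        return BINDINGS.get((name, space, phase))
    return lookup

lookup_expr = make_lookup("expression", 0)
lookup_annot = make_lookup("annotation", 0)
```

With this shape, the rules that follow can write lookup(name) without mentioning the space or phase, since both are fixed when the lookup function is created.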
2.3.1 Enforest Steps
Parsing begins in enforestation mode. This mode can be described by an enforest function that takes a group syntax object 'term ...' and produces a parsed prefix tree followed by remaining terms. The enforest function also receives an operator opprior whose right-hand side is being parsed; to start, assume an opinit as opprior whose precedence is lower than all other operators.
The syntax objects consumed and produced by enforest can contain parsed objects as terms. If enforest receives a single parsed term as its input, then parsing is done:
enforest('tree', opprior) ⇒ 'tree'
Otherwise, parsing will proceed by calling enforest again on its result.
The simplest possibility for enforest to make progress is that the first term in its input is a name that is bound as a variable. Let lookup be the function that resolves bindings, and let treevar stand for the parsed representation of a variable. Enforestation will continue with the parsed variable at the start of the sequence.
enforest('name term ...', opprior) ⇒ enforest('treevar term ...', opprior)
where lookup(name) = treevar
Instead of a variable, the first term may be a name that is bound as a prefix macro. In that case, an expansion step is performed by calling the macro transformer, then continuing to enforest with the result. The expand function itself may need to recur to arrive at a parsed tree, and we return to expand later. Meanwhile, the precedence of opprior is not relevant when name refers to a prefix operator.
enforest('name term ...', opprior) ⇒ enforest('tree termrest ...', opprior)
where lookup(name) = PrefixMacro(transform)
and expand(transform, 'name term ...') = 'tree termrest ...'
To detect an infix operator, enforestation must have an input sequence that starts with an already-parsed left-hand side. In that case, the infix-macro binding must provide an operator whose precedence can be compared to opprior, and the infix operator’s transformer is applied only if it has higher precedence.
enforest('treelhs name term ...', opprior) ⇒ enforest('tree termrest ...', opprior)
where lookup(name) = InfixMacro(transform, op)
and op > opprior
and expand(transform, 'treelhs name term ...') = 'tree termrest ...'
If an infix operator is found whose precedence is not higher than that of opprior, then enforest stops instead of calling itself recursively:
enforest('treelhs name term ...', opprior) ⇒ 'treelhs name term ...'
where lookup(name) = InfixMacro(transform, op)
and op ≤ opprior
The two infix possibilities for enforest are the only place where operator precedence needs to be compared. That’s why precedence can be pairwise between two operators, instead of a global order across all operators.
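The rules so far can be sketched as a small Python model. This is a toy under stated assumptions, not the actual implementation: Tree, Op, stronger_than, ENV, binary, and negate are hypothetical names, terms are plain strings, and every transformer here returns an already-parsed tree (the simple case). Precedence is stored pairwise, as a set of operator names that an operator binds more tightly than, so no global ordering is ever consulted.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Tree:
    """An already-parsed term (hypothetical stand-in for a parsed tree)."""
    form: tuple

@dataclass(frozen=True)
class Op:
    """An operator plus the names of operators it binds more tightly than."""
    name: str
    stronger_than: frozenset = frozenset()

@dataclass(frozen=True)
class PrefixMacro:
    transform: Callable

@dataclass(frozen=True)
class InfixMacro:
    transform: Callable
    op: Op

OP_INIT = Op("init")  # stands for opinit: lower than every other operator

def higher(op, opprior):
    """Pairwise test for op > opprior."""
    return opprior is OP_INIT or opprior.name in op.stronger_than

def enforest(terms, opprior):
    if not terms:
        raise SyntaxError("empty term sequence")
    while True:
        first = terms[0]
        if isinstance(first, Tree):
            if len(terms) == 1:
                return terms                      # single parsed tree: done
            binding = lookup(terms[1])
            if isinstance(binding, InfixMacro) and higher(binding.op, opprior):
                tree, rest = binding.transform(terms)
                terms = [tree, *rest]             # infix step, then continue
                continue
            return terms                          # op not higher: stop here
        binding = lookup(first)
        if isinstance(binding, Tree):             # name bound as a variable
            terms = [binding, *terms[1:]]
        elif isinstance(binding, PrefixMacro):    # opprior is not consulted
            tree, rest = binding.transform(terms)
            terms = [tree, *rest]
        else:
            raise SyntaxError(f"unbound or unsupported term: {first!r}")

def binary(op):
    """Build an infix transformer that parses its right-hand side under op."""
    def transform(terms):
        lhs, name, *rest = terms
        rhs, *remaining = enforest(rest, op)
        return Tree((name, lhs, rhs)), remaining
    return transform

def negate(terms):
    """Prefix transformer: parse one operand under NEG's precedence."""
    name, *rest = terms
    operand, *remaining = enforest(rest, NEG)
    return Tree((name, operand)), remaining

PLUS = Op("+")
TIMES = Op("*", frozenset({"+"}))
NEG = Op("neg", frozenset({"+", "*"}))

ENV = {
    "a": Tree(("var", "a")), "b": Tree(("var", "b")), "c": Tree(("var", "c")),
    "+": InfixMacro(binary(PLUS), PLUS),
    "*": InfixMacro(binary(TIMES), TIMES),
    "neg": PrefixMacro(negate),
}

def lookup(term):
    return ENV.get(term) if isinstance(term, str) else None
```

For example, enforest(["a", "+", "b", "*", "c"], OP_INIT) groups the multiplication under the addition, because the right-hand side of + is enforested with + as opprior and * compares higher than +. Left associativity for a + b + c falls out of the stop rule: + is not higher than itself, so the inner call returns early.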
Enforestation’s last job is to introduce implicit operators where needed. If it finds a non-name atomic termlit at the start of the sequence, such as a number or string, then it adds #%literal to the start of the stream:
enforest('termlit term ...', opprior) ⇒ enforest('#%literal termlit term ...', opprior)
Implicit operators like #%literal are added only when they are bound; otherwise, enforest would recur forever adding the same implicit name. The lexical information of an added implicit name is taken by enforest from the first term in its input sequence.
Similarly, a parenthesized term triggers #%parens, and so on.
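enforest('(group, ...) term ...', opprior)
⇒ enforest('#%parens (group, ...) term ...', opprior)
enforest('[group, ...] term ...', opprior)
⇒ enforest('#%brackets [group, ...] term ...', opprior)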
enforest(': group; ...', opprior)
⇒ enforest('#%block : group; ...', opprior)
Parentheses, brackets, and braces after a parsed term trigger a set of implicit infix names.
enforest('treelhs [group, ...] term ...', opprior)
⇒ enforest('treelhs #%index [group, ...] term ...', opprior)
For a termother (after a parsed treelhs) that is not parentheses, brackets, or braces, #%juxtapose is added.
enforest('treelhs termother term ...', opprior)
⇒ enforest('treelhs #%juxtapose termother term ...', opprior)
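The implicit-operator rules can be sketched as a single insertion step in Python. This is an illustrative model only: Name, Tree, the tuple encoding of grouping terms, and the is_bound predicate are assumptions made for the example. It covers only the implicit names that this section spells out, it leaves other cases unmodeled rather than guessing their names, and it does not model how lexical information is copied from the first term.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Name:
    """A name term (hypothetical representation)."""
    text: str

@dataclass(frozen=True)
class Tree:
    """An already-parsed term (hypothetical representation)."""
    form: object

# Grouping terms are modeled as tuples such as ("parens", ...).
# Only implicit names mentioned in the text are listed.
PREFIX_IMPLICITS = {"parens": "#%parens", "brackets": "#%brackets",
                    "block": "#%block"}
INFIX_IMPLICITS = {"brackets": "#%index"}
GROUPING = ("parens", "brackets", "braces")

def choose_implicit(terms):
    """Return (insert position, implicit name), or None if none is needed."""
    first = terms[0]
    if isinstance(first, Name):
        return None                               # explicit name leads
    if not isinstance(first, Tree):
        if isinstance(first, tuple):              # grouping term at the start
            return 0, PREFIX_IMPLICITS.get(first[0])
        return 0, "#%literal"                     # number, string, ...
    if len(terms) == 1 or isinstance(terms[1], Name):
        return None                               # nothing implicit after tree
    nxt = terms[1]
    if isinstance(nxt, tuple) and nxt[0] in GROUPING:
        return 1, INFIX_IMPLICITS.get(nxt[0])     # implicit infix name
    return 1, "#%juxtapose"

def add_implicit(terms, is_bound):
    """Insert one implicit name, but only when that name is bound."""
    choice = choose_implicit(terms)
    if choice is None:
        return list(terms)
    position, implicit = choice
    if implicit is None:
        raise NotImplementedError("implicit name not modeled in this sketch")
    if not is_bound(implicit):
        # Refusing here is what prevents endless re-insertion.
        raise SyntaxError(f"{implicit} is not bound")
    return [*terms[:position], Name(implicit), *terms[position:]]
```

Calling add_implicit([42, Name("+"), 1], lambda name: True) puts Name("#%literal") in front of 42, after which an ordinary prefix-macro step can take over. Because an inserted name is checked for a binding first, an unbound implicit produces an error rather than a loop.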
2.3.2 Expand Steps
A macro-expansion transformer takes a syntax object and returns two values: the expansion of the macro, and a sequence of terms that were not consumed by the macro’s application.
In the simple case, when expand applies a macro transformer, it gets an immediate parsed result base. For example, when a macro defined by annot.macro returns a result constructed with annot_meta.pack_predicate or annot_meta.pack_converter, the result is a parsed form. In that case, expansion returns to let enforest proceed (as in Enforest Steps).
expand(transform, 'term ...') = 'tree termrest ...'
where transform('term ...') = values('tree', 'termrest ...')
Otherwise, expand will recur explicitly with enforest to continue expansion. The explicit use of enforest does not see the 'termrest ...' tail produced by a transformer, which means that further expansion cannot produce a semi-parsed form that is accidentally merged with that tail; instead, its parsing must complete as separate and intact.
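expand(transform, 'term ...') = 'tree termrest ...'
where transform('term ...') = values('termnew ...', 'termrest ...')
and enforest('termnew ...', opinit) = 'tree'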
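The two cases of expand can be sketched in Python as follows. The names Tree, OP_INIT, toy_enforest, parsed_macro, and unparsed_macro are hypothetical, and the choice to start the nested enforest from the lowest-precedence operator is an assumption of this sketch. The point it illustrates is that the nested enforest call receives only the expansion, never the unconsumed tail.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tree:
    """An already-parsed term (hypothetical representation)."""
    form: object

OP_INIT = "opinit"  # assumed starting operator for the nested enforest

def expand(transform, terms, enforest):
    """Apply a transformer, then finish parsing its result if needed."""
    result, rest = transform(terms)
    if isinstance(result, Tree):
        # Simple case: the transformer returned a parsed tree directly.
        return [result, *rest]
    # Otherwise, enforest the expansion on its own. The tail `rest` is
    # deliberately withheld, so it cannot merge with a half-parsed form.
    parsed = enforest(list(result), OP_INIT)
    if len(parsed) != 1 or not isinstance(parsed[0], Tree):
        raise SyntaxError("expansion did not parse to a single tree")
    return [parsed[0], *rest]

seen = []  # records what the toy enforest was asked to parse

def toy_enforest(terms, opprior):
    """Stand-in enforest: wraps whatever it is given into one tree."""
    seen.append(list(terms))
    return [Tree(tuple(terms))]

def parsed_macro(terms):
    """Consumes two terms and returns a parsed tree immediately."""
    return Tree(("quote", terms[1])), terms[2:]

def unparsed_macro(terms):
    """Consumes two terms and returns terms that still need parsing."""
    return [terms[1], "!"], terms[2:]
```

Running expand(unparsed_macro, ["m", "x", "+", "y"], toy_enforest) leaves ["x", "!"] as the only entry in seen, while "+" and "y" are reattached after the resulting tree.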
Much of the heavy lifting of expansion is embedded in an individual transform function:
When a syntax class like expr_meta.Parsed, bind_meta.Parsed, or a name bound by parse_syntax_class is used to match syntax, the match applies enforest with lookup adapted to the appropriate space. That process includes implicit uses of parsing syntax classes, such as a macro bound by expr.macro with an escaped identifier as the pattern after the defined name. The syntax object associated with the match to a parsing syntax class is a parsed tree that can be returned or incorporated into a larger syntax object.
Nested uses of enforest (via a syntax class) within a transform function explain how parsing is triggered for spaces other than the initial expression space. For example, the fun expression form parses bindings for the function arguments and an optional annotation for the function result, which implies enforest in each of those spaces.
A transform function generated for a macro defined by macro, expr.macro, annot.macro, and similar starts by allocating a fresh macro scope and use-site scope. Both scopes are added to the input syntax object before it is provided to the macro implementation. For the macro’s results, the macro scope is flipped: added where it is not present and removed where it is present. As a consequence, the macro scope identifies components of a syntax object that are introduced by a macro invocation, while the use-site scope identifies components that were originally present in the macro use.
Macro and use-site scopes influence how identifiers are used as bindings and as potential references to other bindings. Macro scopes on an introduced identifier cause it to bind only instances of the identifier that are introduced by the same macro invocation, due to the rules on scope sets and binding described in Identifiers, Binding, and Scopes. Use-site scopes, meanwhile, are stripped from an identifier before the identifier is used as a binding in the immediate enclosing context of a macro expansion, which has the effect of preventing an identifier that is supplied to a macro from binding introduced identifiers in nested forms. See Transformer Bindings for examples.
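The scope bookkeeping described above can be sketched in Python by treating each identifier as a name plus a set of scopes. The names Id, fresh_scope, apply_transformer, and intro_macro are hypothetical, and a syntax object is flattened to a list of identifiers for simplicity. Flipping is modeled as a symmetric difference with the macro scope. The later stripping of use-site scopes in binding positions is not modeled.

```python
import itertools
from dataclasses import dataclass

_counter = itertools.count()

@dataclass(frozen=True)
class Id:
    """An identifier with its scope set (hypothetical representation)."""
    name: str
    scopes: frozenset = frozenset()

def fresh_scope(kind):
    """Allocate a scope that is distinct from every earlier one."""
    return (kind, next(_counter))

def add_scope(stx, scope):
    return [Id(i.name, i.scopes | {scope}) for i in stx]

def flip_scope(stx, scope):
    # Symmetric difference: add the scope where absent, remove where present.
    return [Id(i.name, i.scopes ^ {scope}) for i in stx]

def apply_transformer(transform, stx):
    """Run a macro implementation with fresh macro and use-site scopes."""
    macro_scope = fresh_scope("macro")
    use_scope = fresh_scope("use-site")
    stx_in = add_scope(add_scope(stx, macro_scope), use_scope)
    stx_out = flip_scope(transform(stx_in), macro_scope)
    return stx_out, macro_scope, use_scope

def intro_macro(stx):
    """Toy macro implementation: introduces `tmp` ahead of its input."""
    return [Id("tmp"), *stx]
```

After apply_transformer(intro_macro, [Id("x")]), the introduced tmp carries only the macro scope and the supplied x carries only the use-site scope, which matches the roles the two scopes play in the description above.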