8.17

9 Defining Languages

Rhombus not only supports macro extensions that add to the rhombus language, but it also supports entirely new languages that are smaller than rhombus or that have different syntax and semantics. Languages are themselves implemented as Rhombus modules that follow a particular protocol: they export certain bindings and implement certain submodules.

The term language in Rhombus is used to refer to two different kinds of languages for two different contexts:

These two kinds of languages are connected, because the result of a #lang-triggered parser is a module form (although at the Racket level), and so it includes a ~lang reference (or the equivalent at the Racket level). Furthermore, modules that implement a language are typically set up so that the language name works in both contexts, and the #lang use of the name generates a reference to the ~lang form of the name. For example, rhombus works both after #lang and in module after ~lang, and both uses of the name refer to the same set of bindings.
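As a minimal sketch of that last point, the following module uses the name rhombus in both contexts. It assumes only the module form with ~lang that appears later in this chapter; the choice of a main submodule is just for illustration.

```rhombus
#lang rhombus

// `rhombus` after `#lang` selects both the parser and the initial bindings
// for the enclosing module.

// `rhombus` after `~lang` selects only the initial bindings for the
// submodule; its body is already parsed as part of the enclosing module.
module main ~lang rhombus:
  1 + 2
```

Both references resolve to the same set of bindings, so the body of the submodule behaves like the body of any other rhombus module.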

9.1 Module ~lang Protocol

A module that is intended to be used as a language selected by ~lang in module must export various bindings to work:

For example, the following module defines a language that is like rhombus, but it replaces #%module_block to first print out the source of all forms in the module body. After printing, the body forms are evaluated the same way as in rhombus.

"noisy_rhombus.rhm"

#lang rhombus

import:
  rhombus/meta open

export:
  all_from(rhombus):
    except #%module_block
  rename:
    module_block as #%module_block

decl.macro 'module_block: $form; ...':
  '#%module_block:
     println($(form.to_source_string()))
     ...
     $form
     ...'

If that module is saved as "noisy_rhombus.rhm", then a module in the same directory can refer to it when declaring a main submodule:

"demo.rhm"

#lang rhombus

module main ~lang "noisy_rhombus.rhm":
  1 + 2 // prints "1 + 2" and then "3"

The name "noisy_rhombus.rhm" does not conform to the syntax of languages that can be written after #lang, and the "noisy_rhombus.rhm" module also doesn’t supply a character-level parser. One way to fill that gap, at least in the short term, is to use the shrubbery language, which parses a module body into shrubbery form and then uses the language module that is named immediately after #lang shrubbery:

"demo2.rhm"

#lang shrubbery "noisy_rhombus.rhm"

1 + 2 // prints "1 + 2" and then "3"

9.2 Run-Time and Expand-Time Configuration

Although bindings can capture most details of a language definition, certain aspects of the compile-time and run-time environment span all languages that are used to construct a program, and so they must be configured in a different way. For example, the way that values should print may differ for a programmer who is working in terms of Rhombus versus one working in terms of Racket, even when printing is initiated by a library that is meant to be used from either language. Racket allows the main module for a program (e.g., the one provided on the command line) to configure run-time behavior, and it allows the language of a module being compiled to configure compile-time behavior. These configurations take the form of submodules:

9.3 #lang Language Protocol

A language name that follows #lang must have only alphanumeric ASCII, +, -, _, and/or / characters terminated by whitespace or an end-of-file. Thus, a language name cannot be a Rhombus string, but must instead be an unquoted module path that refers to a module in a collection.

Furthermore, the unquoted path is turned into a module path in a way that is different from a language name after ~lang in module or in an import form: a ".rkt" suffix is added instead of a ".rhm" suffix (after "/main" is added in the case that / does not appear in the path). Finally, a reader submodule is found within that module. As a fallback, when a reader submodule is not found, a ".rkt" suffix is replaced with "/lang/reader.rkt" and tried as a module path in place of a reader submodule. This fallback is discouraged for new Rhombus and Racket languages.
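The suffix rule can be sketched as a small Rhombus function. The name collection_path is hypothetical and exists only for this illustration; it is not part of Rhombus, and the sketch assumes that String.contains and ++ behave as ordinary substring test and string append. It covers only the main mapping, not the "/lang/reader.rkt" fallback.

```rhombus
#lang rhombus

// Hypothetical helper for illustration only; not a Rhombus API.
// Maps an unquoted collection path to the file-like path used inside `lib`.
fun collection_path(name :: String, suffix :: String) :: String:
  if String.contains(name, "/")
  | name ++ suffix              // a `/` is present: only the suffix is added
  | name ++ "/main" ++ suffix   // no `/`: "/main" is added before the suffix

// The interpretation of a name after `#lang` uses ".rkt":
collection_path("noisy_rhombus", ".rkt")   // "noisy_rhombus/main.rkt"

// The interpretation after `~lang` or in `import` uses ".rhm":
collection_path("noisy_rhombus", ".rhm")   // "noisy_rhombus/main.rhm"

// A name that already contains `/` does not get "/main":
collection_path("rhombus/reader", ".rkt")  // "rhombus/reader.rkt"
```

The two different results for the same name are what make a language module reachable under one interpretation but not necessarily under the other.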

The reader submodule protocol, which is defined at the Racket level, requires the submodule to export three functions: #{read}, #{read-syntax}, and #{get-info}. The Rhombus-based language rhombus/reader provides a streamlined interface that is convenient for defining Rhombus-like languages.

The key clause in a rhombus/reader module is ~lang followed by a module path for the ~lang-protocol module to use for the parsed module. The module path can be relative to the enclosing reader submodule, so parent serves as a reference to the enclosing module. The following example is the same as "noisy_rhombus.rhm" in Module ~lang Protocol, but with a reader submodule added, and saved as "main.rkt" in a "noisy_rhombus" directory (note the ".rkt" extension instead of ".rhm").

"noisy_rhombus/main.rkt"

#lang rhombus

import:
  rhombus/meta open

module reader ~lang rhombus/reader:
  ~lang parent

export:
  all_from(rhombus):
    except #%module_block
  rename:
    module_block as #%module_block

decl.macro 'module_block: $form; ...':
  '#%module_block:
     println($(form.to_source_string()))
     ...
     $form
     ...'

Assuming that "noisy_rhombus" has been registered as a collection (possibly by installing it as a package with raco pkg install noisy_rhombus/), then noisy_rhombus works as a language name immediately after #lang:

"demo3.rhm"

#lang noisy_rhombus

1 + 2 // prints "1 + 2" and then "3"

A small problem remains here, created by the mismatch between #lang’s interpretation of module names and the Rhombus import interpretation. The #lang interpretation of noisy_rhombus is lib("noisy_rhombus/main.rkt"), while the import interpretation is lib("noisy_rhombus/main.rhm"). Consequently, the following all_from does not work as would be expected:

"demo4.rhm"

#lang noisy_rhombus

export:
  all_from(noisy_rhombus) // no `lib("noisy_rhombus/main.rhm")`

In fact, the problem is not so much the #lang interpretation of noisy_rhombus as the use of parent in the reader module. Changing to

module reader ~lang rhombus/reader:
  ~lang "main.rhm"

causes a #lang noisy_rhombus module to use lib("noisy_rhombus/main.rhm") as the initially imported module, and we can create "noisy_rhombus/main.rhm" to reexport "noisy_rhombus/main.rkt":

"noisy_rhombus/main.rhm"

#lang rhombus

import:
  "main.rkt"

export:
  all_from(.main)

Those changes allow "demo4.rhm" to work, but a syntax error in "demo4.rhm" would be reported incorrectly, because "noisy_rhombus/main.rhm" has no configure_expand submodule. The rhombus/lang_bridge module helps complete the picture by reexporting and also propagating submodule definitions and exports.

"noisy_rhombus/main.rhm"

#lang rhombus/lang_bridge

~lang: "main.rkt"

Note that "noisy_rhombus/main.rhm" depends on "noisy_rhombus/main.rkt", while "noisy_rhombus/main.rkt" indirectly depends on "noisy_rhombus/main.rhm". This kind of cycle is allowed, because rhombus/reader delays its reference by quoting the ~lang module name.

In short, a best practice for defining #lang languages with Rhombus is