CommonMark: Standard Markdown

Alexis King <lexi.lambda@gmail.com>

The source of this manual is available on GitHub.

The commonmark library implements a CommonMark-compliant Markdown parser. Currently, it passes all test cases in v0.30 of the specification. By default, only the Markdown features specified by CommonMark are supported, but non-standard support for footnotes can be optionally enabled; see the Extensions section of this manual for more details.
The commonmark module reprovides all of the bindings provided by commonmark/parse and commonmark/render/html (but not the bindings provided by commonmark/struct).

    1 Quick start

    2 Parsing

    3 Rendering HTML

    4 Document structure

      4.1 Blocks

      4.2 Inline content

    5 Extensions

      5.1 Footnotes

    6 Comparison with markdown

1 Quick start

For information about the Markdown syntax supported by commonmark, see the CommonMark website.

In commonmark, processing Markdown is split into two steps: parsing and rendering. To get started, use string->document or read-document to parse Markdown input into a document structure:

> (require commonmark)
> (define doc (string->document "*Hello*, **markdown**!"))
> doc

(document
 (list (paragraph (list (italic "Hello") ", " (bold "markdown") "!")))
 '())

A document is an abstract syntax tree that represents Markdown content. If you’d like, you can choose to render it however you wish, but most uses of Markdown render it to HTML, so commonmark provides the document->html and write-document-html functions, which render a document to HTML in the way recommended by the CommonMark specification:

> (write-document-html doc)

<p><em>Hello</em>, <strong>markdown</strong>!</p>

The document->xexprs function can also be used to render a document to a list of X-expressions, which can make it more convenient to incorporate rendered Markdown into a larger HTML document (though do be aware of the caveats involving HTML blocks and HTML spans described in the documentation for document->xexprs):

> (document->xexprs doc)

'((p (em "Hello") ", " (strong "markdown") "!"))
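
As a minimal sketch of that use case, the X-expressions returned by document->xexprs can be spliced into a page template and serialized with xexpr->string from Racket's xml library. The render-page helper and the page layout are hypothetical and chosen only for illustration:

```racket
#lang racket/base
(require commonmark
         xml) ; provides xexpr->string

;; render-page is a hypothetical helper, not part of commonmark.
;; It parses a Markdown string and splices the resulting
;; X-expressions into the <body> of a small page template.
(define (render-page title markdown)
  (define body-xexprs (document->xexprs (string->document markdown)))
  (xexpr->string
   `(html (head (title ,title))
          (body ,@body-xexprs))))

(define page (render-page "Greeting" "*Hello*, **markdown**!"))
```

Splicing with ,@ is needed because document->xexprs returns a list of X-expressions, one per top-level block, rather than a single element.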

2 Parsing

The commonmark/parse module provides functions for parsing Markdown content into a document structure. To render Markdown to HTML, use this module in combination with the functions provided by commonmark/render/html.
All of the bindings provided by commonmark/parse are also provided by commonmark.

procedure

(string->document str) → document?

  str : string?
Parses str as a Markdown document.

Example:
> (define doc (string->document "*Hello*, **markdown**!"))
> doc

(document
 (list (paragraph (list (italic "Hello") ", " (bold "markdown") "!")))
 '())

> (write-document-html doc)

<p><em>Hello</em>, <strong>markdown</strong>!</p>

This function cannot fail: every string of Unicode characters can—somehow—be interpreted as Markdown. Of course, the interpretation may be somewhat tortured if applied to input for which such interpretation was not intended.

procedure

(read-document in) → document?

  in : input-port?
Like string->document, but the input is read from the given input port rather than from a string.

Example:
> (define doc (read-document (open-input-string "*Hello*, **markdown**!")))
> doc

(document
 (list (paragraph (list (italic "Hello") ", " (bold "markdown") "!")))
 '())

> (write-document-html doc)

<p><em>Hello</em>, <strong>markdown</strong>!</p>

This function can sometimes be more efficient than (string->document (port->string in)), but probably not significantly so, as the entire document structure must be realized in memory regardless.
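
A common reason to reach for read-document is parsing a file. The following sketch assumes nothing beyond racket/base and racket/file; the temporary file and its contents exist only to keep the example self-contained:

```racket
#lang racket/base
(require commonmark
         racket/file) ; make-temporary-file, display-to-file

;; Write a small Markdown file to a temporary location so the
;; example is self-contained; the file contents are made up.
(define path (make-temporary-file "commonmark-example-~a.md"))
(display-to-file "# Title\n\nSome *text*.\n" path #:exists 'truncate)

;; call-with-input-file opens the port, passes it to read-document,
;; and closes the port again once parsing has finished.
(define doc (call-with-input-file path read-document))
(delete-file path)

(define html (document->html doc))
```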

parameter

(current-parse-footnotes?) → boolean?

(current-parse-footnotes? parse-footnotes?) → void?
  parse-footnotes? : any/c
 = #f
Enables or disables footnote parsing, which is an extension to the CommonMark specification; see Footnotes for more details.

Note that the value of current-parse-footnotes? only affects parsing, not rendering. If a document containing footnotes is rendered to HTML, the footnotes will still be rendered even if (current-parse-footnotes?) is #f.
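
Because the parameter is consulted while parsing, it is usually set with parameterize around the call to the parser. The sketch below parses the same made-up input with and without footnote support and inspects the footnotes field, which requires commonmark/struct:

```racket
#lang racket/base
(require commonmark
         commonmark/struct) ; for document-footnotes

(define src "Some text.[^note]\n\n[^note]: The footnote body.\n")

;; Default behavior: footnote syntax is treated as ordinary text.
(define without-footnotes (string->document src))

;; Footnote parsing enabled only for the duration of this call.
(define with-footnotes
  (parameterize ([current-parse-footnotes? #t])
    (string->document src)))

(length (document-footnotes without-footnotes)) ; 0
(length (document-footnotes with-footnotes))    ; 1

;; Rendering happens outside the parameterize form, but the parsed
;; footnote is still rendered.
(define html (document->html with-footnotes))
```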

Added in version 1.1 of package commonmark-lib.

3 Rendering HTML

The commonmark/render/html module provides functions for rendering a parsed Markdown document to HTML, as recommended by the CommonMark specification. This module should generally be used in combination with commonmark/parse, which provides functions for producing a document structure from Markdown input.
All of the bindings provided by commonmark/render/html are also provided by commonmark.

procedure

(document->html doc) → string?

  doc : document?
Renders doc to HTML in the format recommended by the CommonMark specification.

Example:
> (document->html (string->document "*Hello*, **markdown**!"))

"<p><em>Hello</em>, <strong>markdown</strong>!</p>"

procedure

(write-document-html doc [out]) → void?

  doc : document?
  out : output-port? = (current-output-port)
Like document->html, but writes the rendered HTML directly to out rather than returning it as a string.

Example:
> (write-document-html (string->document "*Hello*, **markdown**!"))

<p><em>Hello</em>, <strong>markdown</strong>!</p>
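
The optional out argument accepts any output port. As a small sketch, the following renders into a string port using call-with-output-string from racket/port; a file port opened with call-with-output-file could be used the same way:

```racket
#lang racket/base
(require commonmark
         racket/port) ; call-with-output-string

(define doc (string->document "*Hello*, **markdown**!"))

;; Render into a string port instead of (current-output-port).
(define captured
  (call-with-output-string
   (λ (out) (write-document-html doc out))))
```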

procedure

(document->xexprs doc) → (listof xexpr/c)

  doc : document?
Like document->html, but returns the rendered HTML as a list of X-expressions rather than as a string.

Example:
> (document->xexprs (string->document "*Hello*, **markdown**!"))

'((p (em "Hello") ", " (strong "markdown") "!"))

Note that HTML blocks and HTML spans are not parsed and may not even contain valid HTML, which makes them difficult to represent as an X-expression. As a workaround, raw HTML will be represented as cdata elements:

> (document->xexprs
   (string->document "A paragraph with <marquee>raw HTML</marquee>."))

(list
 (list
  'p
  "A paragraph with "
  (cdata #f #f "<marquee>")
  "raw HTML"
  (cdata #f #f "</marquee>")
  "."))

This generally works out okay, since cdata elements render directly as their unescaped content, but it is, strictly speaking, an abuse of cdata.
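
To see the effect of this workaround, the X-expression can be serialized with xexpr->string from Racket's xml library, which writes the content of cdata elements verbatim. This is only a sketch of one way to serialize the result:

```racket
#lang racket/base
(require commonmark
         xml) ; provides xexpr->string

(define xexprs
  (document->xexprs
   (string->document "A paragraph with <marquee>raw HTML</marquee>.")))

;; The raw HTML tags survive unescaped because they are stored as
;; cdata elements rather than as strings.
(define html (xexpr->string (car xexprs)))
;; html is "<p>A paragraph with <marquee>raw HTML</marquee>.</p>"
```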

parameter

(current-italic-tag) → symbol?

(current-italic-tag tag) → void?
  tag : symbol?
 = 'em

parameter

(current-bold-tag) → symbol?

(current-bold-tag tag) → void?
  tag : symbol?
 = 'strong
These parameters determine which HTML tag is used to render italic spans and bold spans, respectively. The default values of 'em and 'strong correspond to those required by the CommonMark specification, but this can be semantically incorrect if “emphasis” syntax is used for purposes other than emphasis, such as italicizing the title of a book.

Reasonable alternate values for current-italic-tag and current-bold-tag include 'i, 'b, 'mark, 'cite, or 'dfn, all of which are elements with semantic (rather than presentational) meaning in HTML5. Of course, the “most correct” choice depends on how italic spans and bold spans will actually be used, so no one set of choices can be universally called the best.

Example:
> (parameterize ([current-italic-tag 'cite]
                 [current-bold-tag 'mark])
    (document->xexprs
     (string->document
      (string-append
       "> First, programming is about stating and solving problems,\n"
       "> and this activity normally takes place in a context with its\n"
       "> own language of discourse; **good programmers ought to\n"
       "> formulate this language as a programming language**.\n"
       "\n"
       "— *The Racket Manifesto* (emphasis mine)"))))

'((blockquote
   (p
    "First, programming is about stating and solving problems,\n"
    "and this activity normally takes place in a context with its\n"
    "own language of discourse; "
    (mark
     "good programmers ought to\n"
     "formulate this language as a programming language")
    "."))
  (p "— " (cite "The Racket Manifesto") " (emphasis mine)"))

4 Document structure

The commonmark/struct module provides structure types used to represent the abstract syntax of Markdown content. The root of the syntax tree hierarchy is a document, which contains blocks, which in turn contain inline content. Most users will not need to interact with these structures directly, but doing so can be useful to perform additional processing on the document before rendering it, or to render Markdown to a format other than HTML.
Note that the bindings in this section are only provided by commonmark/struct, not commonmark.

struct

(struct document (blocks footnotes)
    #:transparent)
  blocks : (listof block?)
  footnotes : (listof footnote-definition?)
A parsed Markdown document, which has a body flow and a list of footnote definitions. It can be parsed from Markdown input using read-document or string->document and can be rendered to HTML using document->html.

Changed in version 1.1 of package commonmark-lib: Added the footnotes field.

struct

(struct footnote-definition (blocks label)
    #:transparent)
  blocks : (listof block?)
  label : string?

Footnotes are an extension to the CommonMark specification and are not enabled by default; see Footnotes in the Extensions section of this manual for more details.

A footnote definition contains a flow that can be referenced by a footnote reference via its footnote label.

Note: although footnote definitions are syntactically blocks in Markdown input, they are not a type of block (as recognized by the block? predicate) and cannot be included directly in the main document flow. Footnote definitions are collected into the separate document-footnotes field of the document structure during parsing, since they represent auxiliary definitions, and their precise location in the Markdown input does not matter.

(This is quite similar to the way the parser processes link reference definitions, except that footnote definitions must be retained separately for later rendering, whereas link reference definitions can be discarded after all link targets have been resolved.)

Added in version 1.1 of package commonmark-lib.

4.1 Blocks

procedure

(block? v) → boolean?

  v : any/c

See § Blocks and inlines in the CommonMark specification for more information about blocks.

Returns #t if v is a block: a paragraph, itemization, block quote, code block, HTML block, heading, or thematic break. Otherwise, returns #f.

A flow is a list of blocks. The body of a document, the contents of a block quote, and each item in an itemization are flows.
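
Since flows nest, code that processes blocks is usually recursive. The following sketch builds a small flow by hand and flattens it; all-blocks is a hypothetical helper, not something provided by commonmark/struct:

```racket
#lang racket/base
(require racket/list ; append*
         commonmark/struct)

;; all-blocks is a hypothetical helper: it returns every block in a
;; flow, including blocks nested inside block quotes and
;; itemizations, in document order.
(define (all-blocks flow)
  (append*
   (for/list ([b (in-list flow)])
     (cons b
           (cond
             [(blockquote? b) (all-blocks (blockquote-blocks b))]
             [(itemization? b)
              (append* (map all-blocks (itemization-blockss b)))]
             [else '()])))))

;; A flow with a block quote (containing two paragraphs) followed by
;; a thematic break.
(define flow
  (list (blockquote (list (paragraph "quoted") (paragraph "text")))
        thematic-break))

(length (all-blocks flow)) ; 4
```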

struct

(struct paragraph (content)
    #:transparent)
  content : inline?

See § Paragraphs in the CommonMark specification for more information about paragraphs.

A paragraph is a block that contains inline content. In HTML output, it corresponds to a <p> element. In a typical document, most blocks are paragraphs.

struct

(struct itemization (blockss style start-num)
    #:transparent)
  blockss : (listof (listof block?))
  style : (or/c 'loose 'tight)
  start-num : (or/c exact-nonnegative-integer? #f)

See § Lists and § List items in the CommonMark specification for more information about itemizations.

An itemization is a block that contains a list of flows. In HTML output, it corresponds to a <ul> or <ol> element.

The style field records whether the itemization is loose or tight: if style is 'tight, paragraphs in HTML output are not wrapped in <p> tags.

If start-num is #f, then the itemization represents a bullet list. Otherwise, the itemization represents an ordered list, and the value of start-num is its start number.
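
As an illustration of these fields, the sketch below parses one ordered list and one bullet list (both inputs are made up) and inspects the resulting structures:

```racket
#lang racket/base
(require commonmark
         commonmark/struct)

;; An ordered list that starts at 3, with no blank lines between items.
(define ordered
  (car (document-blocks (string->document "3. three\n4. four\n"))))

(itemization-style ordered)            ; 'tight
(itemization-start-num ordered)        ; 3
(length (itemization-blockss ordered)) ; 2

;; A bullet list whose items are separated by a blank line.
(define bullets
  (car (document-blocks (string->document "- a\n\n- b\n"))))

(itemization-style bullets)     ; 'loose
(itemization-start-num bullets) ; #f
```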

struct

(struct blockquote (blocks)
    #:transparent)
  blocks : (listof block?)

See § Block quotes in the CommonMark specification for more information about block quotes.

A block quote is a block that contains a nested flow. In HTML output, it corresponds to a <blockquote> element.

struct

(struct code-block (content info-string)
    #:transparent)
  content : string?
  info-string : (or/c string? #f)

See § Indented code blocks and § Fenced code blocks in the CommonMark specification for more information about code blocks.

A code block is a block that has unformatted content and an optional info string. In HTML output, it corresponds to a <pre> element that contains a <code> element.

The CommonMark specification does not mandate any particular treatment of the info string, but it notes that “the first word is typically used to specify the language of the code block.” In HTML output, the language is indicated by adding a CSS class to the rendered <code> element consisting of language- followed by the language name, per the spec’s recommendation.
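
A short sketch of this behavior follows. The fence string is assembled with make-string only to avoid writing a literal code fence inside the example; the input itself is made up:

```racket
#lang racket/base
(require commonmark
         commonmark/struct)

;; Three backticks, built programmatically.
(define fence (make-string 3 #\`))
(define src (string-append fence "racket\n(+ 1 2)\n" fence "\n"))

(define doc (string->document src))
(define block (car (document-blocks doc)))

(code-block? block)            ; #t
(code-block-info-string block) ; the info string given after the opening fence

;; The rendered <code> element carries a language- class.
(define html (document->html doc))
```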

struct

(struct html-block (content)
    #:transparent)
  content : string?

See § HTML Blocks in the CommonMark specification for more information about HTML blocks.

An HTML block is a block that contains raw HTML content (and will be left unescaped in HTML output). Note that, in general, the content may not actually be well-formed HTML, as CommonMark simply treats everything that “looks sufficiently like” HTML—according to some heuristics—as raw HTML.

struct

(struct heading (content depth)
    #:transparent)
  content : inline?
  depth : (integer-in 1 6)

See § ATX headings and § Setext headings in the CommonMark specification for more information about headings.

A heading has inline content and a heading depth. In HTML output, it corresponds to one of the <h1> through <h6> elements.

A heading depth is an integer between 1 and 6, inclusive, where higher numbers correspond to more-nested headings.
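
Heading depths make it straightforward to derive a document outline. The helper below, top-level-outline, is hypothetical and deliberately simple: it only inspects the document's body flow and returns the depth of each heading it finds.

```racket
#lang racket/base
(require commonmark
         commonmark/struct)

;; top-level-outline is a hypothetical helper; it ignores headings
;; nested inside block quotes or itemizations.
(define (top-level-outline doc)
  (for/list ([b (in-list (document-blocks doc))]
             #:when (heading? b))
    (heading-depth b)))

(define depths
  (top-level-outline
   (string->document "# A\n\ntext\n\n## B\n\n### C\n\n## D\n")))
;; depths is '(1 2 3 2)
```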

value

thematic-break : thematic-break?

procedure

(thematic-break? v) → boolean?

  v : any/c

See § Thematic breaks in the CommonMark specification for more information about thematic breaks.

A thematic break is a block. It is usually rendered as a horizontal rule, and in HTML output, it corresponds to an <hr> element.

4.2 Inline content

procedure

(inline? v) → boolean?

  v : any/c

See § Blocks and inlines in the CommonMark specification for more information about inline content.

Returns #t if v is inline content: a string, italic span, bold span, code span, link, image, footnote reference, HTML span, hard line break, or list of inline content. Otherwise, returns #f.
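
Because inline content is recursive (a list of inline content is itself inline content), functions over it typically dispatch on each case listed above. The following sketch flattens inline content to plain text; inline->text is a hypothetical helper, and dropping footnote references and raw HTML is an arbitrary choice made for this example:

```racket
#lang racket/base
(require commonmark/struct)

;; inline->text is a hypothetical helper that discards all formatting.
(define (inline->text v)
  (cond
    [(string? v) v]
    [(list? v) (apply string-append (map inline->text v))]
    [(italic? v) (inline->text (italic-content v))]
    [(bold? v) (inline->text (bold-content v))]
    [(code? v) (code-content v)]
    [(link? v) (inline->text (link-content v))]
    [(image? v) (inline->text (image-description v))]
    [(line-break? v) "\n"]
    ;; Footnote references and HTML spans are dropped in this sketch.
    [else ""]))

(define text
  (inline->text (list (italic "Hello") ", " (bold "markdown") "!")))
;; text is "Hello, markdown!"
```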

struct

(struct italic (content)
    #:transparent)
  content : inline?

See § Emphasis and strong emphasis in the CommonMark specification for more information about italic spans.

An italic span is inline content that contains nested inline content. By default, in HTML output, it corresponds to an <em> element (but an alternate tag can be used by modifying current-italic-tag).

struct

(struct bold (content)
    #:transparent)
  content : inline?

See § Emphasis and strong emphasis in the CommonMark specification for more information about bold spans.

A bold span is inline content that contains nested inline content. By default, in HTML output, it corresponds to a <strong> element (but an alternate tag can be used by modifying current-bold-tag).

struct

(struct code (content)
    #:transparent)
  content : string?

See § Code spans in the CommonMark specification for more information about code spans.

A code span is inline content that contains unformatted content. In HTML output, it corresponds to a <code> element.

struct

(struct link (content dest title)
    #:transparent)
  content : inline?
  dest : string?
  title : (or/c string? #f)

See § Links in the CommonMark specification for more information about links.

A link is inline content that contains nested inline content, a link destination, and an optional link title. In HTML output, it corresponds to an <a> element.
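
As a sketch of how these fields might be used, the following collects the destination and title of every link in a document's top-level paragraphs. The inline-links helper and the sample input are hypothetical:

```racket
#lang racket/base
(require racket/list ; append-map
         commonmark
         commonmark/struct)

;; inline-links is a hypothetical helper that returns a list of
;; (destination . title) pairs found in inline content.
(define (inline-links v)
  (cond
    [(list? v) (append-map inline-links v)]
    [(link? v) (list (cons (link-dest v) (link-title v)))]
    [(italic? v) (inline-links (italic-content v))]
    [(bold? v) (inline-links (bold-content v))]
    [else '()]))

(define doc
  (string->document
   (string-append
    "See [Racket](https://racket-lang.org \"Home page\") and "
    "*[the spec](https://spec.commonmark.org)*.")))

(define links
  (append-map
   (λ (b) (if (paragraph? b) (inline-links (paragraph-content b)) '()))
   (document-blocks doc)))
;; links is '(("https://racket-lang.org" . "Home page")
;;            ("https://spec.commonmark.org" . #f))
```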

struct

(struct image (description source title)
    #:transparent)
  description : inline?
  source : string?
  title : (or/c string? #f)

See § Images in the CommonMark specification for more information about images.

An image is inline content with a source path or URL that should point to an image. It has an inline content description (which is used as the alt attribute in HTML output) and an optional title. In HTML output, it corresponds to an <img> element.

struct

(struct footnote-reference (label)
    #:transparent)
  label : string?

Footnotes are an extension to the CommonMark specification and are not enabled by default; see Footnotes in the Extensions section of this manual for more details.

A footnote reference is inline content that references a footnote definition with a matching footnote label. In HTML output, it corresponds to a superscript <a> element.

Added in version 1.1 of package commonmark-lib.

struct

(struct html (content)
    #:transparent)
  content : string?

See § Raw HTML in the CommonMark specification for more information about HTML spans.

An HTML span is inline content that contains raw HTML content (and will be left unescaped in HTML output). Note that, in general, the content may not actually be well-formed HTML, as CommonMark simply treats everything that “looks sufficiently like” HTML—according to some heuristics—as raw HTML.

value

line-break : line-break?

procedure

(line-break? v) → boolean?

  v : any/c

See § Hard line breaks in the CommonMark specification for more information about hard line breaks.

A hard line break is inline content used for separating inline content within a block. In HTML output, it corresponds to a <br> element.

5 Extensions

By default, commonmark adheres precisely to the CommonMark specification, which is the subset of Markdown that behaves consistently across implementations. However, many Markdown libraries implement extensions beyond what is specified, several of which are useful enough to have become de facto standards across major Markdown implementations.

Unfortunately, since such features are not precisely specified, implementations of Markdown extensions rarely agree on how exactly they ought to be parsed and rendered, especially when interactions with other Markdown features leave edge cases and ambiguities. commonmark therefore deviates from the standard only if explicitly instructed to do so, and hopefully programmers who choose to venture into such uncharted waters understand they bear some responsibility for what they are getting themselves into.

This section documents all of the extensions commonmark currently supports. Note that, due to their inherently ill-specified nature, it can sometimes be difficult to determine whether a divergence in behavior between two Markdown implementations constitutes a bug or two incompatible features. For that reason, backwards compatibility of extensions’ behavior may not be perfectly maintained wherever the interpretation is not sufficiently “obvious”. Consider yourself warned.

5.1 Footnotes

Footnotes enjoy support from a wide variety of Markdown implementations, including PHP Markdown Extra, Python-Markdown, Pandoc, GitHub Flavored Markdown, and markdown. The [^label] syntax for references and definitions is nearly universal, but minor differences exist in interpretation, and rendering varies significantly. commonmark’s implementation is not precisely identical to any of them, but it was originally based on the cmark-gfm implementation of GitHub Flavored Markdown.

Footnotes allow auxiliary information to be lifted out of the main document flow to avoid cluttering the body text. When footnote parsing is enabled via the current-parse-footnotes? parameter, shortcut reference links with a link label that begins with a ^ character are instead parsed as footnote references. For example, the following paragraph includes three footnote references:

Racket is a programming language[^1] descended from Scheme.[^scheme]
Although not all Racket programs retain Lisp syntax, most Racket
programs still include a great many parentheses.[^(()())]

Text between the [^ and ] characters constitutes the footnote label, and the content of the footnote is provided via a footnote definition with a matching footnote label. Footnote definitions have similar syntax to link reference definitions, but unlike link reference definitions the body of a footnote definition is an arbitrary flow. For example, the following syntax defines two footnotes matched by the footnote references above:

[^1]: Technically, the name *Racket* refers to both the runtime
    environment and the primary language used to program it.

[^scheme]: The original name for the Racket project was PLT Scheme,
    but it was renamed in 2010 [to avoid confusion and to reflect its
    departure from its roots](https://racket-lang.org/new-name.html).

Syntactically, footnote definitions are a type of container block and may appear within any flow, though they are not semantically children of any flow in which they appear. Their placement does not affect their interpretation—a footnote reference may reference any footnote defined in the same document—unless two definitions have matching footnote labels, in which case the later definition is ignored.

As mentioned above, a footnote definition may contain an arbitrary flow consisting of any number of blocks. All lines after the first must be indented by 4 spaces to be included in the definition (unless they are lazy continuation lines). For example, the following footnote definition includes a block quote, an indented code block, and a paragraph:

[^long note]:
    > This is a block quote that is nested inside
    > the body of a footnote.

        This is an indented code block
        inside of a footnote.

    This paragraph is also inside the footnote.

A footnote reference must match a footnote definition somewhere in the document to be parsed as a footnote reference. If no such definition exists, the label will be parsed as literal text. Each footnote definition can be referenced an arbitrary number of times.
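
The fallback to literal text can be observed directly. In this sketch, the input (which is made up) references a label that is never defined, so the bracketed text appears verbatim in the output even though footnote parsing is enabled:

```racket
#lang racket/base
(require commonmark)

(define html
  (parameterize ([current-parse-footnotes? #t])
    (document->html
     (string->document "A dangling reference.[^missing]"))))

;; The label is kept as ordinary text because no definition matches it.
(regexp-match? #rx"\\[\\^missing\\]" html) ; #t
```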

When footnotes are parsed, each footnote reference is represented in-place by an instance of footnote-reference, but footnote definitions are removed from the main document flow and collected into a list of footnote-definition instances in a separate document-footnotes field. This allows renderers to more easily match references to their corresponding definitions and ensures that the placement of definitions within a document cannot affect the rendered output.

When given a document containing footnotes, the default HTML renderer mimics the output produced by cmark-gfm. Specifically, the renderer appends a <section class="footnotes"> element to the end of the output, which wraps an <ol> element containing the footnotes’ content:

markdown

Here is a paragraph[^1] with
two footnote references.[^2]

[^1]: Here is the first footnote.
[^2]: And here is the second.

rendered

Here is a paragraph¹ with two footnote references.²

  1. Here is the first footnote.
  2. And here is the second.

Each rendered footnote definition includes a backreference link that points back to the corresponding footnote reference in the body text. If a definition is referenced multiple times, the rendered footnote will include multiple backreference links:

markdown

Here is a paragraph[^1] that
references a footnote twice.[^1]

[^1]: Here is the footnote.

rendered

Here is a paragraph¹ that references a footnote twice.¹

  1. Here is the footnote. 1 2

In both of the previous examples, the chosen footnote labels happen to line up with the rendered footnote numbers, but in general, that does not need to be the case. Footnote references are always rendered numerically, in the order they appear in the document, regardless of the footnote labels used in the document’s source.

Although footnotes are visually renumbered by the renderer, the generated links and link anchors are based on the original footnote labels. This means that a link to a particular footnote definition will remain stable even if a document is modified, as long as its label remains unchanged. The following example shows the renumbering:

markdown

Here are some footnotes[^a]
with non-numeric[^b] names.

And here are some footnotes[^2]
numbered out of order.[^3]

[^a]: Here is footnote a.
[^b]: Here is footnote b.
[^2]: Here is footnote 2.
[^3]: Here is footnote 3.

rendered

Here are some footnotes¹ with non-numeric² names.

And here are some footnotes³ numbered out of order.⁴

  1. Here is footnote a.
  2. Here is footnote b.
  3. Here is footnote 2.
  4. Here is footnote 3.

In a similar vein, the order in which footnote definitions appear does not matter, as they will be rendered in the order they are first referenced in the document. If a definition is never referenced, it will not be rendered at all:

markdown

Here is a paragraph[^1] with
two footnote references.[^3]

[^3]: Here is footnote 3.
[^2]: Here is footnote 2.
[^1]: Here is footnote 1.

rendered

Here is a paragraph¹ with two footnote references.²

  1. Here is footnote 1.
  2. Here is footnote 3.

Footnote references may appear inside footnote definitions, and commonmark will not object (though your readers might). Footnotes that are first referenced in a footnote definition will be numbered so that they immediately follow the referencing footnote:

markdown

Here is a paragraph[^1] with
two footnote references.[^2]

[^1]: Here is footnote 1.[^3]
[^2]: Here is footnote 2.
[^3]: Here is footnote 3.

rendered

Here is a paragraph¹ with two footnote references.³

  1. Here is footnote 1.²
  2. Here is footnote 3.
  3. Here is footnote 2.

Note that while matching footnote references to their corresponding definitions is handled by the parser, pruning and renumbering of footnote definitions is handled entirely by the renderer, which allows alternate renderers to use alternate schemes if they so desire.
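
As a rough sketch of what an alternate renderer might start from, the following computes the order in which footnote labels are first referenced in the document body and lists the definitions that are never referenced. The helpers inline-labels, flow-labels, and referenced-labels are hypothetical, and for brevity this sketch ignores references that occur inside footnote definitions, which the default renderer does take into account:

```racket
#lang racket/base
(require racket/list ; append-map, remove-duplicates
         commonmark
         commonmark/struct)

;; Labels of footnote references in inline content, in order.
(define (inline-labels v)
  (cond
    [(list? v) (append-map inline-labels v)]
    [(footnote-reference? v) (list (footnote-reference-label v))]
    [(italic? v) (inline-labels (italic-content v))]
    [(bold? v) (inline-labels (bold-content v))]
    [(link? v) (inline-labels (link-content v))]
    [(image? v) (inline-labels (image-description v))]
    [else '()]))

;; Labels of footnote references anywhere in a flow, in order.
(define (flow-labels flow)
  (append-map
   (λ (b)
     (cond
       [(paragraph? b) (inline-labels (paragraph-content b))]
       [(heading? b) (inline-labels (heading-content b))]
       [(blockquote? b) (flow-labels (blockquote-blocks b))]
       [(itemization? b) (append-map flow-labels (itemization-blockss b))]
       [else '()]))
   flow))

;; Labels in order of first reference in the document body.
(define (referenced-labels doc)
  (remove-duplicates (flow-labels (document-blocks doc))))

(define doc
  (parameterize ([current-parse-footnotes? #t])
    (string->document
     (string-append
      "Here is a paragraph[^1] with\n"
      "two footnote references.[^3]\n\n"
      "[^3]: Here is footnote 3.\n\n"
      "[^2]: Here is footnote 2.\n\n"
      "[^1]: Here is footnote 1.\n"))))

(define used (referenced-labels doc)) ; '("1" "3")

;; Definitions the parser kept but the body never references.
(define unused
  (filter (λ (label) (not (member label used)))
          (map footnote-definition-label (document-footnotes doc))))
;; unused is '("2")
```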

6 Comparison with markdown

The commonmark library is not the first Markdown parser implemented in Racket: it is long predated by the venerable markdown library, which in fact also predates the CommonMark specification itself. The libraries naturally provide similar functionality, but there are some key differences:

Takeaway: if you need the extra features provided by markdown, use markdown; otherwise, use commonmark.