## Squicky: a quick wiki parser

Norman Gray <http://nxg.me.uk>

This is a parser for a wiki syntax based closely on WikiCreole, as described below. The source repository is available at Bitbucket.

Version 1.2, released 2019 March 23

### 1 Usage

The dialect parsed here is the consensus WikiCreole syntax of http://www.wikicreole.org/. It handles all of the WikiCreole test cases, except for one test of wiki-internal links (which is in any case somewhat underspecified).

In particular, the supported syntax is:
• //italics//

• **bold** : A line which begins with **, with possible whitespace either side, is a (second-level) bulleted list if the line before it is a bulleted list, but is a paragraph starting with bold text otherwise.

• ##monospaced text## : A line which begins with ##, with possible whitespace either side, is a (second-level) enumerated list if the line before it is an enumerated list, but is a paragraph starting with monospace text otherwise. [This is not specified in the WikiCreole definition, but is clearly compatible with it].

•  * bulleted list : (including sublists; the asterisk may or may not be indented)

•  # numbered list : (including sublists)

• >quoted paragraph : including multiple levels (this appears to be an extension of WikiCreole).

• [[URL|description]]

• {{image.png}} or {{image.png|alt text}} or {{image.png|att=value;att2=value; or more}}. In the last form, which is an extension to the Creole syntax, each att names an attribute of the HTML <img> element, such as class. An att must immediately follow the semicolon, so in this example the second attribute parses as att2=value; or more (the space after the second semicolon means that no new attribute starts there). If the att is omitted, it defaults to alt.

• line\\break

•  ---- : horizontal rule (four dashes in a row, on a line by themselves)

• ~x escaped character, and ~http://url which isn’t linked

•  {{{in-line literal text}}}

Blocks of verbatim text (which will typically be rendered to <pre> blocks) can be specified with:
 {{{
 preformatted text
 }}}
The opening {{{, and its closing partner, must be on lines by themselves. The newline after the opening marker, and the newline before the closing one, are ignored.

Tables look like this:
 |=Heading Col 1 |=Heading Col 2         |
 |Cell 1.1       |Two lines\\in Cell 1.2 |
 |Cell 2.1       |Cell 2.2               |

• ::key value... : adds ‘metadata’, which can be retrieved with the lookup function. For example, after ::title Interesting things, the value of the ‘title’ key will be Interesting things. This must be at the beginning of a line.

• "quoted" : corresponds to <q>quoted</q> (note that’s a double-quote character, not two single quotes).

• <<element-name content>> : adds <element-name>content</element-name> to the output.

• The att=value syntax for {{image.png}} is an extension.

• !!target value ... : processing instruction support – adds a sexp equivalent to <?target value ...?> to the output

As an example, the following parses some input text, and writes it out as XML.

 (require xml
          squicky
          (prefix-in srfi19: srfi/19))

 ;; Write the parsed wiki-text to port as an XHTML document.
 (define (write-html-to-port wiki-text port)
   (write-xml/content
    (xexpr->xml
     `(html ((xmlns "http://www.w3.org/1999/xhtml"))
            (head
             ;; Emit a DC.date <meta> element only if a date is available.
             ,@(cond
                 [(or (lookup-parsed wiki-text 'date)
                      (lookup-parsed wiki-text 'updated))
                  => (λ (date)
                       `((meta ((name "DC.date")
                                (content ,(srfi19:date->string date "~4"))))))]
                 [else '()])
             (title ,(or (lookup wiki-text 'title) "Title")))
            (body ,@(body wiki-text))))
    port)
   (newline port))

 (write-html-to-port (parse (current-input-port))
                     (current-output-port))

Suitable input text would be:
 ::date 2010 December 12

 == Here is a heading

 Here is some text, with a list comprising:
  * one
  * two.

 That's quite //astonishing!//.

The parsing is intended to be tolerant. No matter how garbled the WikiCreole syntax, the parser should not produce an error, or a body which fails to satisfy the (listof xexpr?) contract.
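
The tolerance guarantee can be checked directly. The following is a minimal sketch, not part of the package: the malformed input and the names garbled, page and body-ok? are made up for illustration, and it uses only parse, body and wikitext? from the reference below together with xexpr? from the xml library.

```racket
#lang racket
(require xml squicky)

;; Deliberately malformed markup (hypothetical input): unclosed bold,
;; a dangling link opener, and an unterminated in-line literal.
(define garbled
  (string-append
   "== Heading with **unclosed bold\n"
   "\n"
   "Some //italic text with a [[dangling link\n"
   "and an {{{unterminated literal\n"))

(define page (parse garbled))

;; The documented guarantee is that the body is a list of xexprs.
(define body-ok?
  (let ([blocks (body page)])
    (and (list? blocks) (andmap xexpr? blocks))))

(printf "body is a list of xexprs: ~a\n" body-ok?)
```
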

### 2 Command line

To convert input text to output, use
 % racket -l squicky -- --html input.wiki >input.html

Give the option --help for other instructions.

### 3 Reference

 procedure
 (parse source) → wikitext/c
   source : (or/c port? string?)
Parse the source into a wikitext object.
 procedure
 (wikitext? x) → boolean?
   x : any/c
Returns #t if x is a parsed wikitext object.
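
As a brief illustration of the two source types that parse accepts, here is a sketch under the assumption of a small made-up input string; the names sample, from-string and from-port are hypothetical.

```racket
#lang racket
(require squicky)

;; Hypothetical sample input.
(define sample "== A heading\n\nSome //text//.\n")

;; parse accepts either a string or an input port.
(define from-string (parse sample))
(define from-port (parse (open-input-string sample)))

(printf "from string: ~a\n" (wikitext? from-string))
(printf "from port:   ~a\n" (wikitext? from-port))
;; The unparsed string itself is not a wikitext object.
(printf "raw string:  ~a\n" (wikitext? sample))
```
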
 procedure
 (body wikitext) → (listof xml:xexpr?)
   wikitext : wikitext/c
Extract the body of the document from the wikitext object. Each ‘block’ structure – such as a paragraph or a header – produces a separate XML xexpr?. This sequence of xexpr? will have to be wrapped inside a further list/element before it can, for example, be processed by the XML module’s xexpr->xml function (so (cons 'doc (body wikitext)) creates an xexpr representing a doc element containing the parsed content).
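
The wrapping step described above might look like the following sketch. The element name doc is arbitrary, the input is invented, and the names page, doc-xexpr and doc-string are hypothetical; xexpr->string comes from the xml library.

```racket
#lang racket
(require xml squicky)

;; Hypothetical input: a heading followed by one paragraph.
(define page (parse "== A heading\n\nA paragraph with //emphasis//.\n"))

;; (body page) is a list of xexprs, one per block, so it needs a single
;; root element before the xml library can serialise it.
(define doc-xexpr (cons 'doc (body page)))

(define doc-string (xexpr->string doc-xexpr))
(displayln doc-string)
```
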
 procedure
 (lookup wikitext key) → (or/c string? false/c)
   wikitext : wikitext/c
   key : symbol?
Retrieve the metadata value corresponding to key, or #f if the key was not specified.
 procedure
 (lookup/multiple wikitext key) → (listof string?)
   wikitext : wikitext/c
   key : symbol?
Retrieve the multiple metadata values corresponding to key, or an empty list if there was none. Thus if a metadata value appears several times in the input file, then all of the values appear here, in order.
 procedure
 (lookup-parsed wikitext key) → any
   wikitext : wikitext/c
   key : symbol?
Like lookup, except that, depending on the key, the value is returned as a parsed object. See also lookup-value-parser.
 procedure
 (lookup-keys wikitext) → (listof symbol?)
   wikitext : wikitext/c
Return the list of available keys.
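
The metadata accessors above can be exercised together. This is a sketch only: the input text, the author key and the names page, authors and keys are invented for illustration.

```racket
#lang racket
(require squicky)

;; Hypothetical input: one title and a repeated author key.
(define page
  (parse (string-append "::title Interesting things\n"
                        "::author Alice\n"
                        "::author Bob\n"
                        "\n"
                        "Body text.\n")))

;; A key that appears once, and a key that was never given.
(printf "title:    ~s\n" (lookup page 'title))
(printf "subtitle: ~s\n" (lookup page 'subtitle))

;; A repeated key: every value is kept, in input order.
(define authors (lookup/multiple page 'author))
(printf "authors:  ~s\n" authors)

;; All of the keys seen in the input.
(define keys (lookup-keys page))
(printf "keys:     ~s\n" keys)
```
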
 procedure
 (set-metadata! wikitext key value) → any
   wikitext : wikitext/c
   key : symbol?
   value : string?
Set a metadata key to have the given value. This changes the value retrieved by lookup; but lookup/multiple returns this and any previous values. There is not currently any way of fully replacing a value.
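
The difference between lookup and lookup/multiple after a call to set-metadata! can be seen in a short sketch. The titles and the names page and all-titles are made up for illustration.

```racket
#lang racket
(require squicky)

;; Hypothetical input with a single title.
(define page (parse "::title First title\n\nBody text.\n"))

(set-metadata! page 'title "Second title")

;; lookup now reports the newly set value ...
(printf "lookup:          ~s\n" (lookup page 'title))

;; ... while lookup/multiple still includes the earlier one.
(define all-titles (lookup/multiple page 'title))
(printf "lookup/multiple: ~s\n" all-titles)
```
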
 procedure
 (squicky-version) → string?
 (squicky-version with-repo-revision?) → string?
   with-repo-revision? : boolean?
Returns a string giving the version of the squicky parser. If with-repo-revision? is true, then the output includes the identifier of the repository revision this represents.
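
Both call forms can be tried as follows. This is a sketch only; the strings printed depend on the installed release, and the names plain-version and full-version are hypothetical.

```racket
#lang racket
(require squicky)

;; Version string alone, and with the repository revision identifier.
(define plain-version (squicky-version))
(define full-version (squicky-version #t))

(printf "squicky version: ~a\n" plain-version)
(printf "with revision:   ~a\n" full-version)
```
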

The default parsing function for lookup-parsed treats specially only the 'date and 'updated keys, which are returned as SRFI-19 date objects. The date parser is reasonably lenient, and recognises all of the following as the same date: 2010-09-01; 2010-09-01T12:34:56; 2010 September 1; 1 Sep 2010; Sep 1, 2010; September 1, 2010; 1-09-2010; 1/9/2010; and 1 September 2010. That is, nn/nn/nnnn dates are parsed as day-month-year, not month-day-year. ISO-8601-style formats are probably the most reliable in general.
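
The date handling can be tried on a few of the formats listed above. In this sketch the helper parsed-ymd is hypothetical, as is the surrounding input text; it compares only year, month and day, since one of the listed formats also carries a time. The accessors date?, date-year, date-month and date-day come from srfi/19.

```racket
#lang racket
(require squicky (prefix-in srfi19: srfi/19))

;; Hypothetical helper: parse a document whose only metadata is a ::date
;; line, and return (list year month day), or #f if no date object results.
(define (parsed-ymd date-text)
  (define page (parse (format "::date ~a\n\nBody text.\n" date-text)))
  (define d (lookup-parsed page 'date))
  (and (srfi19:date? d)
       (list (srfi19:date-year d)
             (srfi19:date-month d)
             (srfi19:date-day d))))

(for ([text '("2010-09-01" "2010 September 1" "1 Sep 2010" "1/9/2010")])
  (printf "~a -> ~a\n" text (parsed-ymd text)))
```
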

You may override this parsing with a parameter:

 parameter
 (lookup-value-parser) → (-> symbol? string? any/c)
 (lookup-value-parser parser) → void?
   parser : (-> symbol? string? any/c)
A parameter which evaluates to a parsing function for lookup-parsed. The function is given a key and a non-#f string, and may return anything, including #f. If, for a given key, lookup would return the value #f, then lookup-parsed returns #f. Otherwise, lookup-parsed returns the value of ((lookup-value-parser) key value). Thus, if the function does not recognise the key, it should return the value unchanged.
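
One way to use the parameter is sketched below, under stated assumptions: the count key, the function my-parser and the other names are hypothetical. The sketch keeps the default behaviour for dates by delegating unrecognised keys to the parser that was installed beforehand.

```racket
#lang racket
(require squicky)

;; Capture the parser in force before overriding it.
(define default-parser (lookup-value-parser))

;; Hypothetical parser: read the (made-up) 'count key as a number,
;; and hand every other key to the previous parser.
(define (my-parser key value)
  (case key
    [(count) (or (string->number (string-trim value)) value)]
    [else (default-parser key value)]))

(define-values (count-value title-value missing-value)
  (parameterize ([lookup-value-parser my-parser])
    (define page
      (parse "::count 42\n::title Interesting things\n\nBody text.\n"))
    (values (lookup-parsed page 'count)
            (lookup-parsed page 'title)
            (lookup-parsed page 'missing))))

(printf "count: ~s\ntitle: ~s\nmissing: ~s\n"
        count-value title-value missing-value)
```

Using parameterize rather than setting the parameter directly keeps the override local to one dynamic extent, which is the usual design choice for Racket parameters.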

### 4 Release notes

• 1.2, 2019 March 23: Now accepts leading blank lines before wikitext starts. Include release notes in scribble docs.

• 1.1.2, 2015 September 22: Fix dependencies, addressing dependency warnings on pkgs.racket-lang.org.

• 1.1.1, 2015 June 9: Minor documentation improvements.

• 1.1, 2015 January 30: Reworked to be compatible with Racket 6.1.1, and released on the new package system. No significant changes in functionality, apart from the addition of the lookup-value-parser parameter.

• 1.0, 2011 January 26: First public release to PLaneT