Squicky: a quick wiki parser
Norman Gray <http://nxg.me.uk>
This is a parser for a wiki syntax based closely on WikiCreole, as described below. The source repository is available on Bitbucket.
Version 1.2, released 2019 March 23
1 Usage
The dialect parsed here is the consensus WikiCreole syntax of http://www.wikicreole.org/. It handles all of the WikiCreole test cases, except for one test of wiki-internal links (which is in any case somewhat underspecified).
- //italics// 
- **bold** : A line which begins with **, with possible whitespace either side, is a (second-level) bulleted list if the line before it is a bulleted list, but is a paragraph starting with bold text otherwise. 
- ##monospaced text## : A line which begins with ##, with possible whitespace either side, is a (second-level) enumerated list if the line before it is an enumerated list, but is a paragraph starting with monospace text otherwise. [This is not specified in the WikiCreole definition, but is clearly compatible with it]. 
- * bulleted list : (including sublists, the asterisk may or may not be indented) 
- # numbered list : (including sublists) 
- >quoted paragraph : including multiple levels (this appears to be an extension of WikiCreole). 
- [[link to wikipage]] 
- [[URL|description]] 
- {{image.png}} or {{image.png|alt text}} or {{image.png|att=value;att2=value; or more}}. In the last case, the att indicates any attribute on the HTML <img> element, such as class; the att must immediately follow the semicolon (so the last case (which is an extension to the Creole syntax) parses as att2=’value; or more’); and if the att is omitted, it defaults to alt. 
- == heading 
- === subheading 
- ==== subsubheading 
- line\\break 
- ---- (four dashes in a row, on a line by themselves) horizontal rule
- ~x escaped character, and ~http://url which isn’t linked 
- {{{in-line literal text}}} 
- preformatted block:
| {{{ | 
| preformatted text | 
| }}} | 
- table:
| |=Heading Col 1 |=Heading Col 2 | | 
| |Cell 1.1 |Two lines\\in Cell 1.2 | | 
| |Cell 2.1 |Cell 2.2 | | 
- ::key value... : adds ‘metadata’, which can be retrieved with the lookup function. For example, after ::title Interesting things, the value of the ‘title’ key will be Interesting things. This must be at the beginning of a line. 
- "quoted" : corresponds to <q>quoted</q> (note that’s a double-quote character, not two single quotes). 
- <<element-name content>> : adds <element-name>content</element-name> to the output. 
- The att=value syntax for {{image.png}} is an extension. 
- !!target value ... : processing instruction support – adds a sexp equivalent to <?target value ...?> to the output 
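The attribute rule for {{image.png|...}} described above (a semicolon separates attributes only when an att= immediately follows it, and a bare value defaults to alt) can be illustrated with a small standalone sketch. This is not squicky's own implementation: split-image-attributes is a hypothetical helper, and the assumption that attribute names consist of letters, digits, underscores and hyphens is ours.

```racket
#lang racket/base

;; Hypothetical helper, NOT part of squicky: illustrates the documented rule
;; for the text that follows the '|' in {{image.png|...}}.
(define (split-image-attributes spec)
  ;; Split only at semicolons that are immediately followed by "name=".
  ;; (The shape of an attribute name is an assumption made for this sketch.)
  (define parts
    (regexp-split #px";(?=[A-Za-z][A-Za-z0-9_-]*=)" spec))
  (for/list ([part (in-list parts)])
    (define m (regexp-match #px"^([A-Za-z][A-Za-z0-9_-]*)=(.*)$" part))
    (if m
        (cons (string->symbol (cadr m)) (caddr m)) ; explicit att=value
        (cons 'alt part))))                        ; no att: defaults to alt

(split-image-attributes "att=value;att2=value; or more")
;; => '((att . "value") (att2 . "value; or more"))
(split-image-attributes "alt text")
;; => '((alt . "alt text"))
```

The second semicolon in the first call is followed by a space rather than an attribute name, so it stays inside the value of att2, matching the parse described in the list above.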
For example, the following parses some input text and writes it out as XML.
(require xml
         squicky
         (prefix-in srfi19: srfi/19))

;; Write the parsed wiki text to port as an XHTML document.
(define (write-html-to-port wiki-text port)
  (write-xml/content
   (xexpr->xml
    `(html ((xmlns "http://www.w3.org/1999/xhtml"))
           (head
            ;; Emit a DC.date <meta> element only if a date is available.
            ,@(cond
                [(or (lookup-parsed wiki-text 'date)
                     (lookup-parsed wiki-text 'updated))
                 => (λ (date)
                      `((meta ((name "DC.date")
                               (content ,(srfi19:date->string date "~4"))))))]
                [else '()])
            ;; Fall back to a fixed title if no ::title was given.
            (title ,(or (lookup wiki-text 'title) "Title")))
           (body ,@(body wiki-text))))
   port)
  (newline port))

;; Read wiki text from standard input, write XHTML to standard output.
(write-html-to-port (parse (current-input-port))
                    (current-output-port))
| ::date 2010 December 12 | 
| == Here is a heading | 
| Here is some text, with a list comprising: | 
| * one | 
| * two. | 
| | 
| That's quite //astonishing!//. | 
The parsing is intended to be tolerant. No matter how garbled the WikiCreole syntax, the parser should not produce an error, or a body which fails to satisfy the (listof xexpr?) contract.
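As a rough way to check this tolerance on your own input, the sketch below feeds deliberately malformed markup to the parser and tests each element of the body with xexpr? from the xml library. It uses only parse and body as they appear elsewhere on this page; the sample string is made up for illustration.

```racket
#lang racket/base
(require xml squicky)

;; Deliberately malformed markup: unclosed bold, italics, link and literal block.
(define garbled
  "== heading **bold //italic [[unclosed link\n{{{never closed\n|=table | ~\n")

;; parse is given an input port, as in the example above.
(define parsed (parse (open-input-string garbled)))

;; Per the text above, every element of the body should satisfy xexpr?.
(define all-xexprs? (andmap xexpr? (body parsed)))
(printf "body is a list of xexprs: ~a\n" all-xexprs?)
```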
2 Command line
| % racket -l squicky -- --html input.wiki >input.html | 
Give the option --help for other instructions.
3 Reference
procedure
(body wikitext) → (listof xml:xexpr?)
wikitext : wikitext/c 
procedure
(set-metadata! wikitext key value) → any
wikitext : wikitext/c
key : symbol?
value : string?
procedure
(squicky-version) → string?
(squicky-version with-repo-revision?) → string?
with-repo-revision? : boolean?
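A short sketch of how these procedures might be used together follows. It assumes that a value stored with set-metadata! can afterwards be retrieved with lookup, which the signatures above suggest but do not state; the 'author key and the sample text are made up for illustration.

```racket
#lang racket/base
(require squicky)

;; Parse a small document that carries one metadata line.
(define doc
  (parse (open-input-string
          "::title Interesting things\n\n== A heading\n\nSome text.\n")))

;; Metadata given in the source text.
(printf "title: ~a\n" (lookup doc 'title))

;; Metadata added afterwards (assumed to be visible to lookup).
(set-metadata! doc 'author "A. N. Author")
(printf "author: ~a\n" (lookup doc 'author))

;; The body is a list of xexprs; squicky-version returns a string.
(printf "top-level elements: ~a\n" (length (body doc)))
(printf "squicky version: ~a\n" (squicky-version))
```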
3.1 Parsing metadata lookups
The default parsing function for lookup-parsed treats specially only the 'date and 'updated keys, which are returned as SRFI-19 date objects. The date parser is reasonably lenient, and detects all of 2010-09-01, 2010-09-01T12:34:56, 2010 September 1, 1 Sep 2010, Sep 1, 2010, September 1, 2010, 1-09-2010, 1/9/2010 and 1 September 2010 as the same date (that is, nn/nn/nnnn dates are parsed as day-month-year, not month-day-year; ISO-8601-style formats are probably the most reliable in general).
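To see how a particular date format is interpreted, something like the following sketch may help. It relies on lookup-parsed and the srfi/19 accessors only; date-of is a hypothetical helper, and the surrounding document text is a placeholder.

```racket
#lang racket/base
(require squicky
         (prefix-in srfi19: srfi/19))

;; Hypothetical helper: wrap a date string in a minimal document and
;; return whatever lookup-parsed yields for the 'date key.
(define (date-of date-text)
  (define doc
    (parse (open-input-string
            (string-append "::date " date-text "\n\nSome text.\n"))))
  (lookup-parsed doc 'date))

;; Print year-month-day for a few of the formats listed above.
(for ([text (in-list '("2010-09-01" "2010 September 1" "1 Sep 2010" "1/9/2010"))])
  (define d (date-of text))
  (if d
      (printf "~a => ~a-~a-~a\n"
              text
              (srfi19:date-year d)
              (srfi19:date-month d)
              (srfi19:date-day d))
      (printf "~a => not recognised\n" text)))
```

If the description above holds, all four lines should report the same year, month and day.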
You may override this parsing with the lookup-value-parser parameter.
4 Release notes
- 1.2, 2019 March 23: Now accepts leading blank lines before wikitext starts. Include release notes in scribble docs. 
- 1.1.2, 2015 September 22: Fix dependencies, addressing dependency warnings on pkgs.racket-lang.org. 
- 1.1.1, 2015 June 9: Minor documentation improvements. 
- 1.1, 2015 January 30: Reworked to be compatible with Racket 6.1.1, and released on the new package system. No significant changes in functionality, apart from the addition of the lookup-value-parser parameter. 
- 1.0, 2011 January 26: First public release to PLaneT