HTML: Parsing Library

The html library provides functions to read HTML documents and structures to represent them.

procedure

(read-xhtml port) → html?

  port : input-port?

procedure

(read-html port) → html?

  port : input-port?
Reads (X)HTML from a port, producing an html instance.
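
As a minimal sketch of the call described above, the following reads a small document from a string port. The document text and the names doc and the h: prefix are illustrative choices, not part of the library.

```racket
#lang racket/base
(require (prefix-in h: html))

;; Parse a small, illustrative document from a string port.
(define doc
  (h:read-html
   (open-input-string
    "<html><head><title>T</title></head><body><p>Hi</p></body></html>")))

;; Per the description above, the result should be an html instance,
;; so its children are reachable through the html-full accessor.
(printf "html instance? ~s\n" (h:html? doc))
(printf "top-level children: ~s\n" (length (h:html-full-content doc)))
```

read-xhtml can be called the same way when the input is well-formed XHTML.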

procedure

(read-html-as-xml port) → (listof content/c)

  port : input-port?
Reads HTML from a port, producing a list of content values compatible with the xml library (which defines content/c).
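
A minimal sketch of using the result with the xml library: assuming each item in the returned list satisfies content/c, xml->xexpr from xml can turn it into an X-expression. The input string and the names contents and xexprs are illustrative.

```racket
#lang racket/base
(require (prefix-in h: html)
         (prefix-in x: xml))

;; Read an illustrative fragment as xml content values.
(define contents
  (h:read-html-as-xml
   (open-input-string "<p>Hello <b>world</b></p>")))

;; Convert each content value to an X-expression for easier inspection.
(define xexprs (map x:xml->xexpr contents))

(for-each (lambda (xe) (printf "~s\n" xe)) xexprs)
```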

parameter

(read-html-comments) → boolean?

(read-html-comments v) → void?
  v : any/c
If v is not #f, then comments are read and returned. Defaults to #f.
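
One way to use this parameter is to set it with parameterize around a read, so the change applies only to that call. The sketch below assumes the parameter is consulted while the input is being read; the helper name read-keeping-comments and the input string are hypothetical.

```racket
#lang racket/base
(require (prefix-in h: html)
         (prefix-in x: xml))

(define source "<!-- a note --><p>text</p>")

;; Hypothetical helper: read with comments retained for this call only.
(define (read-keeping-comments in)
  (parameterize ([h:read-html-comments #t])
    (h:read-html-as-xml in)))

(define with-comments (read-keeping-comments (open-input-string source)))
(define without-comments (h:read-html-as-xml (open-input-string source)))

;; Compare how many top-level comment values each read produced.
(printf "with: ~s, without: ~s\n"
        (length (filter x:comment? with-comments))
        (length (filter x:comment? without-comments)))
```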

parameter

(use-html-spec) → boolean?

(use-html-spec v) → void?
  v : any/c
If v is not #f, then the HTML must respect the HTML specification with regard to which elements are allowed to be children of other elements. For example, the top-level "<html>" element may contain only a "<head>" and a "<body>" element. Defaults to #t.
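
To see the effect of this parameter, the same input can be read once with the default and once with the check turned off. This is a sketch under the assumption that the parameter is consulted at read time; the input string and the helper name read-as-xexprs are hypothetical, and the exact shape of each result is not asserted here.

```racket
#lang racket/base
(require (prefix-in h: html)
         (prefix-in x: xml))

;; Illustrative input with list items that are never explicitly closed.
(define source "<ul><li>one<li>two</ul>")

;; Hypothetical helper: read the source and convert to X-expressions.
(define (read-as-xexprs)
  (map x:xml->xexpr
       (h:read-html-as-xml (open-input-string source))))

(define strict (read-as-xexprs))
(define lenient
  (parameterize ([h:use-html-spec #f])
    (read-as-xexprs)))

(printf "use-html-spec #t: ~s\n" strict)
(printf "use-html-spec #f: ~s\n" lenient)
```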

1 Example

(module html-example racket
 
  ; Some of the symbols in html and xml conflict with
  ; each other and with racket/base language, so we prefix
  ; to avoid namespace conflict.
  (require (prefix-in h: html)
           (prefix-in x: xml))
 
  (define an-html
    (h:read-xhtml
     (open-input-string
      (string-append
       "<html><head><title>My title</title></head><body>"
       "<p>Hello world</p><p><b>Testing</b>!</p>"
       "</body></html>"))))
 
  ; extract-pcdata: html-content/c -> (listof string)
  ; Pulls out the pcdata strings from some-content.
  (define (extract-pcdata some-content)
    (cond [(x:pcdata? some-content)
           (list (x:pcdata-string some-content))]
          [(x:entity? some-content)
           (list)]
          [else
           (extract-pcdata-from-element some-content)]))
 
  ; extract-pcdata-from-element: html-element -> (listof string)
  ; Pulls out the pcdata strings from an-html-element.
  (define (extract-pcdata-from-element an-html-element)
    (match an-html-element
      [(struct h:html-full (attributes content))
       (apply append (map extract-pcdata content))]
 
      [(struct h:html-element (attributes))
       '()]))
 
  (printf "~s\n" (extract-pcdata an-html)))

 

> (require 'html-example)

("My title" "Hello world" "Testing" "!")

2 HTML Structures

pcdata, entity, and attribute are defined in the xml documentation.

An html-content/c is either
  • html-element

  • pcdata

  • entity

struct

(struct html-element (attributes)
  #:extra-constructor-name make-html-element)
  attributes : (listof attribute)
Any of the structures below inherits from html-element.

struct

(struct html-full html-element (content)
  #:extra-constructor-name make-html-full)
  content : (listof html-content/c)
Any html tag that may include content also inherits from html-full without adding any additional fields.
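
The inheritance described above can be sketched by building two values by hand. The example assumes the make- constructors and field accessors shown in this section, plus make-pcdata from xml with #f for both source locations; the names para and line-break are illustrative.

```racket
#lang racket/base
(require (prefix-in h: html)
         (prefix-in x: xml))

;; A <p> with no attributes and a single text child.
(define para
  (h:make-p '() (list (x:make-pcdata #f #f "Hello"))))

;; A <br> has attributes only, so it is an html-element but not an html-full.
(define line-break (h:make-br '()))

(printf "~s\n" (h:html-full? para))          ; p inherits from html-full
(printf "~s\n" (h:html-element? para))       ; and so from html-element
(printf "~s\n" (h:html-full? line-break))    ; br does not carry content
(printf "~s\n" (map x:pcdata-string (h:html-full-content para)))
```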

struct

(struct mzscheme html-full ()
  #:extra-constructor-name make-mzscheme)
A mzscheme is a special legacy value for the old documentation system.

struct

(struct html html-full ()
  #:extra-constructor-name make-html)
An html is (make-html (listof attribute) (listof Contents-of-html))

A Contents-of-html is either

struct

(struct div html-full ()
  #:extra-constructor-name make-div)

struct

(struct center html-full ()
  #:extra-constructor-name make-center)

struct

(struct blockquote html-full ()
  #:extra-constructor-name make-blockquote)

struct

(struct ins html-full ()
  #:extra-constructor-name make-ins)
An ins is (make-ins (listof attribute) (listof G2))

struct

(struct del html-full ()
  #:extra-constructor-name make-del)

struct

(struct dd html-full ()
  #:extra-constructor-name make-dd)

struct

(struct li html-full ()
  #:extra-constructor-name make-li)

struct

(struct th html-full ()
  #:extra-constructor-name make-th)

struct

(struct td html-full ()
  #:extra-constructor-name make-td)

struct

(struct iframe html-full ()
  #:extra-constructor-name make-iframe)

struct

(struct noframes html-full ()
  #:extra-constructor-name make-noframes)

struct

(struct noscript html-full ()
  #:extra-constructor-name make-noscript)

struct

(struct style html-full ()
  #:extra-constructor-name make-style)

struct

(struct script html-full ()
  #:extra-constructor-name make-script)

struct

(struct basefont html-element ()
  #:extra-constructor-name make-basefont)

struct

(struct br html-element ()
  #:extra-constructor-name make-br)

struct

(struct area html-element ()
  #:extra-constructor-name make-area)

struct

(struct alink html-element ()
  #:extra-constructor-name make-alink)

struct

(struct img html-element ()
  #:extra-constructor-name make-img)

struct

(struct param html-element ()
  #:extra-constructor-name make-param)

struct

(struct hr html-element ()
  #:extra-constructor-name make-hr)

struct

(struct input html-element ()
  #:extra-constructor-name make-input)

struct

(struct col html-element ()
  #:extra-constructor-name make-col)

struct

(struct isindex html-element ()
  #:extra-constructor-name make-isindex)

struct

(struct base html-element ()
  #:extra-constructor-name make-base)

struct

(struct meta html-element ()
  #:extra-constructor-name make-meta)

struct

(struct option html-full ()
  #:extra-constructor-name make-option)

struct

(struct textarea html-full ()
  #:extra-constructor-name make-textarea)

struct

(struct title html-full ()
  #:extra-constructor-name make-title)

struct

(struct head html-full ()
  #:extra-constructor-name make-head)
A head is (make-head (listof attribute) (listof Contents-of-head))

A Contents-of-head is either

struct

(struct tr html-full ()
  #:extra-constructor-name make-tr)
A tr is (make-tr (listof attribute) (listof Contents-of-tr))

A Contents-of-tr is either

struct

(struct colgroup html-full ()
  #:extra-constructor-name make-colgroup)

struct

(struct thead html-full ()
  #:extra-constructor-name make-thead)

struct

(struct tfoot html-full ()
  #:extra-constructor-name make-tfoot)

struct

(struct tbody html-full ()
  #:extra-constructor-name make-tbody)

struct

(struct tt html-full ()
  #:extra-constructor-name make-tt)

struct

(struct i html-full ()
  #:extra-constructor-name make-i)
An i is (make-i (listof attribute) (listof G5))

struct

(struct b html-full ()
  #:extra-constructor-name make-b)

struct

(struct u html-full ()
  #:extra-constructor-name make-u)
A u is (make-u (listof attribute) (listof G5))

struct

(struct s html-full ()
  #:extra-constructor-name make-s)

struct

(struct strike html-full ()
  #:extra-constructor-name make-strike)

struct

(struct big html-full ()
  #:extra-constructor-name make-big)

struct

(struct small html-full ()
  #:extra-constructor-name make-small)

struct

(struct em html-full ()
  #:extra-constructor-name make-em)

struct

(struct strong html-full ()
  #:extra-constructor-name make-strong)

struct

(struct dfn html-full ()
  #:extra-constructor-name make-dfn)

struct

(struct code html-full ()
  #:extra-constructor-name make-code)

struct

(struct samp html-full ()
  #:extra-constructor-name make-samp)

struct

(struct kbd html-full ()
  #:extra-constructor-name make-kbd)

struct

(struct var html-full ()
  #:extra-constructor-name make-var)

struct

(struct cite html-full ()
  #:extra-constructor-name make-cite)

struct

(struct abbr html-full ()
  #:extra-constructor-name make-abbr)

struct

(struct acronym html-full ()
  #:extra-constructor-name make-acronym)

struct

(struct sub html-full ()
  #:extra-constructor-name make-sub)

struct

(struct sup html-full ()
  #:extra-constructor-name make-sup)

struct

(struct span html-full ()
  #:extra-constructor-name make-span)

struct

(struct bdo html-full ()
  #:extra-constructor-name make-bdo)

struct

(struct font html-full ()
  #:extra-constructor-name make-font)

struct

(struct p html-full ()
  #:extra-constructor-name make-p)

struct

(struct h1 html-full ()
  #:extra-constructor-name make-h1)

struct

(struct h2 html-full ()
  #:extra-constructor-name make-h2)

struct

(struct h3 html-full ()
  #:extra-constructor-name make-h3)

struct

(struct h4 html-full ()
  #:extra-constructor-name make-h4)

struct

(struct h5 html-full ()
  #:extra-constructor-name make-h5)

struct

(struct h6 html-full ()
  #:extra-constructor-name make-h6)

struct

(struct q html-full ()
  #:extra-constructor-name make-q)

struct

(struct dt html-full ()
  #:extra-constructor-name make-dt)

struct

(struct legend html-full ()
  #:extra-constructor-name make-legend)

struct

(struct caption html-full ()
  #:extra-constructor-name make-caption)

struct

(struct table html-full ()
  #:extra-constructor-name make-table)
A table is (make-table (listof attribute) (listof Contents-of-table))

A Contents-of-table is either

struct

(struct button html-full ()
  #:extra-constructor-name make-button)

struct

(struct fieldset html-full ()
  #:extra-constructor-name make-fieldset)
A fieldset is (make-fieldset (listof attribute) (listof Contents-of-fieldset))

A Contents-of-fieldset is either

struct

(struct optgroup html-full ()
  #:extra-constructor-name make-optgroup)

struct

(struct select html-full ()
  #:extra-constructor-name make-select)
A select is (make-select (listof attribute) (listof Contents-of-select))

A Contents-of-select is either

struct

(struct label html-full ()
  #:extra-constructor-name make-label)

struct

(struct form html-full ()
  #:extra-constructor-name make-form)

struct

(struct ol html-full ()
  #:extra-constructor-name make-ol)

struct

(struct ul html-full ()
  #:extra-constructor-name make-ul)

struct

(struct dir html-full ()
  #:extra-constructor-name make-dir)

struct

(struct menu html-full ()
  #:extra-constructor-name make-menu)

struct

(struct dl html-full ()
  #:extra-constructor-name make-dl)
A dl is (make-dl (listof attribute) (listof Contents-of-dl))

A Contents-of-dl is either

struct

(struct pre html-full ()
  #:extra-constructor-name make-pre)
A pre is (make-pre (listof attribute) (listof Contents-of-pre))

A Contents-of-pre is either
  • G9

  • G11

struct

(struct object html-full ()
  #:extra-constructor-name make-object)
An object is (make-object (listof attribute) (listof Contents-of-object-applet))

struct

(struct applet html-full ()
  #:extra-constructor-name make-applet)
An applet is (make-applet (listof attribute) (listof Contents-of-object-applet))

A Contents-of-object-applet is either

struct

(struct -map html-full ()
  #:extra-constructor-name make--map)
A -map is (make--map (listof attribute) (listof Contents-of-map))

A Contents-of-map is either

struct

(struct a html-full ()
  #:extra-constructor-name make-a)
An a is (make-a (listof attribute) (listof Contents-of-a))

A Contents-of-a is either

struct

(struct address html-full ()
  #:extra-constructor-name make-address)
An address is (make-address (listof attribute) (listof Contents-of-address))

A Contents-of-address is either

struct

(struct body html-full ()
  #:extra-constructor-name make-body)
A body is (make-body (listof attribute) (listof Contents-of-body))

A Contents-of-body is either

A G12 is either

A G11 is either

A G10 is either

A G9 is either

A G8 is either

A G7 is either
  • G8

  • G12

A G6 is either

A G5 is either

A G4 is either
  • G8

  • G10

A G3 is either

A G2 is either