BCP-47 compliant language tag predicates
This module provides predicates that determine whether a given string or symbol is a valid language tag, or a component of one, as defined by RFC 5646. Language tags are used across HTTP, HTML, XML, RDF, and other standards.
References
- BCP-47, RFC5646 Tags for Identifying Languages 
- IANA Registry of Language Tags (Assigned) 
- IANA Registry of Language Subtags 
- IANA Registry of Language Tag Extensions (UCD) 
predicate
(language-tag? val) → boolean?
val : (or/c symbol? string?) 
> (require langtag)
> (language-tag? "en") #t
> (language-tag? "en-US") #t
> (language-tag? "en-US-boont") #t
> (language-tag? "en-Latn-US") #t
> (language-tag? "i-klingon") #t
> (language-tag? "x-private") #t
1 Components
predicate
(normal-use? val) → boolean?
val : (or/c symbol? string?) 
predicate
(private-use? val) → boolean?
val : (or/c symbol? string?) 
predicate
(grandfathered? val) → boolean?
val : (or/c symbol? string?) 
> (require langtag) 
> (for-each
    (lambda (val)
      (displayln (format "~s ~s ~s"
                         (normal-use? val)
                         (private-use? val)
                         (grandfathered? val))))
    '("en-US" "x-private" "i-klingon"))
#t #f #f
#f #t #f
#f #f #t
predicate
(language-part? val) → boolean?
val : (or/c symbol? string?) 
predicate
(language-script-part? val) → boolean?
val : (or/c symbol? string?) 
predicate
(language-region-part? val) → boolean?
val : (or/c symbol? string?) 
predicate
(language-variant-part? val) → boolean?
val : (or/c symbol? string?) 
predicate
(language-extension-part? val) → boolean?
val : (or/c symbol? string?) 
predicate
(language-private-use-part? val) → boolean?
val : (or/c symbol? string?) 
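Judging by their names, the part predicates correspond to individual productions in the RFC 5646 ABNF reproduced in the appendix. The sketch below shows how three of those productions could be checked with anchored regular expressions. The helper names (script-shape?, region-shape?, variant-shape?) are hypothetical and are not part of this module; they accept strings only and test shape, not registration with IANA.

```racket
#lang racket

;; Hypothetical helpers, NOT the langtag module's API. Each one checks the
;; shape of a single subtag (without the leading "-") against the ABNF.

;; script = 4ALPHA
(define (script-shape? s)
  (regexp-match? #px"^[A-Za-z]{4}$" s))

;; region = 2ALPHA / 3DIGIT
(define (region-shape? s)
  (regexp-match? #px"^(?:[A-Za-z]{2}|[0-9]{3})$" s))

;; variant = 5*8alphanum / (DIGIT 3alphanum)
(define (variant-shape? s)
  (regexp-match? #px"^(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})$" s))

(script-shape? "Latn")   ; => #t
(region-shape? "US")     ; => #t
(region-shape? "419")    ; => #t
(region-shape? "USA")    ; => #f
(variant-shape? "boont") ; => #t
(variant-shape? "1996")  ; => #t
```

The patterns are anchored with ^ and $ so that an alternation cannot succeed on a prefix of the subtag.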
2 Matching
procedure
(language-tag-match val)
→ (list symbol? string? (or/c (listof (cons/c symbol? string?)) none/c))
val : (or/c symbol? string?)
> (require langtag)
> (language-tag-match "en") '(lang "en" ((language . "en")))
> (language-tag-match "en-US") '(lang "en-US" ((language . "en") (region . "US")))
> (language-tag-match "en-US-boont") '(lang "en-US-boont" ((language . "en") (region . "US") (variant . "boont")))
> (language-tag-match "en-Latn-US") '(lang "en-Latn-US" ((language . "en") (script . "Latn") (region . "US")))
> (language-tag-match "i-klingon") '(grandfathered-i "i-klingon")
> (language-tag-match "x-private") '(private-use "x-private")
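To illustrate how an association list like the ones above could be produced, here is a minimal sketch that splits a tag on "-" and labels each subtag by its shape, in the order the langtag production requires. It is not this module's implementation: the name decompose and the stage table are hypothetical, and the sketch deliberately ignores extlang, extension, private-use, and grandfathered forms, returning #f for them.

```racket
#lang racket

;; Hypothetical sketch, NOT the langtag module's implementation.

;; language = 2*3ALPHA / 4ALPHA / 5*8ALPHA (extlang is not handled here)
(define language-px #px"^[A-Za-z]{2,8}$")

;; Each stage: label, pattern, and whether the subtag may repeat.
(define optional-stages
  (list (list 'script  #px"^[A-Za-z]{4}$" #f)
        (list 'region  #px"^(?:[A-Za-z]{2}|[0-9]{3})$" #f)
        (list 'variant #px"^(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})$" #t)))

(define (decompose tag)
  ;; #:trim? #f keeps empty subtags, so "en-" is rejected instead of
  ;; silently becoming "en".
  (define parts (string-split tag "-" #:trim? #f))
  (and (pair? parts)
       (regexp-match? language-px (car parts))
       (let loop ([parts (cdr parts)]
                  [stages optional-stages]
                  [acc (list (cons 'language (car parts)))])
         (cond
           [(null? parts) (reverse acc)] ; every subtag was labeled
           [(null? stages) #f]           ; leftover subtags fit no stage
           [else
            (let ([label   (first (car stages))]
                  [px      (second (car stages))]
                  [repeat? (third (car stages))])
              (if (regexp-match? px (car parts))
                  (loop (cdr parts)
                        (if repeat? stages (cdr stages))
                        (cons (cons label (car parts)) acc))
                  ;; Stage is optional: skip it and retry the same subtag.
                  (loop parts (cdr stages) acc)))]))))

(decompose "en-US-boont")
; => '((language . "en") (region . "US") (variant . "boont"))
(decompose "en-Latn-US")
; => '((language . "en") (script . "Latn") (region . "US"))
(decompose "x-private") ; => #f
```

Walking the subtags in order keeps the labels unambiguous: a four-letter subtag is only a script if it appears before any region or variant.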
3 Appendix: Definition
The syntax of the language tag, from [RFC5646], in ABNF [RFC5234] is:
  Language-Tag  = langtag             ; normal language tags
                / privateuse          ; private use tag
                / grandfathered       ; grandfathered tags

  langtag       = language
                  ["-" script]
                  ["-" region]
                  *("-" variant)
                  *("-" extension)
                  ["-" privateuse]

  language      = 2*3ALPHA            ; shortest ISO 639 code
                  ["-" extlang]       ; sometimes followed by
                                      ; extended language subtags
                / 4ALPHA              ; or reserved for future use
                / 5*8ALPHA            ; or registered language subtag

  extlang       = 3ALPHA              ; selected ISO 639 codes
                  *2("-" 3ALPHA)      ; permanently reserved

  script        = 4ALPHA              ; ISO 15924 code

  region        = 2ALPHA              ; ISO 3166-1 code
                / 3DIGIT              ; UN M.49 code

  variant       = 5*8alphanum         ; registered variants
                / (DIGIT 3alphanum)

  extension     = singleton 1*("-" (2*8alphanum))

                                      ; Single alphanumerics
                                      ; "x" reserved for private use
  singleton     = DIGIT               ; 0 - 9
                / %x41-57             ; A - W
                / %x59-5A             ; Y - Z
                / %x61-77             ; a - w
                / %x79-7A             ; y - z

  privateuse    = "x" 1*("-" (1*8alphanum))

  grandfathered = irregular           ; non-redundant tags registered
                / regular             ; during the RFC 3066 era

  irregular     = "en-GB-oed"         ; irregular tags do not match
                / "i-ami"             ; the 'langtag' production and
                / "i-bnn"             ; would not otherwise be
                / "i-default"         ; considered 'well-formed'
                / "i-enochian"        ; These tags are all valid,
                / "i-hak"             ; but most are deprecated
                / "i-klingon"         ; in favor of more modern
                / "i-lux"             ; subtags or subtag
                / "i-mingo"           ; combination
                / "i-navajo"
                / "i-pwn"
                / "i-tao"
                / "i-tay"
                / "i-tsu"
                / "sgn-BE-FR"
                / "sgn-BE-NL"
                / "sgn-CH-DE"

  regular       = "art-lojban"        ; these tags match the 'langtag'
                / "cel-gaulish"       ; production, but their subtags
                / "no-bok"            ; are not extended language
                / "no-nyn"            ; or variant subtags: their meaning
                / "zh-guoyu"          ; is defined by their registration
                / "zh-hakka"          ; and all of these are deprecated
                / "zh-min"            ; in favor of a more modern
                / "zh-min-nan"        ; subtag or sequence of subtags
                / "zh-xiang"

  alphanum      = (ALPHA / DIGIT)     ; letters and numbers
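As a rough illustration of how this grammar translates into code, the sketch below composes the productions into regular expressions and classifies a tag as grandfathered, private-use, or normal. It is an assumption-laden sketch, not this module's implementation: the name classify and the returned symbols are hypothetical, the check covers well-formedness only (no lookup in the IANA registry), and grandfathered tags are tested first because the regular ones would otherwise also match langtag. Comparison is case-insensitive, as RFC 5646 specifies for tags.

```racket
#lang racket

;; Hypothetical sketch, NOT the langtag module's implementation.
(define alnum "[A-Za-z0-9]")

;; language = 2*3ALPHA ["-" extlang] / 4ALPHA / 5*8ALPHA
;; extlang  = 3ALPHA *2("-" 3ALPHA), i.e. up to three "-XXX" groups
(define language "(?:[A-Za-z]{2,3}(?:-[A-Za-z]{3}){0,3}|[A-Za-z]{4,8})")
(define script   "(?:-[A-Za-z]{4})?")
(define region   "(?:-(?:[A-Za-z]{2}|[0-9]{3}))?")
(define variant
  (string-append "(?:-(?:" alnum "{5,8}|[0-9]" alnum "{3}))*"))
;; singleton is any alphanumeric except "x" / "X"
(define extension
  (string-append "(?:-[0-9A-WY-Za-wy-z](?:-" alnum "{2,8})+)*"))
(define privateuse (string-append "[xX](?:-" alnum "{1,8})+"))

(define langtag-px
  (pregexp (string-append "^" language script region variant extension
                          "(?:-" privateuse ")?$")))
(define privateuse-px (pregexp (string-append "^" privateuse "$")))

;; The irregular and regular lists from the ABNF, lowercased.
(define grandfathered-tags
  '("en-gb-oed" "i-ami" "i-bnn" "i-default" "i-enochian" "i-hak"
    "i-klingon" "i-lux" "i-mingo" "i-navajo" "i-pwn" "i-tao" "i-tay"
    "i-tsu" "sgn-be-fr" "sgn-be-nl" "sgn-ch-de"
    "art-lojban" "cel-gaulish" "no-bok" "no-nyn" "zh-guoyu" "zh-hakka"
    "zh-min" "zh-min-nan" "zh-xiang"))

(define (classify tag)
  (cond
    [(member (string-downcase tag) grandfathered-tags) 'grandfathered]
    [(regexp-match? privateuse-px tag) 'private-use]
    [(regexp-match? langtag-px tag) 'lang]
    [else #f]))

(classify "en-Latn-US") ; => 'lang
(classify "x-private")  ; => 'private-use
(classify "i-klingon")  ; => 'grandfathered
(classify "en-")        ; => #f
```

Because no alphanumeric character class includes "-", each optional group can only end at a subtag boundary, so the anchored pattern cannot accept a tag by matching part of a subtag.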