2 UCD Properties
The Unicode standard defines a large number of character properties which describe the type and behavior of characters and character compositions. The data supporting these properties is collected into the Unicode Character Database (UCD) and made available as either a set of text files or XML files.
The functions below rely on a set of Racket source files, expressed as either hashes or range dicts, that are loaded lazily to fetch specific property values. Loading the source data incurs a runtime penalty on the first call that requires that specific data; once loaded, the penalty is avoided in later calls.
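The load-once behavior described above can be sketched with a Racket promise. This is only an illustration of the general technique, not the package's actual implementation: the names `name-table`, `lookup-name`, and `load-count`, and the two-entry table, are hypothetical stand-ins for a generated data file.

```racket
#lang racket/base
(require racket/promise)

;; Hypothetical counter, present only to make the one-time load observable.
(define load-count 0)

;; Hypothetical stand-in for a generated data file: a small hash from
;; codepoint to name, wrapped in a promise so it is built on first use only.
(define name-table
  (delay
    (set! load-count (add1 load-count))
    (hash 65 "LATIN CAPITAL LETTER A"
          167 "SECTION SIGN")))

;; Hypothetical lookup: `force` builds the table on the first call and
;; returns the cached table on every later call.
(define (lookup-name cp)
  (hash-ref (force name-table) cp #f))

(lookup-name 167) ; first call pays the cost of building the table
(lookup-name 65)  ; later calls reuse the cached table
```

After both calls, `load-count` is 1, which shows that the table was built only once.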
These source files are generated by tooling from the latest UCD files as described in the separate section Data Generator.
2.1 Module codepoint/properties
(require codepoint/properties)    package: codepoint
The functions below are either mapped directly to a character property or derived from one.
> (define cp (char->codepoint #\§))
> (ucd-latin-1? cp) #t
> (ucd-name cp) "SECTION SIGN"
> (ucd-name-aliases cp) ucd-name-aliases: no property data found for codepoint
codepoint: 167
property: 'name-aliases
> (ucd-general-category cp) 'Po
> (cdr (assoc (ucd-general-category cp) *general-categories*)) "Other punctuation"
> (ucd-age cp) "1.1"
> (ucd-block-name cp) "Latin-1 Supplement"
> (ucd-scripts cp) '(Common)
> (ucd-script-extensions cp (lambda () "None found!")) "None found!"
> (ucd-line-break cp) 'AI
> (cdr (assoc (ucd-line-break cp) *line-breaks*)) "Ambiguous (Alphabetic or Ideographic)"
For any function below that performs a property lookup and has a parameter named failure-result, if no value is found for codepoint, then failure-result determines the result:
If failure-result is a procedure, it is called (through a tail call) with no arguments to produce the result.
Otherwise, failure-result is returned as the result.
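The failure-result convention above is the same one Racket's built-in `hash-ref` follows, so it can be sketched with a plain hash. This is a minimal illustration under stated assumptions, not the package's code: `aliases` and `lookup-aliases` are hypothetical names, and the table holds one partial, hand-written entry.

```racket
#lang racket/base

;; Hypothetical property table with a single, partial entry (illustration only).
(define aliases (hash 10 '("LINE FEED" "NEW LINE")))

;; Hypothetical lookup following the failure-result convention.
;; The default is a thunk that raises an error; `hash-ref` calls a procedure
;; failure-result with no arguments and returns any other value unchanged.
(define (lookup-aliases cp
                        [failure-result
                         (lambda ()
                           (raise-arguments-error
                            'lookup-aliases
                            "no property data found for codepoint"
                            "codepoint" cp))])
  (hash-ref aliases cp failure-result))

(lookup-aliases 10)                            ; value found, failure-result unused
(lookup-aliases 167 '())                       ; non-procedure: returned as the result
(lookup-aliases 167 (lambda () "None found!")) ; procedure: called with no arguments
```

Calling `(lookup-aliases 167)` with no failure-result raises an `exn:fail:contract` exception through the default thunk.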
> (format "Generated from UCD data, version ~a" *corresponding-unicode-version*) "Generated from UCD data, version 14.0.0"
procedure
(ucd-ascii? c) → boolean?
c : codepoint?
procedure
(ucd-latin-1? c) → boolean?
c : codepoint?
procedure
(ucd-name c) → string?
c : codepoint?
procedure
(ucd-name->symbol c) → symbol?
c : codepoint?
> (ucd-name->symbol 73) 'latin-capital-letter-i
> (ucd-name->symbol 0) 'control/0
> (ucd-name->symbol 13312) 'cjk-ideograph-extension-a/first
> (ucd-name->symbol 63755) 'cjk-compatibility-ideograph/f90b
procedure
(ucd-name-aliases c failure-result) → (listof string?)
c : codepoint?
failure-result : (lambda () (raise-arguments-error ...))
procedure
(ucd-general-category c failure-result) → symbol?
c : codepoint?
failure-result : (lambda () (raise-arguments-error ...))
> (define macron (codepoint->char 772))
> (for ([char (list #\nul #\space #\a #\A #\ༀ #\1 #\½ #\, #\] #\¥ macron)])
    (displayln
     (format "~a => ~a"
             char
             (cdr (assoc (ucd-general-category (char->codepoint char))
                         *general-categories*)))))