On this page:
2.1 Module codepoint/properties
*corresponding-unicode-version*
ucd-ascii?
ucd-latin-1?
ucd-name
ucd-name->symbol
ucd-name-aliases
ucd-general-category
ucd-letter-category?
ucd-cased-letter-category?
ucd-mark-category?
ucd-number-category?
ucd-punctuation-category?
ucd-symbol-category?
ucd-separator-category?
ucd-other-category?
ucd-codepoint-type
ucd-canonical-combining-class
ucd-bidi-class
ucd-bidi-mirrored?
ucd-has-mirror-glyph?
ucd-mirror-glyph
ucd-bracket?
ucd-bracket-type
ucd-matching-bracket
ucd-decomposition-type
ucd-decomposition-mapping
ucd-numeric-type
ucd-numeric-value
ucd-simple-uppercase-mapping
ucd-simple-lowercase-mapping
ucd-simple-titlecase-mapping
ucd-age
ucd-block-name
ucd-scripts
ucd-script-extensions
ucd-line-break
2.2 Module codepoint/enums
*bidi-classes*
*case-folding-status*
*codepoint-types*
*combining-classes*
*decomposition-compatibility-tags*
*general-categories*
*line-breaks*
*name-alias-types*
*numeric-types*

2 UCD Properties

The Unicode standard defines a large number of character properties which describe the type and behavior of characters and character compositions. The data supporting these properties is collected into the Unicode Character Database (UCD) and made available as either a set of text files or XML files.

The functions below rely on a set of Racket source files, expressed either as hashes or range dicts, that are loaded lazily to fetch specific property values. Loading the source data incurs a runtime penalty on the first call that requires that data; once the data is loaded, later calls avoid this penalty.

These source files are generated by tooling from the latest UCD files as described in the separate section Data Generator.
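The lazy loading described above can be sketched with a promise from racket/promise. This is a minimal illustration, not the package's actual implementation: the table contents and the names block-names, lookup-block-name and load-count are hypothetical.

```racket
#lang racket/base
(require racket/promise)

;; Counts how many times the (hypothetical) data table is built.
(define load-count 0)

;; Stand-in for one generated data file: the table is built only when
;; the promise is first forced, and the promise caches the result.
(define block-names
  (delay
    (set! load-count (add1 load-count))
    (hash 65 "Basic Latin" 167 "Latin-1 Supplement")))

;; Hypothetical lookup function; only the first call pays the load cost.
(define (lookup-block-name cp)
  (hash-ref (force block-names) cp #f))

(lookup-block-name 65)   ; forces the promise and builds the table
(lookup-block-name 167)  ; reuses the cached table
load-count               ; the table was built once
```

Forcing the promise inside the lookup keeps the cost out of module instantiation, so a program that never asks for a given property never loads its data.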

2.1 Module codepoint/properties

 (require codepoint/properties) package: codepoint

The functions below are either directly mapped to a character property, or are derived from a character property.

Examples:
> (define cp (char->codepoint #\§))
> (ucd-latin-1? cp)

#t

> (ucd-name cp)

"SECTION SIGN"

> (ucd-name-aliases cp)

ucd-name-aliases: no property data found for codepoint

  codepoint: 167

  property: 'name-aliases

> (ucd-general-category cp)

'Po

> (cdr (assoc (ucd-general-category cp) *general-categories*))

"Other punctuation"

> (ucd-age cp)

"1.1"

> (ucd-block-name cp)

"Latin-1 Supplement"

> (ucd-scripts cp)

'(Common)

> (ucd-script-extensions cp (lambda () "None found!"))

"None found!"

> (ucd-line-break cp)

'AI

> (cdr (assoc (ucd-line-break cp) *line-breaks*))

"Ambiguous (Alphabetic or Ideographic)"

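For any function below that performs a property lookup and has a parameter named failure-result, if no value is found for the codepoint, then failure-result determines the result: if it is a procedure, it is called with no arguments to produce the result; otherwise, it is returned as the result.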
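The following is a minimal sketch of the failure-result convention, assuming it mirrors Racket's hash-ref, as the examples on this page suggest: a procedure is called to produce the result, and any other value is returned as-is. The table sample-data and the function lookup-property are hypothetical stand-ins, not part of this package.

```racket
#lang racket/base

;; Hypothetical property table with a single entry.
(define sample-data (hash 167 "SECTION SIGN"))

;; Hypothetical lookup that follows the convention by delegating to
;; hash-ref, which treats its third argument in the same way.
(define (lookup-property cp
                         [failure-result
                          (lambda ()
                            (raise-arguments-error
                             'lookup-property
                             "no property data found for codepoint"
                             "codepoint" cp))])
  (hash-ref sample-data cp failure-result))

(lookup-property 167)                          ; found, failure-result unused
(lookup-property 0 #f)                         ; non-procedure returned as-is
(lookup-property 0 (lambda () "None found!"))  ; procedure called for result
```

Passing a plain value such as #f is convenient when a missing property is an expected case; leaving the default in place turns a missing property into an error.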

value

*corresponding-unicode-version* : string?

This is a string representation of the Unicode version of the data files used to generate the functions below.

Example:
> (format
    "Generated from UCD data, version ~a"
    *corresponding-unicode-version*)

"Generated from UCD data, version 14.0.0"

procedure

(ucd-ascii? c)  boolean?

  c : codepoint?
Returns #t if the codepoint is in the ASCII range. This function does not rely on loading any data.

procedure

(ucd-latin-1? c)  boolean?

  c : codepoint?
Returns #t if the codepoint is in the Latin-1 range. This function does not rely on loading any data.

procedure

(ucd-name c)  string?

  c : codepoint?
Returns the name of this codepoint. The name is expressed in ASCII uppercase letters and a small set of punctuation characters.

procedure

(ucd-name->symbol c)  symbol?

  c : codepoint?
Returns the name of this codepoint transformed into a symbol. Certain characters are replaced during the transformation, so the original name cannot always be recovered from the symbol.

Examples:
> (ucd-name->symbol 73)

'latin-capital-letter-i

> (ucd-name->symbol 0)

'control/0

> (ucd-name->symbol 13312)

'cjk-ideograph-extension-a/first

> (ucd-name->symbol 63755)

'cjk-compatibility-ideograph/f90b

procedure

(ucd-name-aliases c failure-result)  (listof string?)

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
Returns a list of alias names for the codepoint.

procedure

(ucd-general-category c failure-result)  symbol?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
Returns a symbol corresponding to the codepoint’s general category. This symbol is the commonly used abbreviated form, for example 'Lu for Letter, uppercase.

Examples:
> (define macron (codepoint->char 772))
> (for ([char (list #\nul #\space #\a #\A #\ༀ #\1 #\½ #\, #\] #\¥ macron)])
    (displayln
      (format "~a  =>  ~a"
        char
        (cdr
          (assoc
            (ucd-general-category (char->codepoint char))
            *general-categories*)))))

  =>  Control

   =>  Space separator

a  =>  Lowercase letter

A  =>  Uppercase letter

ༀ  =>  Other letter

1  =>  Decimal digit number

½  =>  Other number

,  =>  Other punctuation

]  =>  Close punctuation

¥  =>  Currency symbol

̄  =>  Non-spacing mark

See *general-categories* for a mapping from this symbol to a description.

procedure

(ucd-letter-category? c failure-result)  boolean?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
Returns #t if the codepoint’s general category denotes it as a letter.

procedure

(ucd-cased-letter-category? c failure-result)  boolean?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
Returns #t if the codepoint’s general category denotes it as a cased (lower, upper, title) letter.

procedure

(ucd-mark-category? c failure-result)  boolean?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
Returns #t if the codepoint’s general category denotes it as a mark.

procedure

(ucd-number-category? c failure-result)  boolean?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
Returns #t if the codepoint’s general category denotes it as a number.

procedure

(ucd-punctuation-category? c failure-result)  boolean?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
Returns #t if the codepoint’s general category denotes it as punctuation.

procedure

(ucd-symbol-category? c failure-result)  boolean?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
Returns #t if the codepoint’s general category denotes it as a symbol.

procedure

(ucd-separator-category? c failure-result)  boolean?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
Returns #t if the codepoint’s general category denotes it as a separator.

procedure

(ucd-other-category? c failure-result)  boolean?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
Returns #t if the codepoint’s general category denotes it as an other value; these are generally non-character codepoints.
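The category predicates above group general categories by their major class, which in the Unicode abbreviations is the first letter (L, M, N, P, S, Z, C). A minimal sketch of that grouping, working on the category symbol alone, might look like the following. The helper names are hypothetical and operate on a symbol such as 'Lu rather than on a codepoint, so this is not the package's own implementation.

```racket
#lang racket/base

;; The major class is the first letter of the abbreviation: 'Lu -> #\L.
(define (major-class category)
  (string-ref (symbol->string category) 0))

;; Hypothetical predicates over general category symbols.
(define (letter-category? c)      (char=? (major-class c) #\L))
(define (mark-category? c)        (char=? (major-class c) #\M))
(define (number-category? c)      (char=? (major-class c) #\N))
(define (punctuation-category? c) (char=? (major-class c) #\P))
(define (symbol-category? c)      (char=? (major-class c) #\S))
(define (separator-category? c)   (char=? (major-class c) #\Z))
(define (other-category? c)       (char=? (major-class c) #\C))

;; Cased letters are the Lu, Ll and Lt subset of the letter class.
(define (cased-letter-category? c)
  (and (memq c '(Lu Ll Lt)) #t))

(letter-category? 'Lo)        ; a letter
(cased-letter-category? 'Lo)  ; but not a cased one
```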

procedure

(ucd-codepoint-type c failure-result)  symbol?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
Codepoint general categories can be grouped into a set of types; this function returns the type as a symbol derived from the codepoint’s general category.

Examples:
> (ucd-codepoint-type (char->codepoint #\nul))

'control

> (ucd-codepoint-type (char->codepoint #\space))

'graphic

> (ucd-codepoint-type (char->codepoint #\a))

'graphic

> (ucd-codepoint-type (char->codepoint #\A))

'graphic

> (ucd-codepoint-type (char->codepoint #\ༀ))

'graphic

> (ucd-codepoint-type (char->codepoint #\1))

'graphic

> (ucd-codepoint-type (char->codepoint #\½))

'graphic

> (ucd-codepoint-type (char->codepoint #\,))

'graphic

> (ucd-codepoint-type (char->codepoint #\]))

'graphic

> (ucd-codepoint-type (char->codepoint #\¥))

'graphic

> (ucd-codepoint-type 772)

'graphic

See *codepoint-types* for a mapping from this symbol to a description.
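As a rough sketch of how a type can be derived from a general category, the following follows the grouping in the Unicode Standard's Table 2-3. Only the symbols 'control and 'graphic appear in the examples above; the other result symbols, and the function names, are hypothetical. The Cn category covers both noncharacters and reserved codepoints, so the codepoint itself is needed to tell them apart.

```racket
#lang racket/base

;; Noncharacters: U+FDD0..U+FDEF plus the last two codepoints of each plane.
(define (noncharacter? cp)
  (or (<= #xFDD0 cp #xFDEF)
      (>= (bitwise-and cp #xFFFF) #xFFFE)))

;; Hypothetical mapping from a general category symbol to a type symbol.
(define (category->type category cp)
  (case category
    [(Cc) 'control]
    [(Cf Zl Zp) 'format]
    [(Cs) 'surrogate]
    [(Co) 'private-use]
    [(Cn) (if (noncharacter? cp) 'noncharacter 'reserved)]
    [else 'graphic]))   ; L, M, N, P, S and Zs

(category->type 'Cc 0)
(category->type 'Zs 32)
(category->type 'Cn #xFFFF)
```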

procedure

(ucd-canonical-combining-class c failure-result)  symbol?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
...

Examples:
> (assoc (ucd-canonical-combining-class (char->codepoint #\space)) *combining-classes*)

'(0

  .

  "Spacing and enclosing marks; also many vowel and consonant signs, even if nonspacing")

> (assoc (ucd-canonical-combining-class (char->codepoint #\a)) *combining-classes*)

'(0

  .

  "Spacing and enclosing marks; also many vowel and consonant signs, even if nonspacing")

> (assoc (ucd-canonical-combining-class 772) *combining-classes*)

'(230 . "Distinct marks directly above")

> (assoc (ucd-canonical-combining-class 3954) *combining-classes*)

#f

See *combining-classes* for a mapping from this symbol to a description.

procedure

(ucd-bidi-class c failure-result)  symbol?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
...

Examples:
> (assoc (ucd-bidi-class (char->codepoint #\nul)) *bidi-classes*)

'(BN . "Boundary Neutral")

> (assoc (ucd-bidi-class (char->codepoint #\space)) *bidi-classes*)

'(WS . "White Space")

> (assoc (ucd-bidi-class (char->codepoint #\A)) *bidi-classes*)

'(L . "Left-to-right")

> (assoc (ucd-bidi-class (char->codepoint #\א)) *bidi-classes*)

'(R . "Right-to-left")

> (assoc (ucd-bidi-class (char->codepoint #\ؠ)) *bidi-classes*)

'(AL . "Arabic Letter")

> (assoc (ucd-bidi-class (char->codepoint #\1)) *bidi-classes*)

'(EN . "European Number")

> (assoc (ucd-bidi-class (char->codepoint #\!)) *bidi-classes*)

'(ON . "Other Neutral")

See *bidi-classes* for a mapping from this symbol to a description.

procedure

(ucd-bidi-mirrored? c)  boolean?

  c : codepoint?
...

procedure

(ucd-has-mirror-glyph? c)  boolean?

  c : codepoint?
...

procedure

(ucd-mirror-glyph c failure-result)  codepoint?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
...

procedure

(ucd-bracket? c)  boolean?

  c : codepoint?
Returns #t if this codepoint is considered to be a bracket.

procedure

(ucd-bracket-type c failure-result)  symbol?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
Returns one of open, close, or none to denote the type of the bracket codepoint.

procedure

(ucd-matching-bracket c failure-result)  codepoint?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
Returns the matching bracket codepoint: for a bracket whose type is open, it returns the corresponding closing bracket codepoint, and for a bracket whose type is close, it returns the corresponding opening bracket codepoint.
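The three bracket functions can be combined into a simple balance check. This is a sketch under two assumptions that the text above does not state outright: that ucd-bracket-type returns the symbol 'open for opening brackets, and that a plain integer from char->integer is accepted as a codepoint. The function balanced? is a hypothetical helper.

```racket
#lang racket/base
(require codepoint/properties)

;; Returns #t when every bracket in str is closed in the right order.
(define (balanced? str)
  (let loop ([cps (map char->integer (string->list str))]
             [expected '()])   ; stack of closing codepoints still owed
    (cond
      [(null? cps) (null? expected)]
      ;; Skip anything that is not a bracket.
      [(not (ucd-bracket? (car cps))) (loop (cdr cps) expected)]
      ;; Opening bracket: remember which closing bracket must follow.
      [(eq? (ucd-bracket-type (car cps)) 'open)
       (loop (cdr cps) (cons (ucd-matching-bracket (car cps)) expected))]
      ;; Closing bracket: it must be the one most recently owed.
      [(and (pair? expected) (= (car cps) (car expected)))
       (loop (cdr cps) (cdr expected))]
      [else #f])))

(balanced? "(a[b]{c})")
(balanced? "(]")
```

Checking ucd-bracket? first avoids calling the lookup functions on codepoints that have no bracket data.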

procedure

(ucd-decomposition-type c failure-result)  symbol?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
...

procedure

(ucd-decomposition-mapping c failure-result)  (listof codepoint?)

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
...

procedure

(ucd-numeric-type c failure-result)  symbol?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
Returns the type of value a codepoint represents, if it is a numeric representation.

Examples:
> (ucd-numeric-type (char->codepoint #\3))

'decimal

> (ucd-numeric-type (char->codepoint #\¼))

'numeric

> (ucd-numeric-type (char->codepoint #\⒍))

'digit

> (ucd-numeric-type (char->codepoint #\㊾))

'numeric

> (ucd-numeric-type (char->codepoint #\₂))

'digit

> (ucd-numeric-type (char->codepoint #\ⅳ))

'numeric

> (ucd-numeric-type (char->codepoint #\六))

'numeric

> (ucd-numeric-type (char->codepoint #\༣))

'decimal

> (ucd-numeric-type (char->codepoint #\𐄎))

'numeric

See *numeric-types* for a mapping from this symbol to a description.

procedure

(ucd-numeric-value c failure-result)  rational?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
Returns the actual value a codepoint represents, if it is a numeric representation.
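One use of the two numeric functions is reading a run of decimal digits from any script. The sketch below assumes that passing #f as failure-result makes ucd-numeric-type return #f for codepoints without numeric data, and that plain integers are accepted as codepoints; digits->number is a hypothetical helper.

```racket
#lang racket/base
(require codepoint/properties)

;; Converts a string of decimal digits, from any script, to a number.
;; Returns #f if any character is not a decimal digit, and 0 for "".
(define (digits->number str)
  (for/fold ([total 0])
            ([ch (in-string str)])
    (define cp (char->integer ch))
    ;; Once total is #f it stays #f for the rest of the string.
    (and total
         (eq? (ucd-numeric-type cp #f) 'decimal)
         (+ (* total 10) (ucd-numeric-value cp)))))

(digits->number "42")
(digits->number "4x")
```

Restricting the check to 'decimal matters because codepoints of type 'digit or 'numeric, such as fractions, do not take part in positional notation.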

procedure

(ucd-simple-uppercase-mapping c failure-result)  codepoint?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
A simple mapping from a title or lower cased codepoint to its corresponding upper cased codepoint.

procedure

(ucd-simple-lowercase-mapping c failure-result)  codepoint?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
A simple mapping from a title or upper cased codepoint to its corresponding lower cased codepoint.

procedure

(ucd-simple-titlecase-mapping c failure-result)  codepoint?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
A simple mapping from a lower or upper cased codepoint to its corresponding title cased codepoint.
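Because many codepoints have no case mapping at all, a convenient idiom is to pass the codepoint itself as failure-result, so that unmapped codepoints map to themselves. The sketch below assumes that plain integers are accepted as codepoints and that a non-procedure failure-result is returned as-is; simple-upcase is a hypothetical helper.

```racket
#lang racket/base
(require codepoint/properties)

;; Upper-cases a string one codepoint at a time. A codepoint without a
;; simple uppercase mapping falls back to itself via failure-result.
(define (simple-upcase str)
  (list->string
   (for/list ([ch (in-string str)])
     (define cp (char->integer ch))
     (integer->char (ucd-simple-uppercase-mapping cp cp)))))

(simple-upcase "aBc 123")
```

Simple mappings are one codepoint to one codepoint, so this approach never changes the length of the string.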

procedure

(ucd-age c failure-result)  string?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
Returns a string denoting the version of the Unicode standard in which the codepoint was introduced.
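Since the age is a string, comparing two ages needs a numeric comparison of the version components rather than a string comparison. The helpers below are a hypothetical sketch and are not part of this package.

```racket
#lang racket/base
(require racket/string)

;; "13.0" -> '(13 0)
(define (age->numbers age)
  (map string->number (string-split age ".")))

;; #t when version string a is the same as or earlier than b.
(define (age<=? a b)
  (let loop ([xs (age->numbers a)] [ys (age->numbers b)])
    (cond
      [(null? xs) #t]
      [(null? ys) (andmap zero? xs)]
      [(< (car xs) (car ys)) #t]
      [(> (car xs) (car ys)) #f]
      [else (loop (cdr xs) (cdr ys))])))

(age<=? "2.1" "13.0")     ; #t, compared component by component
(string<=? "2.1" "13.0")  ; #f, string order puts "13.0" first
```

With this package, a call such as (age<=? (ucd-age cp) "6.0") would then test whether a codepoint was already present in Unicode 6.0.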

Examples:
> (define groucho-emoji #\🥸)
> (ucd-age (char->codepoint #\nul))

"1.1"

> (ucd-age (char->codepoint #\space))

"1.1"

> (ucd-age (char->codepoint #\a))

"1.1"

> (ucd-age (char->codepoint #\A))

"1.1"

> (ucd-age (char->codepoint #\ༀ))

"2.0"

> (ucd-age (char->codepoint #\1))

"1.1"

> (ucd-age (char->codepoint #\½))

"1.1"

> (ucd-age (char->codepoint #\,))

"1.1"

> (ucd-age (char->codepoint #\]))

"1.1"

> (ucd-age (char->codepoint #\¥))

"1.1"

> (ucd-age (char->codepoint #\€))

"2.1"

> (ucd-age (char->codepoint groucho-emoji))

"13.0"

> (ucd-age 772)

"1.1"

procedure

(ucd-block-name c failure-result)  string?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
Returns the name of the block (codepoint range) containing the codepoint.

Examples:
> (ucd-block-name (char->codepoint #\nul))

"Basic Latin"

> (ucd-block-name (char->codepoint #\space))

"Basic Latin"

> (ucd-block-name (char->codepoint #\a))

"Basic Latin"

> (ucd-block-name (char->codepoint #\A))

"Basic Latin"

> (ucd-block-name (char->codepoint #\ༀ))

"Tibetan"

> (ucd-block-name (char->codepoint #\1))

"Basic Latin"

> (ucd-block-name (char->codepoint #\½))

"Latin-1 Supplement"

> (ucd-block-name (char->codepoint #\,))

"Basic Latin"

> (ucd-block-name (char->codepoint #\]))

"Basic Latin"

> (ucd-block-name (char->codepoint #\¥))

"Latin-1 Supplement"

> (ucd-block-name 772)

"Combining Diacritical Marks"

procedure

(ucd-scripts c failure-result)  (listof symbol?)

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
...

Examples:
> (ucd-scripts (char->codepoint #\nul))

'(Common)

> (ucd-scripts (char->codepoint #\space))

'(Common)

> (ucd-scripts (char->codepoint #\a))

'(Latin)

> (ucd-scripts (char->codepoint #\A))

'(Latin)

> (ucd-scripts (char->codepoint #\ༀ))

'(Tibetan)

> (ucd-scripts (char->codepoint #\1))

'(Common)

> (ucd-scripts (char->codepoint #\½))

'(Common)

> (ucd-scripts (char->codepoint #\,))

'(Common)

> (ucd-scripts (char->codepoint #\]))

'(Common)

> (ucd-scripts (char->codepoint #\¥))

'(Common)

> (ucd-scripts 772)

'(Inherited)

procedure

(ucd-script-extensions c failure-result)  (listof symbol?)

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
...

Examples:
> (define (display-scripts cpc)
    (display (format "~a => Script: ~a" cpc (ucd-scripts (char->codepoint cpc))))
    (let ([extensions (ucd-script-extensions (char->codepoint cpc) #f)])
      (if (false? extensions)
          (newline)
          (displayln (format ", extensions: ~a" extensions)))))
> (display-scripts #\𐋡)

𐋡 => Script: (Common), extensions: (Arab Copt)

> (display-scripts #\჻)

჻ => Script: (Common), extensions: (Geor Latn)

> (display-scripts #\꜀)

꜀ => Script: (Common), extensions: (Hani Latn)

> (display-scripts #\a)

a => Script: (Latin)

procedure

(ucd-line-break c failure-result)  symbol?

  c : codepoint?
  failure-result : (lambda () (raise-arguments-error ...))
Returns the line break property for this codepoint according to the report UAX #14: Unicode Line Breaking Algorithm.

Examples:
> (assoc (ucd-line-break (char->codepoint #\space)) *line-breaks*)

'(SP . "Space")

> (assoc (ucd-line-break (char->codepoint #\-)) *line-breaks*)

'(HY . "Hyphen")

> (assoc (ucd-line-break (char->codepoint #\,)) *line-breaks*)

'(IS . "Infix Numeric Separator")

> (assoc (ucd-line-break (char->codepoint #\a)) *line-breaks*)

'(AL . "Alphabetic")

> (assoc (ucd-line-break (char->codepoint #\Z)) *line-breaks*)

'(AL . "Alphabetic")

See *line-breaks* for a mapping from this symbol to a description.

2.2 Module codepoint/enums

 (require codepoint/enums) package: codepoint

value

*bidi-classes*

A mapping from Bidi class abbreviations to their descriptions, taken from UAX #44; Bidirectional Class Values.

value

*case-folding-status*

...

value

*codepoint-types*

A mapping from codepoint type symbols to their descriptions, taken from Chapter 2, Table 2-3 Types of Code Points.

value

*combining-classes*

A mapping from canonical combining class symbols to their descriptions, taken from UAX #44; Table 15. Canonical_Combining_Class Values.

value

*decomposition-compatibility-tags*

A mapping from decomposition compatibility formatting tag symbols to their descriptions, taken from UAX #44; Table 14. Compatibility Formatting Tags.

value

*general-categories*

A mapping from general category abbreviation symbols to their descriptions, taken from UAX #44; General_Category_Values.

value

*line-breaks*

A mapping from line break symbols to their descriptions, taken from UAX #14; Table 1. Line Breaking Classes.

value

*name-alias-types*

A mapping from name alias type symbols to their descriptions. These descriptions were taken from the source UCD file; they are further described in this article.

value

*numeric-types*

A mapping from numeric type symbols to their descriptions, taken from UAX #44; Numeric_Type.
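The examples on this page look these mappings up with assoc, which suggests they are association lists; that is an assumption here. Since assoc returns #f for a missing key, applying cdr directly to its result raises an error in that case. A small hypothetical helper, describe, can supply a default instead. The sample list below copies two entries from the examples above and stands in for the real value.

```racket
#lang racket/base

;; Two entries copied from the examples on this page; the real
;; *combining-classes* value is assumed to be a longer list of pairs.
(define sample-classes
  '((0 . "Spacing and enclosing marks; also many vowel and consonant signs, even if nonspacing")
    (230 . "Distinct marks directly above")))

;; Hypothetical helper: the description for key, or default when absent.
(define (describe key alist [default #f])
  (cond
    [(assoc key alist) => cdr]
    [else default]))

(describe 230 sample-classes)
(describe 130 sample-classes "(no description)")  ; key absent from the sample
```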