1 Codepoint types
A codepoint value is simply an exact-nonnegative-integer? in the inclusive range zero to *max-codepoint-value*. Not all codepoints correspond to characters (see codepoint-non-character?, codepoint-utf16-surrogate?, and codepoint-private-use?).
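As a rough illustration of this definition, the following sketch uses only racket/base. The names sketch-codepoint?, sketch-utf16-surrogate?, and assumed-max-codepoint are hypothetical, and the upper bound assumes that the package's maximum equals U+10FFFF, the largest codepoint defined by Unicode.

```racket
#lang racket/base

;; Assumption: U+10FFFF, the largest Unicode codepoint, stands in for
;; the package's max-codepoint-value.
(define assumed-max-codepoint #x10FFFF)

;; Hypothetical stand-in for codepoint?: an exact nonnegative integer
;; no greater than the assumed maximum.
(define (sketch-codepoint? v)
  (and (exact-nonnegative-integer? v)
       (<= v assumed-max-codepoint)))

;; UTF-16 surrogates occupy U+D800..U+DFFF; they are valid codepoints
;; but do not correspond to characters.
(define (sketch-utf16-surrogate? c)
  (<= #xD800 c #xDFFF))

(sketch-codepoint? 167)
(sketch-codepoint? #x110000)
(sketch-utf16-surrogate? #xD800)
```

The first expression should produce #t, the second #f (one past the assumed maximum), and the third #t.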
1.1 Module codepoint
(require codepoint) | package: codepoint |
value
procedure
(codepoint? v) → boolean?
v : any/c
procedure
c : codepoint?
procedure
c : codepoint?
procedure
c : codepoint?
procedure
c : codepoint?
> (codepoint-plane (char->codepoint #\§))
0
> (codepoint-plane (char->codepoint #\😀))
1
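The results above follow from the Unicode layout, in which each plane spans #x10000 (65,536) codepoints. A minimal sketch in racket/base, using a hypothetical helper sketch-plane rather than the package's own implementation:

```racket
#lang racket/base

;; Hypothetical helper: the plane is the codepoint divided by the
;; plane size, #x10000, discarding the remainder.
(define (sketch-plane c)
  (quotient c #x10000))

(sketch-plane (char->integer #\§)) ; U+00A7, plane 0
(sketch-plane #x1F600)             ; U+1F600 (😀), plane 1
(sketch-plane #x10FFFF)            ; last codepoint, plane 16
```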
procedure
(codepoint-plane-name c) → (or/c symbol? #f)
c : codepoint?
> (codepoint-plane-name (char->codepoint #\§))
'basic-multilingual-plane
> (codepoint-plane-name (char->codepoint #\😀))
'supplementary-multilingual-plane
procedure
(codepoint->char c) → char?
c : codepoint?
> (codepoint->char 167)
#\§
procedure
(char->codepoint v) → codepoint?
v : char?
> (format "~x" (char->codepoint #\§))
"a7"
procedure
c : codepoint?
> (codepoint->unicode-string (char->codepoint #\§))
"U+00A7"
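The U+ notation is the codepoint in uppercase hexadecimal, zero-padded to at least four digits. A minimal sketch in racket/base; sketch-unicode-string is a hypothetical helper, not the package's implementation:

```racket
#lang racket/base

;; Hypothetical helper: format a codepoint in U+XXXX notation.
(define (sketch-unicode-string c)
  ;; Uppercase hexadecimal digits, without any prefix.
  (define hex (string-upcase (number->string c 16)))
  ;; Left-pad with zeros up to four digits; longer values are unchanged.
  (define padding (make-string (max 0 (- 4 (string-length hex))) #\0))
  (string-append "U+" padding hex))

(sketch-unicode-string 167)
(sketch-unicode-string #x1F600)
```

The two calls should produce "U+00A7" and "U+1F600" respectively.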
procedure
(string->codepoint str) → codepoint?
str : string?
> (string->codepoint "0304")
304
> (string->codepoint "#x0304")
772
> (string->codepoint "0x0304")
772
> (string->codepoint "U+0304")
772
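The examples suggest that an unprefixed string is read as decimal, while the #x, 0x, and U+ prefixes select hexadecimal. The following sketch reproduces that behaviour with racket/base and racket/string; sketch-string->codepoint is a hypothetical helper, and its handling of malformed input (returning #f) is an assumption rather than documented behaviour.

```racket
#lang racket/base
(require racket/string)

;; Hypothetical parser covering the four spellings shown above.
(define (sketch-string->codepoint str)
  ;; Parse whatever follows the given prefix as hexadecimal.
  (define (hex-after prefix)
    (string->number (substring str (string-length prefix)) 16))
  (cond
    [(string-prefix? str "U+") (hex-after "U+")]
    [(string-prefix? str "0x") (hex-after "0x")]
    [(string-prefix? str "#x") (hex-after "#x")]
    ;; No recognised prefix: read the digits as decimal.
    [else (string->number str 10)]))

(sketch-string->codepoint "0304")
(sketch-string->codepoint "U+0304")
```

A fuller version would also check that the result satisfies codepoint? before returning it.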
procedure
(assert-codepoint! v [name]) → void?
v : any/c name : symbol? = 'v
1.2 Module codepoint/range
(require codepoint/range) | package: codepoint |
Many of the properties defined by the Unicode standard are assigned to ranges of codepoints rather than to individual codepoints. The codepoint-range structure is a typed pair of start and end codepoint values; as the examples below show, both ends are inclusive.
struct
(struct codepoint-range (start end) #:constructor-name make-codepoint-range #:prefab)
start : codepoint? end : codepoint?
> (define ascii (make-codepoint-range 0 127))
> (codepoint-range-start ascii)
0
> (codepoint-range-end ascii)
127
> (codepoint-range-contains? ascii (char->codepoint #\a))
#t
> (codepoint-range-contains? ascii (char->codepoint #\§))
#f
procedure
(assert-codepoint-range! v [name]) → void?
v : any/c name : symbol? = 'v
procedure
p : (cons/c codepoint? codepoint?)
> (pair->codepoint-range '(0 . 127))
'#s(codepoint-range 0 127)
procedure
c : codepoint?
> (codepoint->codepoint-range 0)
'#s(codepoint-range 0 0)
procedure
cpr : codepoint-range?
> (codepoint-range-length (make-codepoint-range 0 127))
128
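Because both ends of a range are inclusive, its length is end minus start plus one, which is why the range 0 to 127 has length 128. The sketch below models a range as a plain (start . end) pair instead of the package's codepoint-range struct; sketch-range-length and sketch-range-contains? are hypothetical helpers.

```racket
#lang racket/base

;; Ranges are modelled here as (start . end) pairs with inclusive ends.

;; Number of codepoints covered by the range.
(define (sketch-range-length r)
  (add1 (- (cdr r) (car r))))

;; True when c lies between start and end, inclusive.
(define (sketch-range-contains? r c)
  (<= (car r) c (cdr r)))

(define ascii '(0 . 127))
(sketch-range-length ascii)
(sketch-range-contains? ascii (char->integer #\a))
(sketch-range-contains? ascii (char->integer #\§))
```

The three expressions should produce 128, #t, and #f, matching the package examples for the same range.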
procedure
(codepoint-range=? lhs rhs) → boolean?
lhs : codepoint-range? rhs : codepoint-range?
procedure
(codepoint-range<? lhs rhs) → boolean?
lhs : codepoint-range? rhs : codepoint-range?
procedure
(codepoint-range>? lhs rhs) → boolean?
lhs : codepoint-range? rhs : codepoint-range?
procedure
(codepoint-range-contains? cpr cp) → boolean?
cpr : codepoint-range? cp : codepoint?
procedure
(codepoint-range-contains-any? cpr cp ...) → boolean?
cpr : codepoint-range? cp : codepoint?
procedure
(codepoint-range-contains-all? cpr cp ...) → boolean?
cpr : codepoint-range? cp : codepoint?
procedure
(codepoint-range-intersects? lhs rhs) → boolean?
lhs : codepoint-range? rhs : codepoint-range?
procedure
(codepoint-range-any-intersects? cpr-list) → boolean?
cpr-list : (listof codepoint-range?)
procedure
(codepoint-range->inclusive-range cpr) → range?
cpr : codepoint-range?
procedure
(codepoint-range->in-inclusive-range cpr) → range?
cpr : codepoint-range?
> (define ascii-lowercase-letters (make-codepoint-range (char->codepoint #\a) (char->codepoint #\z)))
> (for ([letter (codepoint-range->in-inclusive-range ascii-lowercase-letters)]) (display (codepoint->char letter)))
abcdefghijklmnopqrstuvwxyz
procedure
cpr : codepoint-range?
> (define ascii-lowercase-letters (make-codepoint-range (char->codepoint #\a) (char->codepoint #\z)))
> (display (codepoint-range->unicode-string ascii-lowercase-letters))
U+0061..U+007A
1.3 Module codepoint/range-dict
(require codepoint/range-dict) | package: codepoint |
Some of the Unicode character property files assign common properties to whole codepoint ranges, so they take up less space both as data in the package and in memory at runtime. However, such data cannot be indexed directly by codepoint to find a property value. The range-dict structure provides basic dict?-style functions that take a codepoint as the key and search through the ranges to find a match.
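To illustrate the idea of looking up a codepoint in range-keyed data, here is a minimal sketch in racket/base. It assumes the ranges are sorted and non-overlapping and uses a binary search; the search strategy the package actually uses is not stated here, and the table layout and the name sketch-range-ref are hypothetical.

```racket
#lang racket/base

;; Hypothetical table: a vector of (start end value) entries, sorted by
;; start, with non-overlapping inclusive ranges.
(define blocks
  (vector (list 0 127 "Basic Latin")
          (list 128 255 "Latin-1 Supplement")
          (list 256 383 "Latin Extended-A")
          (list 384 591 "Latin Extended-B")))

;; Binary search for the entry whose range contains codepoint c.
;; Returns the entry's value, or #f when no range matches.
(define (sketch-range-ref table c)
  (let loop ([lo 0] [hi (sub1 (vector-length table))])
    (if (> lo hi)
        #f
        (let* ([mid (quotient (+ lo hi) 2)]
               [entry (vector-ref table mid)]
               [start (car entry)]
               [end (cadr entry)])
          (cond
            [(< c start) (loop lo (sub1 mid))]
            [(> c end) (loop (add1 mid) hi)]
            [else (caddr entry)])))))

(sketch-range-ref blocks 167)
(sketch-range-ref blocks 1000)
```

The first lookup should produce "Latin-1 Supplement"; the second falls outside every range and produces #f.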
procedure
(range-dict? v) → boolean?
v : any/c
procedure
(make-range-dict data) → range-dict?
data : (listof (cons/c codepoint-range? hash?))
> (make-range-dict
   (list (cons (make-codepoint-range 0 127)
               (make-hash '((block-name "Basic Latin"))))
         (cons (make-codepoint-range 128 255)
               (make-hash '((block-name "Latin-1 Supplement"))))
         (cons (make-codepoint-range 256 383)
               (make-hash '((block-name "Latin Extended-A"))))
         (cons (make-codepoint-range 384 591)
               (make-hash '((block-name "Latin Extended-B"))))))
'#s(rangedict
#((#s(codepoint-range 0 127) . #hash((block-name . ("Basic Latin"))))
(#s(codepoint-range 128 255)
.
#hash((block-name . ("Latin-1 Supplement"))))
(#s(codepoint-range 256 383)
.
#hash((block-name . ("Latin Extended-A"))))
(#s(codepoint-range 384 591)
.
#hash((block-name . ("Latin Extended-B"))))))
procedure
dict : range-dict?
procedure
(range-dict-has-key? dict key) → boolean?
dict : range-dict? key : codepoint?
procedure
(range-dict-ref dict key failure-result) → hash?
dict : range-dict? key : codepoint? failure-result : (lambda () (raise-arguments-error ...))
If failure-result is a procedure, it is called (through a tail call) with no arguments to produce the result.
Otherwise, failure-result is returned as the result.
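This is the same failure-result convention used by Racket's built-in hash-ref, which makes it convenient for a quick illustration. The sketch below uses an ordinary hash rather than a range-dict, and the keys and values are illustrative only.

```racket
#lang racket/base

(define props (hash 'block-name "Basic Latin"))

;; A procedure passed as failure-result is called with no arguments.
(define from-thunk (hash-ref props 'script (lambda () 'unknown)))

;; Any non-procedure value is returned as-is.
(define from-value (hash-ref props 'script #f))

;; When the key is present, failure-result is ignored.
(define present (hash-ref props 'block-name #f))

(list from-thunk from-value present)
```

The final expression should produce a list of 'unknown, #f, and "Basic Latin".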