Xenomorph: binary encoding & decoding

8.17

Matthew Butterick <mb@mbtype.com>

This package is in development. I make no commitment to maintaining the public interface documented below.

(require xenomorph)

package: xenomorph

Hands up: who likes working with binary formats?

OK, just a few of you, in the back. You’re free to go.

As for everyone else: Xenomorph eases the pain of working with binary formats. Instead of laboriously counting bytes —

You describe a binary format declaratively by using smaller ingredients — e.g., integers, strings, lists, pointers, dicts, and perhaps other nested encodings. This is known as a xenomorphic object.
This xenomorphic object can then be used as a binary encoder, allowing you to convert Racket values to binary and write them out to a file.
But wait, there’s more: once defined, this xenomorphic object can also be used as a binary decoder, reading bytes and parsing them into Racket values.

So one binary-format definition can be used for both input and output. Meanwhile, Xenomorph handles all the dull housekeeping of counting bytes (because somebody has to).

This package is derived principally from Devon Govett’s restructure library for Node.js. Thanks for doing the heavy lifting, dude.

1 Installation

At the command line:

raco pkg install xenomorph

After that, you can update the package from the command line:

raco pkg update xenomorph

Invoke the library in a source file by importing it in the usual way:

(require xenomorph)

2 The big picture

2.1 Bytes and byte strings

Suppose we have a file on disk. What’s in the file? Without knowing anything else, we can at least say the file contains a sequence of bytes. A byte is the smallest unit of data storage. It’s not, however, the smallest unit of information storage — that would be a bit. But when we read (or write) from disk (or other source, like memory), we work with bytes. A byte holds eight bits, so it can take on values between 0 and 255, inclusive.

In Racket, a fixed-length array of bytes is also known as a byte string. It prints as a series of values between quotation marks, prefixed with #:

#"ABC"

Caution: though this looks similar to the ordinary string "ABC", we’re better off thinking of it as a block of integers that are sometimes displayed as characters for convenience. For instance, the byte string above represents three bytes valued 65, 66, and 67. This byte string could also be written in hexadecimal like so:

#"\x41\x42\x43"

Or octal like so:

#"\101\102\103"

All three mean the same thing. (If you like, confirm this by trying them on the REPL.)

We can also make an equivalent byte string with bytes. As above, Racket doesn’t care how we notate the values, as long as they’re between 0 and 255:

Examples:

> (bytes 65 66 67)
#"ABC"
> (bytes (+ 31 34) (* 3 22) (- 100 33))
#"ABC"
> (apply bytes (map char->integer '(#\A #\B #\C)))
#"ABC"

Byte values between 32 and 126 are printed as ASCII characters. Most other values are printed as octal escapes:

Example:

> (bytes 65 66 67 154 206 255)
#"ABC\232\316\377"

If you think this printing convention is a little weird, I agree. But that’s how Racket does it.

If we prefer to deal with lists of integers, we can always use bytes->list and list->bytes:

Examples:

> (bytes->list #"ABC\232\316\377")
'(65 66 67 154 206 255)
> (list->bytes '(65 66 67 154 206 255))
#"ABC\232\316\377"

The key point: the # prefix tells us we’re looking at a byte string, not an ordinary string.
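
To make the difference between a string and a byte string concrete, here is a small sketch using only base Racket. The variable names are illustrative. It shows that a string counts characters while a byte string counts bytes, which matters as soon as a character needs more than one byte in UTF-8.

```racket
#lang racket/base

;; An ordinary string counts characters; a byte string counts bytes.
(define s "ABÜ")
(define bs (string->bytes/utf-8 s))

(string? s)         ; #t
(bytes? bs)         ; #t
(string-length s)   ; 3 characters
(bytes-length bs)   ; 4 bytes, because Ü takes two bytes in UTF-8
bs                  ; prints as #"AB\303\234"
(bytes->list bs)    ; '(65 66 195 156)
```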

2.2 Binary formats

Back to files. Files are classified as being either binary or text. (A distinction observed by Racket functions such as write-to-file.) When we speak of binary vs. text, we’re saying something about the internal structure of the byte sequence — what values those bytes represent. We’ll call this internal structure the binary format of the file.

This internal structure is also called an encoding. Here, however, I avoid using that term as a synonym for binary format, because I prefer to reserve it for when we talk about encoding and decoding as operations on data.

3 Tutorials

3.1 A binary format for complex numbers

Racket natively supports complex numbers. Suppose we want to encode these numbers in a binary format without losing precision. How would we do it?

First, we need to understand Racket’s recipe for a complex number:

A complex number has a real part and an imaginary part. The coefficient of each part is a real number.
A real number is either an inexact number (that is, a floating-point number) or an exact number.
An exact number is a rational number — i.e., a number with a numerator and denominator.
The numerator and denominator can each be an arbitrarily large signed integer, which we’ll call a big integer to distinguish it from fixed-size integers otherwise common in binary formats.

To make a binary format for complex numbers, we build the format by composing smaller ingredients into bigger ones. So we’ll work the recipe from bottom to top, composing our ingredients as we go.

3.1.1 Big integers

Let’s start with the big integers. We can’t use an existing signed-integer type like int32 because our big integers won’t necessarily fit. For that matter, this also rules out any type derived from x:int%, because all xenomorphic integers have a fixed size.

Instead, we need to use a variable-length type. How about an x:string? If we don’t specify a #:length argument, it can be arbitrarily long. All we need to do is convert our number to a string before encoding (with number->string) and then convert the string back to a number after decoding (with string->number).

> (define bigint (x:string #:pre-encode number->string
                           #:post-decode string->number))
> (define abigint (- (expt 2 80)))
> abigint
-1208925819614629174706176
> (encode bigint abigint #f)
#"-1208925819614629174706176\0"
> (decode bigint #"-1208925819614629174706176\0")
-1208925819614629174706176
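
For comparison, here is roughly what that format amounts to when written by hand in base Racket. This is only a sketch of the byte layout shown above (decimal digits followed by a null byte). The helpers bigint->bytes and bytes->bigint are hypothetical names, not part of Xenomorph.

```racket
#lang racket/base

;; Hypothetical helper: decimal digits of n followed by a null terminator.
(define (bigint->bytes n)
  (bytes-append (string->bytes/utf-8 (number->string n)) (bytes 0)))

;; Hypothetical helper: read up to the first null byte (or the end of the
;; byte string) and parse the digits back into an integer.
(define (bytes->bigint bs)
  (define end
    (let loop ([i 0])
      (if (or (= i (bytes-length bs)) (zero? (bytes-ref bs i)))
          i
          (loop (add1 i)))))
  (string->number (bytes->string/utf-8 (subbytes bs 0 end))))

(bigint->bytes (- (expt 2 80)))
; #"-1208925819614629174706176\0"
(bytes->bigint #"-1208925819614629174706176\0")
; -1208925819614629174706176
```

The Xenomorph version gets the same result from a single declarative definition, without the manual scanning for the terminator.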

3.1.2 Exact numbers

Next, we handle exact numbers. An exact number is a combination of two big integers representing a numerator and a denominator. So in this case, we need a xenomorphic type that can store two values. How about an x:list? The length of the list will be two, and the type of the list will be our new bigint type.

Similar to before, we use pre-encoding to convert our Racket value into an encodable shape. This time, we convert an exact number into a list of its numerator and denominator. After decoding, we take that list and convert its values back into an exact number (by using /):

> (define exact (x:list #:type bigint
                        #:length 2
                        #:pre-encode (λ (x) (list (numerator x) (denominator x)))
                        #:post-decode (λ (nd) (apply / nd))))
> (encode exact -617/2839 #f)
#"-617\0002839\0"
> (decode exact #"-617\0002839\0")
-617/2839
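
The pre-encode and post-decode steps can be tried on their own in base Racket, without any encoding involved. This sketch uses illustrative variable names and shows why the round trip is lossless: the sign travels with the numerator, and Racket keeps exact rationals in lowest terms.

```racket
#lang racket/base

(define x -617/2839)

;; What the pre-encoder produces: a two-element list of big integers.
(define parts (list (numerator x) (denominator x)))
parts             ; '(-617 2839)

;; What the post-decoder does: divide numerator by denominator.
(apply / parts)   ; -617/2839

;; Exact rationals are always normalized to lowest terms.
(numerator 4/6)   ; 2
(denominator 4/6) ; 3
```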

3.1.3 Real numbers

A real number is either a floating-point number (for which we can use Xenomorph’s built-in float type) or an exact number (for which we can use the exact type we just defined).

This time, we need an encoder that allows us to choose from among two possibilities. How about an x:versioned-dict? We’ll assign our exact numbers to version 0, and our floats to version 1. (These version numbers are arbitrary — we could pick any two values, but a small integer will fit inside a uint8.)

We specify a #:version-key of 'version. Then in our pre-encode function, we choose the version of the encoding based on whether the input value is exact?.

> (define real (x:versioned-dict
                #:type uint8
                #:version-key 'version
                #:versions
                (list
                 (cons 0 (list (cons 'val exact)))
                 (cons 1 (list (cons 'val float))))
                #:pre-encode (λ (num) (list (cons 'val num)
                                            (cons 'version (if (exact? num)
                                                               0
                                                               1))))
                #:post-decode (λ (h) (hash-ref h 'val))))
> (encode real 123.45 #f)
#"\1f\346\366B"
> (decode real #"\1f\346\366B")
123.44999694824219
> (encode real -1/16 #f)
#"\0-1\00016\0"
> (decode real #"\0-1\00016\0")
-1/16

Notice that the float loses some precision during the encoding & decoding process. This is a natural part of how floating-point numbers work — they are called inexact numbers for this reason — so this is a feature, not a bug.
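
The same loss can be reproduced with base Racket alone, which suggests it comes from the 32-bit representation rather than from Xenomorph. The sketch below uses real->floating-point-bytes with an explicit little-endian flag. The names single and dbl are illustrative.

```racket
#lang racket/base

;; Squeeze 123.45 into 4 bytes (32-bit float, little endian).
(define single (real->floating-point-bytes 123.45 4 #f))
single                                   ; #"f\346\366B"
(floating-point-bytes->real single #f)   ; 123.44999694824219

;; With 8 bytes (64-bit float) the value survives the round trip,
;; because Racket flonums are already 64-bit.
(define dbl (real->floating-point-bytes 123.45 8 #f))
(floating-point-bytes->real dbl #f)      ; 123.45
```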

3.1.4 Complex numbers

Now we put it all together. A complex number is a combination of a real part and an imaginary part, each of which has a real coefficient. Therefore, we can model a complex number in a binary format just like we did for exact numbers: as a list of two values.

Once again, we use a pre-encoder and post-decoder to massage the data. On the way in, the pre-encoder turns the complex number into a list of real-number coefficients with real-part and imag-part. On the way out, these coefficients are reformed into a complex number through some easy addition and multiplication.

> (define complex (x:list #:type real
                          #:length 2
                          #:pre-encode (λ (num) (list (real-part num) (imag-part num)))
                          #:post-decode (λ (ri) (+ (first ri) (* 0+1i (second ri))))))
> (encode complex 123.45-6.789i #f)
#"\1f\346\366B\1}?\331\300"
> (decode complex #"\1f\346\366B\1}?\331\300")
123.44999694824219-6.789000034332275i
> (encode complex 1/234-5/678i #f)
#"\0001\000234\0\0-5\000678\0"
> (decode complex #"\0001\000234\0\0-5\000678\0")
1/234-5/678i
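
As a side note, the splitting and rebuilding of a complex number can be checked in base Racket on its own. The sketch below also shows make-rectangular, a standard Racket function that would be an alternative way to write the post-decoder. That substitution is a suggestion, not something the tutorial itself uses.

```racket
#lang racket/base

(define z 1/234-5/678i)

;; Splitting, as the pre-encoder does.
(define parts (list (real-part z) (imag-part z)))
parts                              ; '(1/234 -5/678)

;; Rebuilding, as the post-decoder does.
(+ (car parts) (* 0+1i (cadr parts)))       ; 1/234-5/678i

;; An equivalent way to rebuild the number.
(make-rectangular (car parts) (cadr parts)) ; 1/234-5/678i
```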

4 Main interface

procedure
(xenomorphic? x) → boolean?
x : any/c

Whether x is an object of type x:base%.

procedure
(decode xenomorphic-obj
[ byte-source
#:parent parent]
arg ...) → any/c
  xenomorphic-obj : xenomorphic?
  byte-source : (or/c bytes? input-port?) = (current-input-port)
  parent : (or/c xenomorphic? #false) = #false
  arg : any/c

Read bytes from byte-source and convert them to a Racket value using xenomorphic-obj as the decoder.

If byte-source contains more bytes than xenomorphic-obj needs to decode a value, decode reads only as many bytes as necessary and leaves the rest unread.

procedure
(encode xenomorphic-obj
val
[ byte-dest
#:parent parent]
arg ...) → (or/c void? bytes?)
  xenomorphic-obj : xenomorphic?
  val : any/c
  byte-dest : (or/c output-port? #false) = (current-output-port)
  parent : (or/c xenomorphic? #false) = #false
  arg : any/c

Convert val to bytes using xenomorphic-obj as the encoder.

If byte-dest is an output-port?, the bytes are written there and the return value is (void). If byte-dest is #false, the encoded byte string is the return value.

If val does not match the xenomorphic-obj type appropriately — for instance, you try to encode a negative integer using an unsigned integer type like uint8 — then an error will arise.
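
Here is a minimal sketch of encode and decode working with ports instead of byte strings, based on the behavior described above. It assumes that (require xenomorph) also provides the integer types uint16be and uint8. The names out, in, first-value, and second-value are illustrative.

```racket
#lang racket/base
(require xenomorph)

;; Encoding to an output port: each call writes bytes and returns (void).
(define out (open-output-bytes))
(encode uint16be 258 out)
(encode uint8 65 out)
(define encoded (get-output-bytes out))
encoded                     ; #"\1\2A"

;; Decoding from an input port: each call consumes only the bytes it needs,
;; so successive calls walk through the data.
(define in (open-input-bytes encoded))
(define first-value (decode uint16be in))   ; 258
(define second-value (decode uint8 in))     ; 65
```

Passing #f as byte-dest, as the examples in this document do, is the convenient form when you want the byte string back as a return value.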

5 Core xenomorphic objects

These basic xenomorphic objects can be used on their own, or combined to make bigger xenomorphic objects.

Note on naming: the main xenomorphic objects have an x: prefix to distinguish them from (and prevent name collisions with) the ordinary Racket thing (for instance, x:list vs. list). Other xenomorphic objects (like uint8) don’t have this prefix, because it seems unnecessary and therefore laborious.

class
x:base% : class?

superclass: object%

When making your own xenomorphic objects, usually you’ll want to stick together existing core objects, or inherit from one of those classes. Inheriting from x:base% is also allowed, but you have to do all the heavy lifting.

method
(send a-x:base x:decode input-port
parent
args ...) → any/c
  input-port : input-port?
  parent : (or/c xenomorphic? #false)
  args : any/c
Refine this method with augment.
Read bytes from input-port and convert them into a Racket value. Called by decode.
method
(send a-x:base post-decode val) → any/c
  val : any/c
Hook for post-processing on val after it’s returned by x:decode but before it’s returned by decode.
method
(send a-x:base x:encode val
output-port
parent
args ...) → bytes?
  val : any/c
  output-port : output-port?
  parent : (or/c xenomorphic? #false)
  args : any/c
Refine this method with augment.
Convert a value into a byte string which is written to output-port. Called by encode.
method
(send a-x:base pre-encode val) → any/c
  val : any/c
Hook for pre-processing on val after it’s passed to encode but before it’s passed to x:encode.
method
(send a-x:base x:size val parent args ...)
→ exact-nonnegative-integer?
  val : any/c
  parent : (or/c xenomorphic? #false)
  args : any/c
Refine this method with augment.
The length of the byte string that val would produce if it were encoded using x:encode. Called by size.

5.1 Numbers

(require xenomorph/number)

package: xenomorph

5.1.1 Little endian vs. big endian

When an integer is more than one byte long, one has to consider how the bytes are ordered. If the byte representing the lowest 8 bits appears first, it’s known as little endian byte ordering. If this byte appears last, it’s called big endian byte ordering.

For example, the integer 1 stored as a 32-bit value occupies four bytes. In little endian, the least significant byte comes first: #"\1\0\0\0". In big endian, it comes last: #"\0\0\0\1".

When encoding and decoding binary formats, one has to be consistent about endianness, because it will change the meaning of the binary value. For instance, if we inadvertently treated the big-endian byte string #"\0\0\0\1" as little endian, we’d get the result 16777216 instead of the expected 1.
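
The two byte orders, and the mix-up just described, can be reproduced with base Racket's integer->integer-bytes and integer-bytes->integer. In both functions the last argument used here selects big endian when true. This is only an illustration of the concept and does not use Xenomorph.

```racket
#lang racket/base

;; The integer 1 as an unsigned 4-byte value.
(integer->integer-bytes 1 4 #f #f)            ; little endian: #"\1\0\0\0"
(integer->integer-bytes 1 4 #f #t)            ; big endian:    #"\0\0\0\1"

;; Reading big-endian bytes with the correct and the wrong convention.
(integer-bytes->integer #"\0\0\0\1" #f #t)    ; 1
(integer-bytes->integer #"\0\0\0\1" #f #f)    ; 16777216

;; The wrong answer is 256 cubed: the 1 landed in the highest byte.
(expt 256 3)                                  ; 16777216
```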

procedure
(endian-value? val) → boolean?
val : any/c

Whether val is either 'be (representing big endian) or 'le (representing little endian).

value
system-endian : endian-value?

The endian value of the current system. Big endian is represented as 'be and little endian as 'le. This can be used as an argument for classes that inherit from x:number%.

Use this value carefully, however. Binary formats are usually defined using one endian convention or the other (so that data can be exchanged among machines regardless of the endianness of the underlying system).

class
x:number% : class?

superclass: x:base%

constructor
(new x:number%
    [size size]
    [signed? signed?]
    [endian endian]) → (is-a?/c x:number%)
  size : exact-positive-integer?
  signed? : boolean?
  endian : endian-value?
Create a class instance that represents a binary number format size bytes long, either signed? or not, with endian byte ordering. The endian argument can be system-endian.

5.1.2 Integers

class
x:int% : class?

superclass: x:number%

Base class for integer formats. Use x:int to conveniently instantiate new integer formats.

procedure
(x:int? x) → boolean?
x : any/c

Whether x is an object of type x:int%.

procedure
(x:int [ size-arg
#:size size-kw
#:signed signed
#:endian endian
#:pre-encode pre-encode-proc
#:post-decode post-decode-proc
#:base-class base-class]) → x:int?
  size-arg : (or/c exact-positive-integer? #false) = #false
  size-kw : exact-positive-integer? = 2
  signed : boolean? = #true
  endian : endian-value? = system-endian
  pre-encode-proc : (or/c (any/c . -> . any/c) #false) = #false
  post-decode-proc : (or/c (any/c . -> . any/c) #false) = #false
  base-class : (λ (c) (subclass? c x:int%)) = x:int%

Generate an instance of x:int% (or a subclass of x:int%) with certain optional attributes.

size-arg or size-kw (whichever is provided, though size-arg takes precedence) controls the encoded size.

signed controls whether the integer is signed or unsigned.

endian controls the byte-ordering convention.

pre-encode-proc and post-decode-proc control the pre-encoding and post-decoding procedures, respectively. Each takes as input the value to be processed and returns a new value.

base-class controls the class used for instantiation of the new object.
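
As a sketch of how these arguments combine, the following defines an unsigned 3-byte big-endian integer type by hand. It relies only on the signature documented above. The name u24be is illustrative, and the result should behave like the predefined uint24be.

```racket
#lang racket/base
(require xenomorph)

;; size-arg = 3 bytes, unsigned, big endian.
(define u24be (x:int 3 #:signed #f #:endian 'be))

(define encoded (encode u24be 1 #f))
encoded                              ; #"\0\0\1"
(define decoded (decode u24be encoded))
decoded                              ; 1
```

Spelling out #:endian explicitly, rather than relying on the system-endian default, keeps the format stable across machines.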

value
int8 : x:int?
value
int16 : x:int?
value
int24 : x:int?
value
int32 : x:int?
value
int64 : x:int?
value
uint8 : x:int?
value
uint16 : x:int?
value
uint24 : x:int?
value
uint32 : x:int?
value
uint64 : x:int?

The common integer types, using system-endian endianness. The u prefix indicates unsigned. The numerical suffix indicates bit length.

Use these carefully, however. Binary formats are usually defined using one endian convention or the other (so that data can be exchanged among machines regardless of the endianness of the underlying system).

Examples:

> (encode int8 1 #f)
#"\1"
> (encode int16 1 #f)
#"\1\0"
> (encode int24 1 #f)
#"\1\0\0"
> (encode int32 1 #f)
#"\1\0\0\0"
> (encode int64 1 #f)
#"\1\0\0\0\0\0\0\0"
> (encode int8 -128 #f)
#"\200"
> (encode int16 -128 #f)
#"\200\377"
> (encode int24 -128 #f)
#"\200\377\377"
> (encode int32 -128 #f)
#"\200\377\377\377"
> (encode int64 -128 #f)
#"\200\377\377\377\377\377\377\377"
> (encode uint8 1 #f)
#"\1"
> (encode uint16 1 #f)
#"\1\0"
> (encode uint24 1 #f)
#"\1\0\0"
> (encode uint32 1 #f)
#"\1\0\0\0"
> (encode uint64 1 #f)
#"\1\0\0\0\0\0\0\0"
; negative numbers cannot be encoded as unsigned ints, of course
> (encode uint8 -128 #f)
encode: contract violation
  expected: value that fits within unsigned 1-byte int (0 to
255)
  given: -128
> (encode uint16 -128 #f)
encode: contract violation
  expected: value that fits within unsigned 2-byte int (0 to
65535)
  given: -128
> (encode uint24 -128 #f)
encode: contract violation
  expected: value that fits within unsigned 3-byte int (0 to
16777215)
  given: -128
> (encode uint32 -128 #f)
encode: contract violation
  expected: value that fits within unsigned 4-byte int (0 to
4294967295)
  given: -128
> (encode uint64 -128 #f)
encode: contract violation
  expected: value that fits within unsigned 8-byte int (0 to
18446744073709551615)
  given: -128
> (decode int8 #"1" #f)
49
> (decode int16 #"10" #f)
12337
> (decode int24 #"100" #f)
3158065
> (decode int32 #"1000" #f)
808464433
> (decode int64 #"10000000" #f)
3472328296227680305
> (decode uint8 #"1" #f)
49
> (decode uint16 #"10" #f)
12337
> (decode uint24 #"100" #f)
3158065
> (decode uint32 #"1000" #f)
808464433
> (decode uint64 #"10000000" #f)
3472328296227680305
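
The decoded values above may look surprising, because the byte strings are made of ASCII digits rather than small integers. The following base-Racket sketch works through the arithmetic for #"10", and for the two's-complement encoding of -128, assuming a little-endian system as in the examples above.

```racket
#lang racket/base

;; #"10" is the two bytes 49 (the character 1) and 48 (the character 0).
(bytes->list #"10")                       ; '(49 48)

;; Little endian: the first byte is the low byte.
(+ 49 (* 48 256))                         ; 12337
(integer-bytes->integer #"10" #t #f)      ; 12337

;; Big endian: the first byte is the high byte.
(+ (* 49 256) 48)                         ; 12592
(integer-bytes->integer #"10" #t #t)      ; 12592

;; -128 as a signed 16-bit value is stored as 65536 - 128 = 65408 (#xFF80).
(- (expt 2 16) 128)                       ; 65408
(integer->integer-bytes -128 2 #t #f)     ; #"\200\377"
```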

value
int8be : x:int?
value
int16be : x:int?
value
int24be : x:int?
value
int32be : x:int?
value
int64be : x:int?
value
uint8be : x:int?
value
uint16be : x:int?
value
uint24be : x:int?
value
uint32be : x:int?
value
uint64be : x:int?

Big-endian versions of the common integer types. The u prefix indicates unsigned. The numerical suffix indicates bit length. int8be and uint8be are included for consistency, but as one-byte types, they are not affected by endianness.

Examples:

> (encode int8be 1 #f)
#"\1"
> (encode int16be 1 #f)
#"\0\1"
> (encode int24be 1 #f)
#"\0\0\1"
> (encode int32be 1 #f)
#"\0\0\0\1"
> (encode int64be 1 #f)
#"\0\0\0\0\0\0\0\1"
> (encode int8be -128 #f)
#"\200"
> (encode int16be -128 #f)
#"\377\200"
> (encode int24be -128 #f)
#"\377\377\200"
> (encode int32be -128 #f)
#"\377\377\377\200"
> (encode int64be -128 #f)
#"\377\377\377\377\377\377\377\200"
> (encode uint8be 1 #f)
#"\1"
> (encode uint16be 1 #f)
#"\0\1"
> (encode uint24be 1 #f)
#"\0\0\1"
> (encode uint32be 1 #f)
#"\0\0\0\1"
> (encode uint64be 1 #f)
#"\0\0\0\0\0\0\0\1"
> (decode int8be #"1" #f)
49
> (decode int16be #"10" #f)
12592
> (decode int24be #"100" #f)
3223600
> (decode int32be #"1000" #f)
825241648
> (decode int64be #"10000000" #f)
3544385890265608240
> (decode uint8be #"1" #f)
49
> (decode uint16be #"10" #f)
12592
> (decode uint24be #"100" #f)
3223600
> (decode uint32be #"1000" #f)
825241648
> (decode uint64be #"10000000" #f)
3544385890265608240

value
int8le : x:int?
value
int16le : x:int?
value
int24le : x:int?
value
int32le : x:int?
value
int64le : x:int?
value
uint8le : x:int?
value
uint16le : x:int?
value
uint24le : x:int?
value
uint32le : x:int?
value
uint64le : x:int?

Little-endian versions of the common integer types. The u prefix indicates unsigned. The numerical suffix indicates bit length. int8le and uint8le are included for consistency, but as one-byte types, they are not affected by endianness.

Examples:

> (encode int8le 1 #f)
#"\1"
> (encode int16le 1 #f)
#"\1\0"
> (encode int24le 1 #f)
#"\1\0\0"
> (encode int32le 1 #f)
#"\1\0\0\0"
> (encode int64le 1 #f)
#"\1\0\0\0\0\0\0\0"
> (encode int8le -128 #f)
#"\200"
> (encode int16le -128 #f)
#"\200\377"
> (encode int24le -128 #f)
#"\200\377\377"
> (encode int32le -128 #f)
#"\200\377\377\377"
> (encode int64le -128 #f)
#"\200\377\377\377\377\377\377\377"
> (encode uint8le 1 #f)
#"\1"
> (encode uint16le 1 #f)
#"\1\0"
> (encode uint24le 1 #f)
#"\1\0\0"
> (encode uint32le 1 #f)
#"\1\0\0\0"
> (encode uint64le 1 #f)
#"\1\0\0\0\0\0\0\0"
> (decode int8le #"1" #f)
49
> (decode int16le #"10" #f)
12337
> (decode int24le #"100" #f)
3158065
> (decode int32le #"1000" #f)
808464433
> (decode int64le #"10000000" #f)
3472328296227680305
> (decode uint8le #"1" #f)
49
> (decode uint16le #"10" #f)
12337
> (decode uint24le #"100" #f)
3158065
> (decode uint32le #"1000" #f)
808464433
> (decode uint64le #"10000000" #f)
3472328296227680305

5.1.3 Floats

class
x:float% : class?

superclass: x:number%

Base class for floating-point number formats. By convention, all floats are signed. Use x:float to conveniently instantiate new floating-point number formats.

procedure
(x:float? x) → boolean?
x : any/c

Whether x is an object of type x:float%.

procedure
(x:float [ size-arg
#:size size-kw
#:endian endian
#:pre-encode pre-encode-proc
#:post-decode post-decode-proc
#:base-class base-class]) → x:float?
  size-arg : (or/c exact-positive-integer? #false) = #false
  size-kw : exact-positive-integer? = 2
  endian : endian-value? = system-endian
  pre-encode-proc : (or/c (any/c . -> . any/c) #false) = #false
  post-decode-proc : (or/c (any/c . -> . any/c) #false) = #false
  base-class : (λ (c) (subclass? c x:float%)) = x:float%

Generate an instance of x:float% (or a subclass of x:float%) with certain optional attributes.

size-arg or size-kw (whichever is provided, though size-arg takes precedence) controls the encoded size.

endian controls the byte-ordering convention.

pre-encode-proc and post-decode-proc control the pre-encoding and post-decoding procedures, respectively. Each takes as input the value to be processed and returns a new value.

base-class controls the class used for instantiation of the new object.

value
float : x:float?
value
floatbe : x:float?
value
floatle : x:float?

The common 32-bit floating-point types. They differ in byte-ordering convention: floatbe uses big endian, floatle uses little endian, float uses system-endian.

Examples:

> (encode float 123.456 #f)
#"y\351\366B"
> (encode floatbe 123.456 #f)
#"B\366\351y"
> (encode floatle 123.456 #f)
#"y\351\366B"
> (decode float #"y\351\366B" #f)
123.45600128173828
> (decode floatbe #"y\351\366B" #f)
1.5184998373247989e+35
> (decode floatle #"y\351\366B" #f)
123.45600128173828

value
double : x:float?
value
doublebe : x:float?
value
doublele : x:float?

The common 64-bit floating-point types. They differ in byte-ordering convention: doublebe uses big endian, doublele uses little endian, double uses system-endian.

Examples:

> (encode double 123.456 #f)
#"w\276\237\32/\335^@"
> (encode doublebe 123.456 #f)
#"@^\335/\32\237\276w"
> (encode doublele 123.456 #f)
#"w\276\237\32/\335^@"
> (decode double #"w\276\237\32/\335^@" #f)
123.456
> (decode doublebe #"w\276\237\32/\335^@" #f)
6.319206039931876e+268
> (decode doublele #"w\276\237\32/\335^@" #f)
123.456

5.1.4 Fixed-point numbers

class
x:fixed% : class?

superclass: x:int%

Base class for fixed-point number formats. Use x:fixed to conveniently instantiate new fixed-point number formats.

constructor
(new x:fixed%
    [size size]
    [signed? signed?]
    [endian endian]
    [fracbits fracbits]) → (is-a?/c x:fixed%)
  size : exact-positive-integer?
  signed? : boolean?
  endian : endian-value?
  fracbits : exact-positive-integer?
Create a class instance that represents a fixed-point number format size bytes long, either signed? or not, with endian byte ordering and fracbits bits of fractional precision.

procedure
(x:fixed? x) → boolean?
x : any/c

Whether x is an object of type x:fixed%.

procedure
(x:fixed [ size-arg
#:size size-kw
#:endian endian
#:fracbits fracbits
#:pre-encode pre-encode-proc
#:post-decode post-decode-proc
#:base-class base-class]) → x:fixed?
  size-arg : (or/c exact-positive-integer? #false) = #false
  size-kw : exact-positive-integer? = 2
  endian : endian-value? = system-endian
  fracbits : (or/c exact-positive-integer? #false) = (/ (* size 8) 2)
  pre-encode-proc : (or/c (any/c . -> . any/c) #false) = #false
  post-decode-proc : (or/c (any/c . -> . any/c) #false) = #false
  base-class : (λ (c) (subclass? c x:fixed%)) = x:fixed%

Generate an instance of x:fixed% (or a subclass of x:fixed%) with certain optional attributes.

size-arg or size-kw (whichever is provided, though size-arg takes precedence) controls the encoded size. Defaults to 2.

endian controls the byte-ordering convention.

fracbits controls the number of bits of precision. If no value or #false is passed, defaults to (/ (* size 8) 2).

pre-encode-proc and post-decode-proc control the pre-encoding and post-decoding procedures, respectively. Each takes as input the value to be processed and returns a new value.

base-class controls the class used for instantiation of the new object.

value
fixed16 : x:fixed?
value
fixed16be : x:fixed?
value
fixed16le : x:fixed?

The common 16-bit fixed-point number types with 8 fractional bits of precision (the default fracbits for a 2-byte type). They differ in byte-ordering convention: fixed16be uses big endian, fixed16le uses little endian, fixed16 uses system-endian.

Note that because of the limited precision, the byte encoding is possibly lossy (meaning, if you encode and then decode, you may not get exactly the same number back).

Examples:

> (encode fixed16 123.45 #f)
#"s{"
> (encode fixed16be 123.45 #f)
#"{s"
> (encode fixed16le 123.45 #f)
#"s{"
> (decode fixed16 #"s{" #f)
123.44921875
> (decode fixed16be #"s{" #f)
115.48046875
> (decode fixed16le #"s{" #f)
123.44921875
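
To see where these numbers come from, here is a base-Racket sketch of the arithmetic behind a 16-bit little-endian fixed-point value. With the default fracbits of (/ (* size 8) 2), a 2-byte type has 8 fractional bits, so values are scaled by 256. The helpers fixed16le-encode and fixed16le-decode are hypothetical, and rounding down with floor is an assumption. Xenomorph's actual rounding may differ for other inputs.

```racket
#lang racket/base

(define fracbits 8)
(define scale (expt 2 fracbits))   ; 256

;; Hypothetical helper: scale, round down (assumed), store as a signed
;; 2-byte little-endian integer.
(define (fixed16le-encode x)
  (integer->integer-bytes (inexact->exact (floor (* x scale))) 2 #t #f))

;; Hypothetical helper: read the integer back and undo the scaling.
(define (fixed16le-decode bs)
  (/ (integer-bytes->integer bs #t #f) (exact->inexact scale)))

(fixed16le-encode 123.45)    ; #"s{"  (31603 = #x7B73, low byte first)
(fixed16le-decode #"s{")     ; 123.44921875  (31603 / 256)

;; Reading the same bytes as big endian gives #x737B = 29563 instead.
(/ (integer-bytes->integer #"s{" #t #t) 256.0)   ; 115.48046875
```

The difference between 123.45 and 123.44921875 is the fraction that does not fit into 8 bits.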

value
fixed32 : x:fixed?
value
fixed32be : x:fixed?
value
fixed32le : x:fixed?

The common 32-bit fixed-point number types with 16 fractional bits of precision (the default fracbits for a 4-byte type). They differ in byte-ordering convention: fixed32be uses big endian, fixed32le uses little endian, fixed32 uses system-endian.

Note that because of the limited precision, the byte encoding is possibly lossy (meaning, if you encode and then decode, you may not get exactly the same number back).

Examples:

> (encode fixed32 123.45 #f)
#"3s{\0"
> (encode fixed32be 123.45 #f)
#"\0{s3"
> (encode fixed32le 123.45 #f)
#"3s{\0"
> (decode fixed32 #"3s{\0" #f)
123.44999694824219
> (decode fixed32be #"3s{\0" #f)
13171.48046875
> (decode fixed32le #"3s{\0" #f)
123.44999694824219

5.2 Strings

(require xenomorph/string)

package: xenomorph

Good old strings.

procedure
(supported-encoding? x) → boolean?
x : any/c

Whether x represents a supported encoding: either 'ascii or 'utf8.

class
x:string% : class?

superclass: x:base%

Base class for string formats. Use x:string to conveniently instantiate new string formats.

constructor
(new x:string%
    [len len]
    [encoding encoding]) → (is-a?/c x:string%)
  len : length-resolvable?
  encoding : (or/c procedure? supported-encoding?)
Create a class instance that represents a string format of length len. If len is an integer, the string is fixed at that length; otherwise it can be any length.
method
(send a-x:string x:decode input-port
parent) → string?
  input-port : input-port?
  parent : (or/c xenomorphic? #false)
Extends x:decode in x:base%.
Returns a string.
method
(send a-x:string x:encode val
output-port
parent) → bytes?
  val : any/c
  output-port : output-port?
  parent : (or/c xenomorphic? #false)
Extends x:encode in x:base%.
Take a val, convert it to a string if needed, and encode it as a byte string. If len is a xenomorphic? object, the length is encoded at the beginning of the string using that object as the encoder.

procedure
(x:string? x) → boolean?
x : any/c

Whether x is an object of type x:string%.

procedure
(x:string [ len-arg
enc-arg
#:length len-kw
#:encoding enc-kw
#:pre-encode pre-encode-proc
#:post-decode post-decode-proc
#:base-class base-class]) → x:string?
  len-arg : (or/c length-resolvable? #false) = #false
  enc-arg : (or/c procedure? supported-encoding? #false) = #false
  len-kw : (or/c length-resolvable? #false) = #false
  enc-kw : (or/c procedure? supported-encoding? #false) = 'utf8
  pre-encode-proc : (or/c (any/c . -> . any/c) #false) = #false
  post-decode-proc : (or/c (any/c . -> . any/c) #false) = #false
  base-class : (λ (c) (subclass? c x:string%)) = x:string%

Generate an instance of x:string% (or a subclass of x:string%) with certain optional attributes.

len-arg or len-kw (whichever is provided, though len-arg takes precedence) determines the maximum length in bytes of the encoded string.

If this argument is an integer, the string is limited to that length. The length is not directly encoded.
If it’s a xenomorphic? type, the length is variable, but limited to the size that can be represented by that type. For instance, if len-arg is uint8, then the string can be a maximum of 255 bytes. The length is encoded at the beginning of the byte string.
If it’s another value, like #f, the string has variable length, and is null-terminated.

enc-arg or enc-kw (whichever is provided, though enc-arg takes precedence) determines the encoding of the string. Default is 'utf8. See also supported-encoding?.

Examples:

> (define any-ascii (x:string #f 'ascii))
> (encode any-ascii "ABC" #f)
#"ABC\0"
> (decode any-ascii #"ABC\0")
"ABC"
> (decode any-ascii #"ABC\0DEF")
"ABC"
> (decode any-ascii #"AB")
"AB"
> (define three-ascii (x:string 3 'ascii))
> (encode three-ascii "ABC" #f)
#"ABC"
> (encode three-ascii "ABCD" #f)
encode: contract violation
  expected: string no longer than 3
  given: "ABCD"
> (encode three-ascii "ABÜ" #f)
encode: contract violation
  expected: ascii string
  given: "ABÜ"
> (decode three-ascii #"ABC")
"ABC"
> (decode three-ascii #"ABCD")
"ABC"
> (decode three-ascii (string->bytes/utf-8 "ABÜ"))
decode: contract violation
  expected: ascii string
  result: "ABÃ"
> (define 256-utf8 (x:string uint8 'utf8))
> (encode 256-utf8 "ABC" #f)
#"\3ABC"
> (encode 256-utf8 "ABCD" #f)
#"\4ABCD"
> (encode 256-utf8 "ABÜ" #f)
#"\4AB\303\234"
> (encode 256-utf8 (make-string 256 #\A) #f)
encode: contract violation
  expected: value that fits within unsigned 1-byte int (0 to
255)
  given: 256
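
The length-prefixed layout used by 256-utf8 can be imitated by hand in base Racket, which makes it clear that the prefix counts bytes rather than characters. The helper length-prefixed is hypothetical and only mirrors the byte layout shown in the examples above. It is not how Xenomorph is implemented.

```racket
#lang racket/base

;; Hypothetical helper: one length byte followed by the UTF-8 bytes.
(define (length-prefixed str)
  (define bs (string->bytes/utf-8 str))
  ;; (bytes n) only accepts 0 to 255, which is the same limit that a
  ;; uint8 length prefix imposes.
  (bytes-append (bytes (bytes-length bs)) bs))

(length-prefixed "ABC")   ; #"\3ABC"
(length-prefixed "ABÜ")   ; #"\4AB\303\234" (three characters, four bytes)
```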

pre-encode-proc and post-decode-proc control the pre-encoding and post-decoding procedures, respectively. Each takes as input the value to be processed and returns a new value.

base-class controls the class used for instantiation of the new object.

Examples:

> (define (doubler str) (string-append str str))
> (define quad-str (x:string uint32be
                             #:pre-encode doubler
                             #:post-decode doubler))
> (encode quad-str "ABC" #f)
#"\0\0\0\6ABCABC"
> (decode quad-str #"\0\0\0\6ABCABC")
"ABCABCABCABC"

5.3 Symbols

(require xenomorph/symbol)

package: xenomorph

Under the hood, just a wrapper around the x:string% class.

class
x:symbol% : class?

superclass: x:string%

Base class for symbol formats. Use x:symbol to conveniently instantiate new symbol formats.

constructor
(new x:symbol%
    [len len]
    [encoding encoding]) → (is-a?/c x:symbol%)
  len : length-resolvable?
  encoding : (or/c procedure? supported-encoding?)
Create a class instance that represents a symbol format of length len. If len is an integer, the symbol is fixed at that length; otherwise it can be any length.
method
(send a-x:symbol x:decode input-port
parent) → symbol?
  input-port : input-port?
  parent : (or/c xenomorphic? #false)
Extends x:decode in x:string%.
Returns a symbol.
method
(send a-x:symbol x:encode val
input-port
parent) → bytes?
  val : any/c
  input-port : input-port?
  parent : (or/c xenomorphic? #false)
Extends x:encode in x:string%.
Take a value val (typically a symbol) and encode it as a byte string.

procedure
(x:symbol? x) → boolean?
x : any/c

Whether x is an object of type x:symbol%.

procedure
(x:symbol [ len-arg
enc-arg
#:length len-kw
#:encoding enc-kw
#:pre-encode pre-encode-proc
#:post-decode post-decode-proc
#:base-class base-class]) → x:symbol?
  len-arg : (or/c length-resolvable? #false) = #false
   enc-arg : (or/c procedure? supported-encoding? #false)
= #false
  len-kw : (or/c length-resolvable? #false) = #false
  enc-kw : (or/c procedure? supported-encoding? #false) = 'utf8
  pre-encode-proc : (or/c (any/c . -> . any/c) #false) = #false
  post-decode-proc : (or/c (any/c . -> . any/c) #false) = #false
  base-class : (λ (c) (subclass? c x:symbol%)) = x:symbol%

Generate an instance of x:symbol% (or a subclass of x:symbol%) with certain optional attributes, which are the same as x:string.

Examples:

> (define any-ascii (x:symbol #f 'ascii))
> (encode any-ascii 'ABC #f)
#"ABC\0"
> (decode any-ascii #"ABC\0")
'ABC
> (decode any-ascii #"ABC\0DEF")
'ABC
> (decode any-ascii #"AB")
'AB
> (define three-ascii (x:symbol 3 'ascii))
> (encode three-ascii 'ABC #f)
#"ABC"
> (encode three-ascii 'ABCD #f)
encode: contract violation
  expected: string no longer than 3
  given: "ABCD"
> (encode three-ascii 'ABÜ #f)
encode: contract violation
  expected: ascii string
  given: "ABÜ"
> (decode three-ascii #"ABC")
'ABC
> (decode three-ascii #"ABCD")
'ABC
> (decode three-ascii (string->bytes/utf-8 "ABÜ"))
decode: contract violation
  expected: ascii string
  result: "ABÃ"
> (define 256-utf8 (x:symbol uint8 'utf8))
> (encode 256-utf8 'ABC #f)
#"\3ABC"
> (encode 256-utf8 'ABCD #f)
#"\4ABCD"
> (encode 256-utf8 'ABÜ #f)
#"\4AB\303\234"
> (encode 256-utf8 (make-string 256 #\A) #f)
encode: contract violation
  expected: value that fits within unsigned 1-byte int (0 to 255)
  given: 256
> (define (doubler sym)
  (string->symbol (format "~a~a" sym sym)))
> (define quad-str (x:symbol uint32be
  #:pre-encode doubler
  #:post-decode doubler))
> (encode quad-str "ABC" #f)
#"\0\0\0\6ABCABC"
> (decode quad-str #"\0\0\0\6ABCABC")
'ABCABCABCABC
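
Because x:symbol% wraps x:string%, a symbol format and the matching string format should produce the same bytes and differ only in the type of the decoded result. A minimal sketch, assuming the string format is built with the same positional arguments (sym-fmt and str-fmt are illustrative names):

```racket
#lang racket/base
(require xenomorph)

;; Illustrative names; both formats use a fixed length of 3 and ASCII encoding.
(define sym-fmt (x:symbol 3 'ascii))
(define str-fmt (x:string 3 'ascii))

;; Encode the symbol and the equivalent string, then compare the bytes.
(define sym-bytes (encode sym-fmt 'ABC #f))
(define str-bytes (encode str-fmt "ABC" #f))
(equal? sym-bytes str-bytes)

;; Decoding differs only in the type of the result.
(decode sym-fmt #"ABC") ; 'ABC
(decode str-fmt #"ABC") ; "ABC"
```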

5.4 Lists🔗ℹ

(require xenomorph/list)

package: xenomorph

Lists in Xenomorph have a type and maybe a length. Every element in the list must have the same type. The list can have a specific length, but it doesn’t need to (in which case the length is encoded as part of the data).

If you want to store items of different types in a single Xenomorph list, wrap them in Pointers so they have a uniform type.

class
x:list% : class?

superclass: x:base%

Base class for list formats. Use x:list to conveniently instantiate new list formats.

constructor
(new x:list%
    [type type]
    [len len]
    [count-bytes? count-bytes?]) → (is-a?/c x:list%)
  type : xenomorphic?
  len : length-resolvable?
  count-bytes? : boolean?
Create class instance that represents a list format with elements of type type. If len is an integer, the list is fixed at that length, otherwise it can be any length.
method
(send a-x:list x:decode input-port parent) → list?
  input-port : input-port?
  parent : (or/c xenomorphic? #false)
Extends x:decode in x:base%.
Returns a list of values whose length is len and where each value has the type given by type.
method
(send a-x:list x:encode seq
input-port
parent) → bytes?
  seq : sequence?
  input-port : input-port?
  parent : (or/c xenomorphic? #false)
Extends x:encode in x:base%.
Take a sequence seq of type items and encode it as a byte string.

procedure
(x:list? x) → boolean?
x : any/c

Whether x is an object of type x:list%.

procedure
(x:list [ type-arg
len-arg
#:type type-kw
#:length len-kw
#:pre-encode pre-encode-proc
#:post-decode post-decode-proc
#:base-class base-class]) → x:list?
  type-arg : (or/c xenomorphic? #false) = #false
  len-arg : (or/c length-resolvable? #false) = #false
  type-kw : (or/c xenomorphic? #false) = #false
  len-kw : (or/c length-resolvable? #false) = #false
  pre-encode-proc : (or/c (any/c . -> . any/c) #false) = #false
  post-decode-proc : (or/c (any/c . -> . any/c) #false) = #false
  base-class : (λ (c) (subclass? c x:list%)) = x:list%

Generate an instance of x:list% (or a subclass of x:list%) with certain optional attributes.

type-arg or type-kw (whichever is provided, though type-arg takes precedence) determines the type of the elements in the list.

len-arg or len-kw (whichever is provided, though len-arg takes precedence) determines the length of the list. This can be an ordinary integer, but it can also be any value that is length-resolvable?.

Examples:

> (define three-uint8s (x:list uint8 3))
> (encode three-uint8s '(1 2 3) #f)
#"\1\2\3"
> (encode three-uint8s (string->bytes/utf-8 "ABC") #f)
#"ABC"
> (encode three-uint8s '(1 2 3 4) #f)
encode: contract violation
  expected: sequence of 3 values
  given: 4
> (encode three-uint8s '(1000 2000 3000) #f)
encode: contract violation
  expected: value that fits within unsigned 1-byte int (0 to 255)
  given: 1000
> (encode three-uint8s '(A B C) #f)
encode: contract violation
  expected: integer
  given: 'A
> (decode three-uint8s #"\1\2\3")
'(1 2 3)
> (decode three-uint8s #"\1\2\3\4")
'(1 2 3)
> (decode three-uint8s #"\1\2")
decode: contract violation
  expected: bytes for 3 items
  given: 2
> (define <256-uint8s (x:list #:type uint8 #:length uint8))
> (encode <256-uint8s '(1 2 3) #f)
#"\3\1\2\3"
> (encode <256-uint8s (make-list 500 1) #f)
encode: contract violation
  expected: value that fits within unsigned 1-byte int (0 to 255)
  given: 500
> (decode <256-uint8s #"\3\1\2\3")
'(1 2 3)
> (decode <256-uint8s #"\3\1\2\3\4")
'(1 2 3)
> (decode <256-uint8s #"\3\1\2")
decode: contract violation
  expected: bytes for 3 items
  given: 2
> (define nested (x:list #:type <256-uint8s #:length uint8))
> (encode nested '((65) (66 66) (67 67 67)) #f)
#"\3\1A\2BB\3CCC"
> (decode nested #"\3\1A\2BB\3CCC")
'((65) (66 66) (67 67 67))
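
The length-prefixed layout produced by <256-uint8s in the examples above can be read by hand in plain Racket. This is only a sketch of the byte layout, not Xenomorph's implementation; decode-length-prefixed is a hypothetical helper that handles one-byte counts and one-byte elements only:

```racket
#lang racket/base

;; Hypothetical helper, not part of Xenomorph: hand-decode a list of
;; one-byte integers whose first byte stores the element count.
(define (decode-length-prefixed bs)
  (define count (bytes-ref bs 0))
  (for/list ([i (in-range count)])
    (bytes-ref bs (add1 i))))

(decode-length-prefixed #"\3\1\2\3")   ; '(1 2 3)
(decode-length-prefixed #"\3\1\2\3\4") ; '(1 2 3), the trailing byte is ignored
```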

pre-encode-proc and post-decode-proc control the pre-encoding and post-decoding procedures, respectively. Each takes as input the value to be processed and returns a new value.

base-class controls the class used for instantiation of the new object.

Examples:

> (define (doubler xs) (append xs xs))
> (define quad-list (x:list uint16be
  #:pre-encode doubler
  #:post-decode doubler))
> (encode quad-list '(1 2 3) #f)
#"\0\1\0\2\0\3\0\1\0\2\0\3"
> (decode quad-list #"\0\1\0\2\0\3\0\1\0\2\0\3")
'(1 2 3 1 2 3 1 2 3 1 2 3)

5.5 Streams🔗ℹ

(require xenomorph/stream)

package: xenomorph

Under the hood, just a wrapper around the x:list% class that produces a stream rather than a list.

The distinguishing feature of a stream is that the evaluation is lazy: elements are only decoded as they are requested (and then they are cached for subsequent use). Therefore, a Xenomorph stream is a good choice when you don’t want to incur the costs of decoding every element immediately (as you will when you use Lists).

class
x:stream% : class?

superclass: x:list%

Base class for stream formats. Use x:stream to conveniently instantiate new stream formats.

constructor
(new x:stream%
    [type type]
    [len len]
    [count-bytes? count-bytes?])
→ (is-a?/c x:stream%)
  type : xenomorphic?
  len : length-resolvable?
  count-bytes? : boolean?
Create class instance that represents a stream format with elements of type type. If len is an integer, the stream is fixed at that length, otherwise it can be any length.
method
(send a-x:stream x:decode input-port
parent) → stream?
  input-port : input-port?
  parent : (or/c xenomorphic? #false)
Extends x:decode in x:list%.
Returns a stream of values whose length is len and where each value has the type given by type.
method
(send a-x:stream x:encode seq
input-port
parent) → bytes?
  seq : sequence?
  input-port : input-port?
  parent : (or/c xenomorphic? #false)
Extends x:encode in x:list%.
Take a sequence seq of type items and encode it as a byte string.

procedure
(x:stream? x) → boolean?
x : any/c

Whether x is an object of type x:stream%.

procedure
(x:stream [ type-arg
len-arg
#:type type-kw
#:length len-kw
#:pre-encode pre-encode-proc
#:post-decode post-decode-proc
#:base-class base-class]) → x:stream?
  type-arg : (or/c xenomorphic? #false) = #false
  len-arg : (or/c length-resolvable? #false) = #false
  type-kw : (or/c xenomorphic? #false) = #false
  len-kw : (or/c length-resolvable? #false) = #false
  pre-encode-proc : (or/c (any/c . -> . any/c) #false) = #false
  post-decode-proc : (or/c (any/c . -> . any/c) #false) = #false
  base-class : (λ (c) (subclass? c x:stream%)) = x:stream%

Generate an instance of x:stream% (or a subclass of x:stream%) with certain optional attributes, which are the same as x:list.

Examples:

> (define three-uint8s (x:stream uint8 3))
> (encode three-uint8s '(1 2 3) #f)
#"\1\2\3"
> (encode three-uint8s (string->bytes/utf-8 "ABC") #f)
#"ABC"
> (encode three-uint8s '(1 2 3 4) #f)
encode: contract violation
  expected: sequence of 3 values
  given: 4
> (encode three-uint8s '(1000 2000 3000) #f)
encode: contract violation
  expected: value that fits within unsigned 1-byte int (0 to 255)
  given: 1000
> (encode three-uint8s '(A B C) #f)
encode: contract violation
  expected: integer
  given: 'A
> (decode three-uint8s #"\1\2\3")
#<stream>
> (decode three-uint8s #"\1\2\3\4")
#<stream>
> (decode three-uint8s #"\1\2")
#<stream>
> (define <256-uint8s (x:stream #:type uint8 #:length uint8))
> (encode <256-uint8s '(1 2 3) #f)
#"\3\1\2\3"
> (encode <256-uint8s (make-list 500 1) #f)
encode: contract violation
  expected: value that fits within unsigned 1-byte int (0 to 255)
  given: 500
> (stream->list (decode <256-uint8s #"\3\1\2\3"))
'(1 2 3)
> (for/list ([val (in-stream (decode <256-uint8s #"\3\1\2\3\4"))])
  val)
'(1 2 3)
> (stream->list (decode <256-uint8s #"\3\1\2"))
decode: contract violation
  expected: at port position 3, not enough bytes for item 2
  given: 3
> (define (doubler xs) (append (stream->list xs) (stream->list xs)))
> (define quad-stream (x:stream uint16be
  #:pre-encode doubler
  #:post-decode doubler))
> (encode quad-stream '(1 2 3) #f)
#"\0\1\0\2\0\3\0\1\0\2\0\3"
> (decode quad-stream #"\0\1\0\2\0\3\0\1\0\2\0\3")
'(1 2 3 1 2 3)
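
Since decoding is lazy, you can pull a single element out of a stream without forcing the rest. A minimal sketch using the same format as above, assuming racket/stream is available for stream-first and stream->list:

```racket
#lang racket/base
(require racket/stream xenomorph)

(define <256-uint8s (x:stream #:type uint8 #:length uint8))

;; decode returns a stream; elements are decoded only on request.
(define s (decode <256-uint8s #"\3\1\2\3"))

(stream-first s) ; 1
(stream->list s) ; '(1 2 3)
```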

5.6 Vectors🔗ℹ

(require xenomorph/vector)

package: xenomorph

Under the hood, just a wrapper around the x:list% class that decodes to a vector rather than a list.

class
x:vector% : class?

superclass: x:list%

Base class for vector formats. Use x:vector to conveniently instantiate new vector formats.

constructor
(new x:vector%
    [type type]
    [len len]
    [count-bytes? count-bytes?])
→ (is-a?/c x:vector%)
  type : xenomorphic?
  len : length-resolvable?
  count-bytes? : boolean?
Create class instance that represents a vector format with elements of type type. If len is an integer, the vector is fixed at that length, otherwise it can be any length.
method
(send a-x:vector x:decode input-port
parent) → vector?
  input-port : input-port?
  parent : (or/c xenomorphic? #false)
Extends x:decode in x:list%.
Returns a vector of values whose length is len and where each value has the type given by type.
method
(send a-x:vector x:encode seq
input-port
parent) → bytes?
  seq : sequence?
  input-port : input-port?
  parent : (or/c xenomorphic? #false)
Extends x:encode in x:list%.
Take a sequence seq of type items and encode it as a byte string.

procedure
(x:vector? x) → boolean?
x : any/c

Whether x is an object of type x:vector%.

procedure
(x:vector [ type-arg
len-arg
#:type type-kw
#:length len-kw
#:pre-encode pre-encode-proc
#:post-decode post-decode-proc
#:base-class base-class]) → x:vector?
  type-arg : (or/c xenomorphic? #false) = #false
  len-arg : (or/c length-resolvable? #false) = #false
  type-kw : (or/c xenomorphic? #false) = #false
  len-kw : (or/c length-resolvable? #false) = #false
  pre-encode-proc : (or/c (any/c . -> . any/c) #false) = #false
  post-decode-proc : (or/c (any/c . -> . any/c) #false) = #false
  base-class : (λ (c) (subclass? c x:vector%)) = x:vector%

Generate an instance of x:vector% (or a subclass of x:vector%) with certain optional attributes, which are the same as x:list.

Examples:

> (define three-uint8s (x:vector uint8 3))
> (encode three-uint8s '#(1 2 3) #f)
#"\1\2\3"
> (encode three-uint8s (string->bytes/utf-8 "ABC") #f)
#"ABC"
> (encode three-uint8s '(1 2 3 4) #f)
encode: contract violation
  expected: sequence of 3 values
  given: 4
> (encode three-uint8s '(1000 2000 3000) #f)
encode: contract violation
  expected: value that fits within unsigned 1-byte int (0 to 255)
  given: 1000
> (encode three-uint8s '(A B C) #f)
encode: contract violation
  expected: integer
  given: 'A
> (decode three-uint8s #"\1\2\3")
'#(1 2 3)
> (decode three-uint8s #"\1\2\3\4")
'#(1 2 3)
> (decode three-uint8s #"\1\2")
decode: contract violation
  expected: bytes for 3 items
  given: 2
> (define <256-uint8s (x:vector #:type uint8 #:length uint8))
> (encode <256-uint8s '(1 2 3) #f)
#"\3\1\2\3"
> (encode <256-uint8s (make-list 500 1) #f)
encode: contract violation
  expected: value that fits within unsigned 1-byte int (0 to 255)
  given: 500
> (vector->list (decode <256-uint8s #"\3\1\2\3"))
'(1 2 3)
> (for/list ([val (in-vector (decode <256-uint8s #"\3\1\2\3\4"))])
  val)
'(1 2 3)
> (vector->list (decode <256-uint8s #"\3\1\2"))
decode: contract violation
  expected: bytes for 3 items
  given: 2
> (define (doubler vec) (vector-append vec vec))
> (define quad-vec (x:vector uint16be
  #:pre-encode doubler
  #:post-decode doubler))
> (encode quad-vec '#(1 2 3) #f)
#"\0\1\0\2\0\3\0\1\0\2\0\3"
> (decode quad-vec #"\0\1\0\2\0\3\0\1\0\2\0\3")
'#(1 2 3 1 2 3 1 2 3 1 2 3)

5.7 Dicts🔗ℹ

(require xenomorph/dict)

package: xenomorph

A dict is a store of keys and values. The analogy to a Racket dict? is intentional, but in Xenomorph a dict must also be ordered, because a binary encoding doesn’t make sense if it happens in a different order every time. The more precise analogy would be to an association list — a thing that has both dict-like and list-like qualities — but this would be a laborious name.

class
x:dict% : class?

superclass: x:base%

Base class for dict formats. Use x:dict to conveniently instantiate new dict formats.

constructor
(new x:dict% [fields fields]) → (is-a?/c x:dict%)
  fields : dict?
Create class instance that represents a dict format with fields as a dictionary holding the key–value pairs that define the dict format. Each key must be a symbol? and each value must be a xenomorphic? type.
method
(send a-x:dict x:decode input-port parent) → hash-eq?
  input-port : input-port?
  parent : (or/c xenomorphic? #false)
Extends x:decode in x:base%.
Returns a hasheq whose keys are the same as the keys in fields.
method
(send a-x:dict x:encode kvs
input-port
parent) → bytes?
  kvs : dict?
  input-port : input-port?
  parent : (or/c xenomorphic? #false)
Extends x:encode in x:base%.
Take the keys and values in kvs and encode them as a byte string.

procedure
(x:dict? x) → boolean?
x : any/c

Whether x is an object of type x:dict%.

procedure
(x:dict [ #:pre-encode pre-encode-proc
#:post-decode post-decode-proc
#:base-class base-class]
dict ...
key
val-type ...
...) → x:dict?
  pre-encode-proc : (or/c (any/c . -> . any/c) #false) = #false
  post-decode-proc : (or/c (any/c . -> . any/c) #false) = #false
  base-class : (λ (c) (subclass? c x:dict%)) = x:dict%
  dict : (listof (pairof symbol? xenomorphic?))
  key : symbol?
  val-type : xenomorphic?

Generate an instance of x:dict% (or a subclass of x:dict%) with certain optional attributes.

The rest arguments determine the keys and value types of the dict. These arguments can either be alternating keys and value-type arguments (similar to the calling pattern for hasheq) or association lists.

pre-encode-proc and post-decode-proc control the pre-encoding and post-decoding procedures, respectively. Each takes as input the value to be processed and returns a new value.

base-class controls the class used for instantiation of the new object.

Examples:

> (define d1 (x:dict 'foo uint8 'bar (x:string #:length 5)))
> (define d1-vals (hasheq 'foo 42 'bar "hello"))
> (encode d1 d1-vals #f)
#"*hello"
> (decode d1 #"*hello")
'#hasheq((bar . "hello") (foo . 42))
> (define d2 (x:dict 'zam (x:list #:length 3 #:type uint8)
                     'nested d1))
> (define d2-vals (hasheq 'zam '(42 43 44)
                          'nested d1-vals))
> (encode d2 d2-vals #f)
#"*+,*hello"
> (decode d2 #"*+,*hello")
'#hasheq((nested . #hasheq((bar . "hello") (foo . 42))) (zam . (42 43 44)))
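
The examples above use alternating keys and value types. The following sketch builds the same fields from a single association list, based on the calling pattern described for x:dict. The name d1-alist is illustrative, and the final comparison is a check rather than a documented result:

```racket
#lang racket/base
(require xenomorph)

;; Alternating keys and value types, as in the examples above.
(define d1 (x:dict 'foo uint8 'bar (x:string #:length 5)))

;; The same fields passed as one association list (illustrative name).
(define d1-alist
  (x:dict (list (cons 'foo uint8)
                (cons 'bar (x:string #:length 5)))))

(define vals (hasheq 'foo 42 'bar "hello"))

;; If the two calling patterns are equivalent, this evaluates to #true.
(equal? (encode d1 vals #f) (encode d1-alist vals #f))
```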

5.8 Versioned dicts🔗ℹ

(require xenomorph/versioned-dict)

package: xenomorph

The versioned dict is a format derived from x:dict% that contains multiple possible dict encodings. It also carries a version field to select among them. This version is stored with the encoded data, of course, so on decode, the correct version will be chosen.

procedure
(version-type? x) → boolean?
x : any/c

Whether x can be used as the version type of a versioned dict. Valid types are integer?, procedure?, or xenomorphic?.

class
x:versioned-dict% : class?

superclass: x:dict%

Base class for versioned dict formats. Use x:versioned-dict to conveniently instantiate new versioned dict formats.

constructor
(new x:versioned-dict%
    [type type]
    [versions versions]
    [version-key version-key]
    [fields fields])
→ (is-a?/c x:versioned-dict%)
  type : version-type?
  versions : dict?
  version-key : symbol?
  fields : #false
Create class instance that represents a versioned dict format with type as the encoded type of the version value, and versions as a dictionary holding the key–value pairs that define the versioned dict. Each key of versions must be a value consistent with type, and each value must either be a dict? or x:dict?.
method
(send a-x:versioned-dict x:decode input-port
parent) → hash-eq?
  input-port : input-port?
  parent : (or/c xenomorphic? #false)
Extends x:decode in x:dict%.
Returns a hasheq whose keys are the same as the keys in fields.
method
(send a-x:versioned-dict x:encode kvs
input-port
parent) → bytes?
  kvs : dict?
  input-port : input-port?
  parent : (or/c xenomorphic? #false)
Extends x:encode in x:dict%.
Take the keys and values in kvs and encode them as a byte string.

procedure
(x:versioned-dict? x) → boolean?
x : any/c

Whether x is an object of type x:versioned-dict%.

procedure
(x:versioned-dict type-arg
versions-arg
#:type type-kw
#:versions versions-kw
[ #:version-key version-key
#:pre-encode pre-encode-proc
#:post-decode post-decode-proc
#:base-class base-class])
→ x:versioned-dict?
  type-arg : (or/c version-type? #false)
  versions-arg : (or/c dict? #false)
  type-kw : (or/c version-type? #false)
  versions-kw : (or/c dict? #false)
  version-key : (or/c symbol? #false) = x:version-key
  pre-encode-proc : (or/c (any/c . -> . any/c) #false) = #false
  post-decode-proc : (or/c (any/c . -> . any/c) #false) = #false
   base-class : (λ (c) (subclass? c x:versioned-dict%))
= x:versioned-dict%

Generate an instance of x:versioned-dict% (or a subclass of x:versioned-dict%) with certain optional attributes.

type-arg or type-kw (whichever is provided, though type-arg takes precedence) determines the type of the version value that is used to select from among available dicts.

versions-arg or versions-kw (whichever is provided, though versions-arg takes precedence) is a dictionary holding the key–value pairs that define the versioned dict. Each key of versions must be a value consistent with type, and each value must either be a dict? or x:dict?.

version-key identifies the key that should be treated as the version selector. By default, it’s a separate private key called x:version-key that exists independently of the data fields. But if one of the existing data fields should be treated as the version key, you can pass it as the version-key argument.

pre-encode-proc and post-decode-proc control the pre-encoding and post-decoding procedures, respectively. Each takes as input the value to be processed and returns a new value.

base-class controls the class used for instantiation of the new object.

Examples:

> (define d1 (x:dict 'foo uint8 'bar (x:string #:length 5)))
> (define d1-vals (hasheq 'foo 42 'bar "hello" 'my-version-key 'd1))
> (define d2 (x:dict 'zam (x:list #:length 3 #:type uint8)
                     'nested d1))
> (define d2-vals (hasheq 'zam '(42 43 44)
                          'nested d1-vals
                          'my-version-key 'd2))
> (define vdict (x:versioned-dict
                 #:type (x:symbol)
                 #:version-key 'my-version-key
                 #:versions (hash 'd1 d1 'd2 d2)))
> (encode vdict d1-vals #f)
#"d1\0*hello"
> (decode vdict #"d1\0*hello")
'#hasheq((bar . "hello") (foo . 42) (my-version-key . d1))
> (encode vdict d2-vals #f)
#"d2\0*+,*hello"
> (decode vdict #"d2\0*+,*hello")
'#hasheq((my-version-key . d2)
         (nested . #hasheq((bar . "hello") (foo . 42)))
         (zam . (42 43 44)))
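
The example above passes a custom #:version-key. The sketch below instead relies on the default key, x:version-key, with a one-byte integer as the version type. The names v0 and v1 and the field layouts are illustrative assumptions, and no particular output bytes are claimed here:

```racket
#lang racket/base
(require xenomorph)

;; Two illustrative layouts, selected by a one-byte integer version.
(define v0 (x:dict 'foo uint8))
(define v1 (x:dict 'foo uint8 'bar (x:string #:length 5)))

(define vdict (x:versioned-dict uint8 (hash 0 v0 1 v1)))

;; No #:version-key was given, so the version is stored under x:version-key.
(define bs (encode vdict (hasheq 'foo 42 'bar "hello" x:version-key 1) #f))

;; Decoding reads the version first, then picks the matching layout.
(define decoded (decode vdict bs))
(hash-ref decoded x:version-key)
```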

5.8.1 Reserved values🔗ℹ

value
x:version-key : symbol? = 'x:version

Key used by default to store & look up the version-selector value within the fields of a versioned dict. When the versioned dict is created, a different key can be specified.

5.9 Pointers🔗ℹ

(require xenomorph/pointer)

package: xenomorph

A pointer can be thought of as a meta-object that can wrap any of the other binary formats here. It doesn’t change how they work: they still take the same inputs (on encode) and produce the same values (on decode).

What it does change is the underlying housekeeping, by creating a layer of indirection around the data.

On encode, instead of storing the raw data at a certain point in the byte stream, it creates a reference — that is, a pointer — to that data at another location, and then puts the data at that location.

On decode, the process is reversed: the pointer is dereferenced to discover the true location of the data, the data is read from that location, and then the decode proceeds as usual.

Under the hood, this housekeeping is fiddly and annoying. But good news! It’s already been done. Please do something worthwhile with the hours of your life that have been returned to you.

Pointers can be useful for making data types of different sizes behave as if they were the same size. For instance, Lists require all elements to have the same encoded size. What if you want to put different data types in the list? Wrap each item in a pointer, and you can make a list of pointers (because they have consistent size) that reference different kinds of data.
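
The indirection described above can be pictured with a few lines of plain Racket. This is a conceptual illustration only: it does not use x:pointer and does not reproduce Xenomorph's actual byte layout, and deref-uint8 is a hypothetical helper.

```racket
#lang racket/base

;; Conceptual illustration only; not Xenomorph's layout.
;; Byte 0 stores an offset, and the real value lives at that offset.
(define data (bytes 3 0 0 53))

(define (deref-uint8 bs)
  (define offset (bytes-ref bs 0)) ; read the pointer
  (bytes-ref bs offset))           ; read the value it points to

(deref-uint8 data) ; 53
```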

class
x:pointer% : class?

superclass: x:base%

Base class for pointer formats. Use x:pointer to conveniently instantiate new pointer formats.

procedure
(pointer-relative-value? x) → boolean?
x : any/c

Whether x can be used as a value for the pointer-relative-to field of x:pointer%. Valid choices are '(local immediate parent global).

constructor
(new x:pointer%
    [ptr-type ptr-type]
    [dest-type dest-type]
    [pointer-relative-to pointer-relative-to]
    [allow-null? allow-null?]
    [null-value null-value]
    [pointer-lazy? pointer-lazy?])
→ (is-a?/c x:pointer%)
  ptr-type : x:int?
  dest-type : (or/c xenomorphic? 'void)
  pointer-relative-to : pointer-relative-value?
  allow-null? : boolean?
  null-value : any/c
  pointer-lazy? : boolean?
Create class instance that represents a pointer format. See x:pointer for a description of the fields.
method
(send a-x:pointer x:decode input-port
parent) → any/c
  input-port : input-port?
  parent : (or/c xenomorphic? #false)
Extends x:decode in x:base%.
Returns the dereferenced value of the pointer whose type is controlled by dest-type.
method
(send a-x:pointer x:encode val
input-port
parent) → bytes?
  val : any/c
  input-port : input-port?
  parent : (or/c xenomorphic? #false)
Extends x:encode in x:base%.
Take a value of type dest-type, wrap it in a pointer, and encode it as a byte string.

procedure
(x:pointer? x) → boolean?
x : any/c

Whether x is an object of type x:pointer%.

procedure
(x:pointer [ ptr-type-arg
dest-type-arg
#:type ptr-type-kw
#:dest-type dest-type-kw
#:relative-to pointer-relative-to
#:allow-null allow-null?
#:null null-value
#:lazy pointer-lazy?
#:pre-encode pre-encode-proc
#:post-decode post-decode-proc
#:base-class base-class]) → x:pointer?
  ptr-type-arg : (or/c x:int? #false) = #false
  dest-type-arg : (or/c xenomorphic? 'void #false) = #false
  ptr-type-kw : (or/c x:int? #false) = uint32
  dest-type-kw : (or/c xenomorphic? 'void #false) = uint8
  pointer-relative-to : pointer-relative-value? = 'local
  allow-null? : boolean? = #true
  null-value : any/c = 0
  pointer-lazy? : boolean? = #false
  pre-encode-proc : (or/c (any/c . -> . any/c) #false) = #false
  post-decode-proc : (or/c (any/c . -> . any/c) #false) = #false
  base-class : (λ (c) (subclass? c x:pointer%)) = x:pointer%

Generate an instance of x:pointer% (or a subclass of x:pointer%) with certain optional attributes.

ptr-type-arg or ptr-type-kw (whichever is provided, though ptr-type-arg takes precedence) controls the type of the pointer value itself, which must be an x:int?. Default is uint32.

dest-type-arg or dest-type-kw (whichever is provided, though dest-type-arg takes precedence) controls the type of the thing being pointed at, which must be a xenomorphic? object or the symbol 'void to indicate a void pointer. Default is uint8.

pointer-relative-to controls how the byte-offset value stored in the pointer is calculated. It must be one of '(local immediate parent global). Default is 'local.

allow-null? controls whether the pointer can take on null values, and null-value controls what that value is. Defaults are #true and 0, respectively.

pointer-lazy? controls whether the pointer is decoded immediately. If pointer-lazy? is #true, then the decoding of the pointer is wrapped in a promise that can later be evaluated with force. Default is #false.

pre-encode-proc and post-decode-proc control the pre-encoding and post-decoding procedures, respectively. Each takes as input the value to be processed and returns a new value.

base-class controls the class used for instantiation of the new object.

5.9.1 Private values🔗ℹ

value
x:start-offset-key : symbol? = 'x:start-offset
value
x:current-offset-key : symbol? = 'x:current-offset
value
x:parent-key : symbol? = 'x:parent
value
x:pointer-size-key : symbol? = 'x:ptr-size
value
x:pointers-key : symbol? = 'x:pointers
value
x:pointer-offset-key : symbol? = 'x:ptr-offset
value
x:pointer-type-key : symbol? = 'x:ptr-type
value
x:length-key : symbol? = 'x:length
value
x:val-key : symbol? = 'x:val

Private fields used for pointer housekeeping. There is no reason to mess with these.

5.10 Bitfields🔗ℹ

(require xenomorph/bitfield)

package: xenomorph

A bitfield is a compact encoding for Boolean values using an integer, where each bit of the integer indicates #true or #false (corresponding to a value of 1 or 0). The bitfield object creates a mapping between the keys of the bitfield (called flags) and the integer bits.

class
x:bitfield% : class?

superclass: x:base%

Base class for bitfield formats. Use x:bitfield to conveniently instantiate new bitfield formats.

constructor
(new x:bitfield%
    [type type]
    [flags flags]) → (is-a?/c x:bitfield%)
  type : x:int?
  flags : (listof (or/c symbol? #false))
Create class instance that represents a bitfield format. See x:bitfield for a description of the fields.
method
(send a-x:bitfield x:decode input-port
parent) → hash?
  input-port : input-port?
  parent : (or/c xenomorphic? #false)
Extends x:decode in x:base%.
Returns a hash whose keys are the names of the flags, and whose values are Booleans.
method
(send a-x:bitfield x:encode flag-hash
input-port
parent) → bytes?
  flag-hash : hash?
  input-port : input-port?
  parent : (or/c xenomorphic? #false)
Extends x:encode in x:base%.
Take a hash — where hash keys are the names of the flags, hash values are Booleans — and encode it as a byte string.

procedure
(x:bitfield? x) → boolean?
x : any/c

Whether x is an object of type x:bitfield%.

procedure
(x:bitfield [ type-arg]
flags-arg
[ #:type type-kw
#:flags flags-kw
#:pre-encode pre-encode-proc
#:post-decode post-decode-proc
#:base-class base-class]) → x:bitfield?
  type-arg : (or/c x:int? #false) = #false
  flags-arg : (listof any/c)
  type-kw : (or/c x:int? #false) = uint8
  flags-kw : (listof any/c) = null
  pre-encode-proc : (or/c (any/c . -> . any/c) #false) = #false
  post-decode-proc : (or/c (any/c . -> . any/c) #false) = #false
  base-class : (λ (c) (subclass? c x:bitfield%)) = x:bitfield%

Generate an instance of x:bitfield% (or a subclass of x:bitfield%) with certain optional attributes.

type-arg or type-kw (whichever is provided, though type-arg takes precedence) controls the type of the bitfield value itself, which must be an x:int?. Default is uint8.

flags-arg or flags-kw (whichever is provided, though flags-arg takes precedence) is a list of flag names corresponding to each bit. The number of names must be fewer than the number of bits in type. No name can be duplicated. Each flag name can be any value, but #false indicates a skipped bit. Default is null.

pre-encode-proc and post-decode-proc control the pre-encoding and post-decoding procedures, respectively. Each takes as input the value to be processed and returns a new value.

base-class controls the class used for instantiation of the new object.

Examples:

> (define flags (x:bitfield uint8 '(alpha bravo charlie delta echo)))
> (define vals (hasheq
                'alpha #true
                'charlie #true
                'echo #true))
> (encode flags vals #f)
#"\25"
> (decode flags #"\25")
'#hash((alpha . #t) (bravo . #f) (charlie . #t) (delta . #f) (echo . #t))
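
The byte #"\25" (octal 25, decimal 21) in the example follows from simple bit arithmetic. Here is a plain-Racket sketch, assuming the first flag name maps to the least-significant bit, which is consistent with the output above. The helper flags->integer is hypothetical and not part of Xenomorph:

```racket
#lang racket/base

;; Hypothetical helper, not part of Xenomorph: pack flags into an integer,
;; assuming the first flag name maps to the least-significant bit.
(define flag-names '(alpha bravo charlie delta echo))

(define (flags->integer names vals)
  (for/sum ([name (in-list names)]
            [bit (in-naturals)])
    (if (hash-ref vals name #false)
        (arithmetic-shift 1 bit)
        0)))

;; alpha = bit 0 (1), charlie = bit 2 (4), echo = bit 4 (16)
(flags->integer flag-names
                (hasheq 'alpha #true 'charlie #true 'echo #true)) ; 21
```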

5.11 Enumerations🔗ℹ

(require xenomorph/enum)

package: xenomorph

An enumeration is a mapping of values to sequential integers.

class
x:enum% : class?

superclass: x:base%

Base class for enumeration formats. Use x:enum to conveniently instantiate new enumeration formats.

constructor
(new x:enum%
    [type type]
    [values values]) → (is-a?/c x:enum%)
  type : x:int?
  values : (listof any/c)
Create class instance that represents an enumeration format of type type, sequentially mapped to values.
method
(send a-x:enum x:decode input-port parent) → any/c
  input-port : input-port?
  parent : (or/c xenomorphic? #false)
Extends x:decode in x:base%.
Returns the value associated with the decoded integer. If that value is #false or doesn’t exist, returns the integer itself.
method
(send a-x:enum x:encode val
input-port
parent) → bytes?
  val : any/c
  input-port : input-port?
  parent : (or/c xenomorphic? #false)
Extends x:encode in x:base%.
Take a value listed in the values field and encode it as a byte string.

procedure
(x:enum? x) → boolean?
x : any/c

Whether x is an object of type x:enum%.

procedure
(x:enum [ type-arg
values-arg
#:type type-kw
#:values values-kw
#:pre-encode pre-encode-proc
#:post-decode post-decode-proc
#:base-class base-class]) → x:enum?
  type-arg : (or/c x:int? #false) = #false
  values-arg : (listof any/c) = #false
  type-kw : (or/c x:int? #false) = uint8
  values-kw : (listof any/c) = null
  pre-encode-proc : (or/c (any/c . -> . any/c) #false) = #false
  post-decode-proc : (or/c (any/c . -> . any/c) #false) = #false
  base-class : (λ (c) (subclass? c x:enum%)) = x:enum%

Generate an instance of x:enum% (or a subclass of x:enum%) with certain optional attributes.

type-arg or type-kw (whichever is provided, though type-arg takes precedence) determines the integer type for the enumeration. Default is uint8.

values-arg or values-kw (whichever is provided, though values-arg takes precedence) determines the mapping of values to integers, where each value corresponds to its index in the list. #false indicates skipped values. Default is null.

pre-encode-proc and post-decode-proc control the pre-encoding and post-decoding procedures, respectively. Each takes as input the value to be processed and returns a new value.

base-class controls the class used for instantiation of the new object.

Examples:

> (define e (x:enum #:type uint8
#:values '("foo" "bar" "baz" #f)))
> (encode e "baz" #f)
#"\2"
> (decode e #"\2")
"baz"
; corresponding enum value is #false, so we pass through input value
> (decode e #"\3")
3
; no corresponding enum value, so we pass through input value
> (decode e #"\5")
5

5.12 Reserved

(require xenomorph/reserved)

package: xenomorph

The reserved object simply skips data. Using a reserved object rather than the type itself has two advantages: a) it clearly signals that the data is being ignored, and b) it prevents writing to that part of the data structure.

class
x:reserved% : class?

superclass: x:base%

Base class for reserved formats. Use x:reserved to conveniently instantiate new reserved formats.

constructor
(new x:reserved%
    [type type]
    [count count]) → (is-a?/c x:reserved%)
  type : xenomorphic?
  count : exact-positive-integer?
Create a class instance that represents a reserved format. See x:reserved for a description of the fields.
method
(send a-x:reserved x:decode input-port
parent) → void?
  input-port : input-port?
  parent : (or/c xenomorphic? #false)
Extends x:decode in x:base%.
Returns (void).
method
(send a-x:reserved x:encode val
input-port
parent) → bytes?
  val : any/c
  input-port : input-port?
  parent : (or/c xenomorphic? #false)
Extends x:encode in x:base%.
Ignores val and returns a byte string of zeroes that is as long as count items of type.

procedure
(x:reserved? x) → boolean?
x : any/c

Whether x is an object of type x:reserved%.

procedure
(x:reserved [ type-arg]
count-arg
#:type type-kw
[ #:count count-kw
#:pre-encode pre-encode-proc
#:post-decode post-decode-proc
#:base-class base-class]) → x:reserved?
  type-arg : (or/c xenomorphic? #false) = #false
  count-arg : (or/c exact-positive-integer? #false)
  type-kw : (or/c xenomorphic? #false)
  count-kw : exact-positive-integer? = 1
  pre-encode-proc : (or/c (any/c . -> . any/c) #false) = #false
  post-decode-proc : (or/c (any/c . -> . any/c) #false) = #false
  base-class : (λ (c) (subclass? c x:reserved%)) = x:reserved%

Generate an instance of x:reserved% (or a subclass of x:reserved%) with certain optional attributes.

type-arg or type-kw (whichever is provided, though type-arg takes precedence) controls the type wrapped by the reserved object, which must be xenomorphic?.

count-arg or count-kw (whichever is provided, though count-arg takes precedence) is the number of items of type that should be skipped.

pre-encode-proc and post-decode-proc control the pre-encoding and post-decoding procedures, respectively. Each takes as input the value to be processed and returns a new value.

base-class controls the class used for instantiation of the new object.

Examples:

> (define res (x:reserved #:type uint32))
> (encode res 1 #f)
#"\0\0\0\0"
> (encode res 1234 #f)
#"\0\0\0\0"
> (encode res 12345678 #f)
#"\0\0\0\0"
> (void? (decode res #"\0\0\0\0"))
#t
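
A reserved field can also cover more than one item. The following is a minimal sketch, assuming that #:count multiplies the size of the wrapped type, as the description of count above suggests. The names padding and encoded are illustrative only.

```racket
#lang racket/base
(require xenomorph)

;; Reserve three uint8 slots in a row.
(define padding (x:reserved #:type uint8 #:count 3))

;; The value passed in is ignored; only zero bytes are produced.
(define encoded (encode padding 0 #f))

;; Under the assumption above, this length should be 3.
(bytes-length encoded)
```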

5.13 Utilities

(require xenomorph/util)

package: xenomorph

procedure
(length-resolvable? x) → boolean?
x : any/c

Whether x is something that can be used as a length argument with xenomorphic? objects that have length, such as an x:list or x:stream.

The following values are deemed to be resolvable: any exact-nonnegative-integer?, an x:int?, or any procedure? that takes one argument (= the parent object) and returns an exact-nonnegative-integer?.
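
The categories listed above can be checked directly. This is a minimal sketch that assumes uint8 is available from the main xenomorph module, as in the earlier examples. Results are not shown, since the predicate may accept more values than the ones documented here.

```racket
#lang racket/base
(require xenomorph xenomorph/util)

;; Each of these falls into one of the documented resolvable categories.
(length-resolvable? 4)              ; an exact nonnegative integer
(length-resolvable? uint8)          ; an x:int? object
(length-resolvable? (λ (parent) 4)) ; a one-argument procedure

;; A string is not among the documented resolvable values.
(length-resolvable? "four")
```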

6 License & source code

This module is licensed under the MIT license.

Source repository at http://git.matthewbutterick.com/mbutterick/typesetting. Suggestions & corrections welcome.
