3 Data Interface

XGBoost trains and predicts over a DMatrix: a dense or sparse feature matrix together with optional per-row metadata such as labels and weights. In the high-level API a DMatrix is an opaque Racket value whose native handle is reclaimed by the garbage collector; you never free it by hand.

3.1 Building a DMatrix from Racket data

The most direct constructor is make-dmatrix. It accepts a list of row lists, a vector of row vectors, or a flat row-major sequence when you also pass #:nrow and #:ncol:

(require xgboost)
 
; nested rows — shape is inferred
(make-dmatrix '((1.0 2.0 0.5)
                (2.0 1.0 1.5)
                (3.0 0.5 0.0)))
 
; flat row-major data — shape supplied explicitly
(require ffi/vector)
(make-dmatrix (f32vector 1.0 2.0 3.0
                         4.0 5.0 6.0)
              #:nrow 2 #:ncol 3)
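
To make the row-major convention concrete, the following sketch regroups a flat vector into rows in plain Racket. flat->rows is a hypothetical helper written for illustration, not part of the xgboost library; it assumes only that element (r, c) of the matrix sits at offset r * ncol + c in the flat data.

```racket
#lang racket/base
(require ffi/vector)

;; Hypothetical helper (not provided by xgboost): regroup a flat
;; row-major f32vector into nested row lists, given the shape.
(define (flat->rows data nrow ncol)
  ;; The flat data must hold exactly nrow * ncol values.
  (unless (= (f32vector-length data) (* nrow ncol))
    (error 'flat->rows "expected ~a values, got ~a"
           (* nrow ncol) (f32vector-length data)))
  (for/list ([r (in-range nrow)])
    (for/list ([c (in-range ncol)])
      ;; Row-major layout: element (r, c) lives at offset r*ncol + c.
      (f32vector-ref data (+ (* r ncol) c)))))

(flat->rows (f32vector 1.0 2.0 3.0
                       4.0 5.0 6.0)
            2 3)
;; => '((1.0 2.0 3.0) (4.0 5.0 6.0))
```

The length check mirrors why #:nrow and #:ncol are needed for flat input: without them the same six values could equally be read as 3×2 or 1×6.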

You can attach labels (and weights) at construction time:

(make-dmatrix '((1.0 2.0 0.5)
                (2.0 1.0 1.5)
                (3.0 0.5 0.0)
                (0.5 3.0 2.0))
              #:labels '(3.5 3.5 6.5 2.0))

3.2 Missing values

XGBoost represents a missing entry with a sentinel value. By default that sentinel is +nan.0; pass #:missing to choose a different marker. Here -1.0 marks the holes, so the materialized matrix shows +nan.0 where the data had -1.0 (or, for sparse inputs, where no entry was supplied):

; -1.0 entries are treated as missing
(make-dmatrix (f32vector  1.0 -1.0 3.0
                         -1.0  5.0 6.0)
              #:nrow 2 #:ncol 3
              #:missing -1.0)
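
As a rough mental model of what a missing marker means for dense input, the sketch below rewrites every cell equal to the sentinel as +nan.0. mark-missing is a hypothetical helper in plain Racket, not part of the xgboost library; the library does this work in native code and may differ in detail.

```racket
#lang racket/base

;; Hypothetical helper (not provided by xgboost): replace every cell
;; equal to `sentinel` with +nan.0, leaving other cells untouched.
(define (mark-missing rows sentinel)
  (for/list ([row (in-list rows)])
    (for/list ([x (in-list row)])
      ;; (= x +nan.0) is always #f, so cells that are already NaN
      ;; simply pass through unchanged.
      (if (= x sentinel) +nan.0 x))))

(mark-missing '((1.0 -1.0 3.0)
                (-1.0 5.0 6.0))
              -1.0)
;; => '((1.0 +nan.0 3.0) (+nan.0 5.0 6.0))
```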

3.3 Sparse and columnar inputs

For data that is already in a native layout, build directly from CSR, CSC, or column-major storage. All three matrices below describe the same 2×3 data with two missing cells:

; CSR: row offsets, column indices, values, ncol
(make-dmatrix-from-csr (u64vector 0 2 4)
                       (u32vector 0 2 1 2)
                       (f32vector 1.0 3.0 5.0 6.0)
                       3
                       -1.0)
 
; CSC: column offsets, row indices, values, nrow
(make-dmatrix-from-csc (u64vector 0 1 2 4)
                       (u32vector 0 1 0 1)
                       (f32vector 1.0 5.0 3.0 6.0)
                       2
                       -1.0)
 
; Columnar: one f32vector per column; -1.0 marks the missing cells
(make-dmatrix-from-columnar (list (f32vector  1.0 -1.0)
                                  (f32vector -1.0  5.0)
                                  (f32vector  3.0  6.0))
                            -1.0)
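
One way to check that the CSR and CSC arrays above encode the same matrix is to expand both by hand. The sketch below does so in plain Racket. csr->rows and csc->rows are hypothetical helpers, not part of the xgboost library; they treat every cell without a stored entry as missing (+nan.0) and do not model the trailing -1.0 argument of the constructors.

```racket
#lang racket/base
(require ffi/vector)

;; Hypothetical helper: expand CSR arrays into nested row lists.
;; indptr[r] .. indptr[r+1] delimits the stored entries of row r.
(define (csr->rows indptr indices values ncol)
  (define nrow (sub1 (u64vector-length indptr)))
  (for/list ([r (in-range nrow)])
    (define row (make-vector ncol +nan.0))
    (for ([k (in-range (u64vector-ref indptr r)
                       (u64vector-ref indptr (add1 r)))])
      (vector-set! row (u32vector-ref indices k) (f32vector-ref values k)))
    (vector->list row)))

;; Hypothetical helper: expand CSC arrays into nested row lists.
;; indptr[c] .. indptr[c+1] delimits the stored entries of column c.
(define (csc->rows indptr indices values nrow)
  (define ncol (sub1 (u64vector-length indptr)))
  (define rows (build-vector nrow (lambda (i) (make-vector ncol +nan.0))))
  (for* ([c (in-range ncol)]
         [k (in-range (u64vector-ref indptr c)
                      (u64vector-ref indptr (add1 c)))])
    (vector-set! (vector-ref rows (u32vector-ref indices k))
                 c
                 (f32vector-ref values k)))
  (for/list ([row (in-vector rows)])
    (vector->list row)))

(csr->rows (u64vector 0 2 4)
           (u32vector 0 2 1 2)
           (f32vector 1.0 3.0 5.0 6.0)
           3)
;; => '((1.0 +nan.0 3.0) (+nan.0 5.0 6.0))

(csc->rows (u64vector 0 1 2 4)
           (u32vector 0 1 0 1)
           (f32vector 1.0 5.0 3.0 6.0)
           2)
;; => '((1.0 +nan.0 3.0) (+nan.0 5.0 6.0))
```

Both expansions give the same 2×3 matrix with holes at (0, 1) and (1, 0). CSR needs the column count and CSC the row count as an extra argument because trailing all-missing columns or rows leave no trace in the index arrays.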

3.4 Loading from a file

make-dmatrix-from-uri reads an XGBoost-supported file or URI. Pass #:format to disambiguate when the extension is not enough:

(make-dmatrix-from-uri "train.libsvm" #:format "libsvm")

A DMatrix can also be written to XGBoost’s binary buffer format with dmatrix-save-binary! and reloaded later with make-dmatrix-from-uri.

3.5 Inspecting a DMatrix

dmatrix-rows and dmatrix-cols report the shape, dmatrix->list materializes the contents as nested lists (missing cells appear as +nan.0), and dmatrix-show prints a human-readable rendering.

3.6 Labels, weights, and feature metadata

Metadata can be set after construction. Setters accept lists, vectors, or typed vectors and coerce internally; getters read the fields back as Racket data:

(define dm
  (make-dmatrix (f32vector 1.0 2.0
                           3.0 4.0)
                #:nrow 2 #:ncol 2))
 
(dmatrix-set-label! dm '(0.25 0.75))
(dmatrix-set-feature-names! dm '("height" "weight"))
(dmatrix-set-feature-types! dm '("q" "q"))
 
(dmatrix-feature-names dm)  ; => '("height" "weight")
(dmatrix-label dm)          ; => '(0.25 0.75)

Other well-known fields have dedicated setters: dmatrix-set-weight!, dmatrix-set-base-margin!, dmatrix-set-group! (for ranking), and dmatrix-set-label-lower-bound! / dmatrix-set-label-upper-bound! (for AFT survival objectives). Each identifier above links to its full contract in the xgboost API reference.