3 Data Interface
XGBoost trains and predicts over a DMatrix: a dense or sparse feature matrix together with optional per-row metadata such as labels and weights. In the high-level API a DMatrix is an opaque Racket value whose native handle is reclaimed by the garbage collector; you never free it by hand.
3.1 Building a DMatrix from Racket data
The most direct constructor is make-dmatrix. It accepts a list of row lists, a vector of row vectors, or a flat row-major sequence when you also pass #:nrow and #:ncol:
    (require xgboost ffi/vector)

    ; nested rows: shape is inferred
    (make-dmatrix '((1.0 2.0 0.5)
                    (2.0 1.0 1.5)
                    (3.0 0.5 0.0)))

    ; flat row-major data: shape supplied explicitly
    (make-dmatrix (f32vector 1.0 2.0 3.0 4.0 5.0 6.0)
                  #:nrow 2 #:ncol 3)
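To make the relationship between the two input shapes concrete, here is a minimal sketch in plain Racket that flattens nested rows into row-major order and recovers the shape. The helper rows->flat is hypothetical and is not part of the xgboost bindings; it only illustrates the layout that #:nrow and #:ncol describe.

```racket
#lang racket/base
(require ffi/vector racket/list)

;; rows->flat is a hypothetical helper, not part of the xgboost bindings.
;; It returns three values: the row-major f32vector, nrow, and ncol.
(define (rows->flat rows)
  (define nrow (length rows))
  (define ncol (if (null? rows) 0 (length (car rows))))
  ;; Reject ragged input: every row must have the same width.
  (for ([r (in-list rows)])
    (unless (= (length r) ncol)
      (error 'rows->flat "ragged row: ~e" r)))
  (values (list->f32vector (append* rows)) nrow ncol))

(define-values (flat nrow ncol)
  (rows->flat '((1.0 2.0 3.0) (4.0 5.0 6.0))))

;; In row-major order, element (i, j) lives at index i*ncol + j.
(define (flat-ref i j)
  (f32vector-ref flat (+ (* i ncol) j)))
```

With this layout, (flat-ref 1 2) reads the last cell of the second row, which is why a flat sequence is ambiguous without the shape.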
You can attach labels (and weights) at construction time:
    (make-dmatrix '((1.0 2.0 0.5)
                    (2.0 1.0 1.5)
                    (3.0 0.5 0.0)
                    (0.5 3.0 2.0))
                  #:labels '(3.5 3.5 6.5 2.0))
3.2 Missing values
XGBoost represents a missing entry with a sentinel value. By default that sentinel is +nan.0; pass #:missing to choose a different marker. Here -1.0 marks the holes, so the materialized matrix shows +nan.0 where the data had -1.0 (or, for sparse inputs, where no entry was supplied):
    ; two cells hold the sentinel -1.0 and are treated as missing
    (make-dmatrix (f32vector 1.0 -1.0 3.0 -1.0 5.0 6.0)
                  #:nrow 2 #:ncol 3
                  #:missing -1.0)
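The sentinel substitution can be pictured with a small plain-Racket sketch. The helper mask-missing is hypothetical and says nothing about how the bindings or XGBoost handle the sentinel internally; it only shows the result a reader should expect when the matrix is materialized.

```racket
#lang racket/base

;; mask-missing is a hypothetical helper, not an xgboost function.
;; It replaces every cell equal to the sentinel with +nan.0.
(define (mask-missing rows sentinel)
  (for/list ([row (in-list rows)])
    (for/list ([x (in-list row)])
      (if (= x sentinel) +nan.0 x))))

(define masked
  (mask-missing '((1.0 -1.0 3.0) (-1.0 5.0 6.0)) -1.0))

masked ; => '((1.0 +nan.0 3.0) (+nan.0 5.0 6.0))
```

Note that = never matches +nan.0, so cells that are already NaN pass through unchanged, which agrees with the default sentinel.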
3.3 Sparse and columnar inputs
For data that is already in a native layout, build directly from CSR, CSC, or column-major storage. All three matrices below describe the same 2×3 data with two missing cells:
    ; CSR: row offsets, column indices, values, ncol, missing
    (make-dmatrix-from-csr (u64vector 0 2 4)
                           (u32vector 0 2 1 2)
                           (f32vector 1.0 3.0 5.0 6.0)
                           3 -1.0)

    ; CSC: column offsets, row indices, values, nrow, missing
    (make-dmatrix-from-csc (u64vector 0 1 2 4)
                           (u32vector 0 1 0 1)
                           (f32vector 1.0 5.0 3.0 6.0)
                           2 -1.0)

    ; Columnar: one f32vector per column; -1.0 marks the missing cells
    (make-dmatrix-from-columnar (list (f32vector 1.0 -1.0)
                                      (f32vector -1.0 5.0)
                                      (f32vector 3.0 6.0))
                                -1.0)
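To check that the CSR and CSC arrays above really encode the same matrix, the following plain-Racket sketch decodes both into nested rows, filling unsupplied cells with +nan.0. The helpers csr->rows and csc->rows are hypothetical illustrations, not part of the bindings, and ordinary vectors stand in for the typed vectors the real constructors take.

```racket
#lang racket/base

;; csr->rows and csc->rows are hypothetical helpers for illustration only.
(define (make-grid nrow ncol)
  (build-vector nrow (lambda (_) (make-vector ncol +nan.0))))

(define (grid->rows grid)
  (for/list ([row (in-vector grid)])
    (vector->list row)))

(define (csr->rows offsets indices values ncol)
  (define nrow (sub1 (vector-length offsets)))
  (define grid (make-grid nrow ncol))
  (for ([i (in-range nrow)])
    ;; Entries offsets[i] .. offsets[i+1]-1 belong to row i.
    (for ([k (in-range (vector-ref offsets i) (vector-ref offsets (add1 i)))])
      (vector-set! (vector-ref grid i)
                   (vector-ref indices k)
                   (vector-ref values k))))
  (grid->rows grid))

(define (csc->rows offsets indices values nrow)
  (define ncol (sub1 (vector-length offsets)))
  (define grid (make-grid nrow ncol))
  (for ([j (in-range ncol)])
    ;; Entries offsets[j] .. offsets[j+1]-1 belong to column j.
    (for ([k (in-range (vector-ref offsets j) (vector-ref offsets (add1 j)))])
      (vector-set! (vector-ref grid (vector-ref indices k))
                   j
                   (vector-ref values k))))
  (grid->rows grid))

(define from-csr
  (csr->rows #(0 2 4) #(0 2 1 2) #(1.0 3.0 5.0 6.0) 3))
(define from-csc
  (csc->rows #(0 1 2 4) #(0 1 0 1) #(1.0 5.0 3.0 6.0) 2))

from-csr ; => '((1.0 +nan.0 3.0) (+nan.0 5.0 6.0))
```

Both decoders produce the same nested rows, with the two missing cells at row 0, column 1 and row 1, column 0.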
3.4 Loading from a file
make-dmatrix-from-uri reads an XGBoost-supported file or URI. Pass #:format to disambiguate when the extension is not enough:
    (make-dmatrix-from-uri "train.libsvm" #:format "libsvm")
A DMatrix can also be written to XGBoost’s binary buffer format with dmatrix-save-binary! and reloaded later with make-dmatrix-from-uri.
3.5 Inspecting a DMatrix
dmatrix-rows and dmatrix-cols report the shape, dmatrix->list materializes the contents as nested lists (missing cells appear as +nan.0), and dmatrix-show prints a human-readable rendering.
3.6 Labels, weights, and feature metadata
Metadata can be set after construction. Setters accept lists, vectors, or typed vectors and coerce internally; getters read the fields back as Racket data:
    (define dm
      (make-dmatrix (f32vector 1.0 2.0 3.0 4.0) #:nrow 2 #:ncol 2))

    (dmatrix-set-label! dm '(0.25 0.75))
    (dmatrix-set-feature-names! dm '("height" "weight"))
    (dmatrix-set-feature-types! dm '("q" "q"))

    (dmatrix-feature-names dm) ; => '("height" "weight")
    (dmatrix-label dm)         ; => '(0.25 0.75)
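As a rough picture of what "coerce internally" could mean for a float field such as the label, here is a plain-Racket sketch that normalizes a list, a vector, or an f32vector to an f32vector. The helper ->f32vector is hypothetical; the bindings' actual coercion code may differ, and fields such as group sizes presumably use an integer vector type instead.

```racket
#lang racket/base
(require ffi/vector)

;; ->f32vector is a hypothetical helper, not the bindings' own coercion code.
(define (->f32vector xs)
  (cond
    ;; Already in the native layout: pass through without copying.
    [(f32vector? xs) xs]
    ;; Lists of reals: convert each element to a flonum, then pack.
    [(list? xs) (list->f32vector (map exact->inexact xs))]
    ;; Ordinary vectors: go through the list case.
    [(vector? xs) (->f32vector (vector->list xs))]
    [else
     (raise-argument-error '->f32vector
                           "(or/c list? vector? f32vector?)"
                           xs)]))

(define from-list   (->f32vector '(0.25 0.75)))
(define from-vector (->f32vector #(1 2)))
```

Accepting all three shapes at the boundary keeps call sites short while the native side always sees one contiguous buffer of 32-bit floats.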
Other well-known fields have dedicated setters: dmatrix-set-weight!, dmatrix-set-base-margin!, dmatrix-set-group! (for ranking), and dmatrix-set-label-lower-bound! / dmatrix-set-label-upper-bound! (for AFT survival objectives). Each identifier above links to its full contract in the xgboost API reference.