1 User Guide


This guide is a task-oriented tour of the high-level xgboost API. It follows the same arc as the Python package introduction — loading data, setting parameters, training, predicting, and inspecting a model — adapted to Racket. For the precise contract of every procedure, see the API Reference.

1.1 Installation

Install from the Racket package catalog:

raco pkg install xgboost

The package ships a prebuilt native library and selects the right one for your platform at install time; on Linux it prefers a CUDA-enabled build when one is available and falls back to the CPU build otherwise. No XGBoost installation of your own is required.

Everything in this guide uses the default high-level module:

(require xgboost)

1.2 Getting Started

This chapter trains a classifier on the classic iris dataset — load the data, split it, fit a booster, and predict. load-iris and train-test-split are demo helpers from xgboost/private/demo-utils; everything else is the high-level xgboost API.

(require xgboost xgboost/private/demo-utils)

; read data
(define-values (X y) (load-iris))
(define-values (X-train X-test y-train y-test)
  (train-test-split X y #:test-size 0.2))

; create model instance and fit
(define bst
  (train (make-dmatrix X-train #:labels y-train)
         #:num-class 3
         #:objective "multi:softmax"
         #:max-depth 2
         #:eta 1.0
         #:rounds 2))

; make predictions
(define preds (predict bst (make-dmatrix X-test)))

A complete, runnable, assertion-backed version is the Get Started example.

1.3 Data Interface

XGBoost trains and predicts over a DMatrix: a dense or sparse feature matrix together with optional per-row metadata such as labels and weights. In the high-level API a DMatrix is an opaque Racket value whose native handle is reclaimed by the garbage collector; you never free it by hand.

1.3.1 Building a DMatrix from Racket data

The most direct constructor is make-dmatrix. It accepts a list of row lists, a vector of row vectors, or a flat row-major sequence when you also pass #:nrow and #:ncol:

(require xgboost)

; nested rows — shape is inferred
(make-dmatrix '((1.0 2.0 0.5)
                (2.0 1.0 1.5)
                (3.0 0.5 0.0)))

; flat row-major data — shape supplied explicitly
(require ffi/vector)
(make-dmatrix (f32vector 1.0 2.0 3.0
                         4.0 5.0 6.0)
              #:nrow 2 #:ncol 3)

You can attach labels (and weights) at construction time:

(make-dmatrix '((1.0 2.0 0.5)
                (2.0 1.0 1.5)
                (3.0 0.5 0.0)
                (0.5 3.0 2.0))
              #:labels '(3.5 3.5 6.5 2.0))

1.3.2 Missing values

XGBoost represents a missing entry with a sentinel value. By default that sentinel is +nan.0; pass #:missing to choose a different marker. Here -1.0 marks the holes, so the materialized matrix shows +nan.0 where the data had -1.0 (or, for sparse inputs, where no entry was supplied):

(make-dmatrix (f32vector 1.0 -1.0 3.0
                         -1.0 5.0 6.0)
              #:nrow 2 #:ncol 3
              #:missing -1.0)

1.3.3 Sparse and columnar inputs

For data that is already in a native layout, build directly from CSR, CSC, or column-major storage. All three matrices below describe the same 2×3 data with two missing cells:

; CSR: row offsets, column indices, values, ncol
(make-dmatrix-from-csr (u64vector 0 2 4)
                       (u32vector 0 2 1 2)
                       (f32vector 1.0 3.0 5.0 6.0)
                       3
                       -1.0)

; CSC: column offsets, row indices, values, nrow
(make-dmatrix-from-csc (u64vector 0 1 2 4)
                       (u32vector 0 1 0 1)
                       (f32vector 1.0 5.0 3.0 6.0)
                       2
                       -1.0)

; Columnar: one f32vector per column
(make-dmatrix-from-columnar (list (f32vector 1.0 -1.0)
                                  (f32vector -1.0 5.0)
                                  (f32vector 3.0 6.0))
                            -1.0)
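To see why the CSR arrays describe a 2×3 matrix with two holes, it can help to expand them by hand. The following is a minimal sketch in plain Racket: csr->rows is a hypothetical helper written for illustration, not part of the xgboost package, and it takes ordinary vectors rather than typed vectors so that it runs on its own.

```racket
#lang racket/base

;; csr->rows: hypothetical helper (not part of xgboost).
;; Expands CSR arrays into nested rows, filling cells that have
;; no stored entry with +nan.0.
(define (csr->rows indptr indices values ncol)
  (for/list ([r (in-range (sub1 (vector-length indptr)))])
    (define row (make-vector ncol +nan.0))
    ;; entries of row r live at positions indptr[r] .. indptr[r+1]-1
    (for ([k (in-range (vector-ref indptr r)
                       (vector-ref indptr (add1 r)))])
      (vector-set! row (vector-ref indices k) (vector-ref values k)))
    (vector->list row)))

(define expanded
  (csr->rows #(0 2 4) #(0 2 1 2) #(1.0 3.0 5.0 6.0) 3))

expanded ; => '((1.0 +nan.0 3.0) (+nan.0 5.0 6.0))
```

Row 0 stores columns 0 and 2, row 1 stores columns 1 and 2, so the two unstored cells are the missing ones.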

1.3.4 Loading from a file

make-dmatrix-from-uri reads an XGBoost-supported file or URI. Pass #:format to disambiguate when the extension is not enough:

(make-dmatrix-from-uri "train.libsvm" #:format "libsvm")

A DMatrix can also be written to XGBoost’s binary buffer format with dmatrix-save-binary! and reloaded later with make-dmatrix-from-uri.

1.3.5 Inspecting a DMatrix

dmatrix-rows and dmatrix-cols report the shape, dmatrix->list materializes the contents as nested lists (missing cells appear as +nan.0), and dmatrix-show prints a human-readable rendering.

1.3.6 Labels, weights, and feature metadata

Metadata can be set after construction. Setters accept lists, vectors, or typed vectors and coerce internally; getters read the fields back as Racket data:

(define dm
  (make-dmatrix (f32vector 1.0 2.0
                           3.0 4.0)
                #:nrow 2 #:ncol 2))

(dmatrix-set-label! dm '(0.25 0.75))
(dmatrix-set-feature-names! dm '("height" "weight"))
(dmatrix-set-feature-types! dm '("q" "q"))

(dmatrix-feature-names dm)  ; => '("height" "weight")
(dmatrix-label dm)          ; => '(0.25 0.75)

Other well-known fields have dedicated setters: dmatrix-set-weight!, dmatrix-set-base-margin!, dmatrix-set-group! (for ranking), and dmatrix-set-label-lower-bound! / dmatrix-set-label-upper-bound! (for AFT survival objectives). Each identifier above links to its full contract in the API Reference.

1.4 Setting Parameters

XGBoost is configured by a set of string-keyed parameters — the objective, tree depth, learning rate, and so on. The Racket API lets you pass them in two complementary ways.

1.4.1 The #:params bundle

#:params takes a hash or association list of parameters. Keys may be strings, symbols, or keywords; symbol and keyword keys are normalized to XGBoost’s underscore style (hyphens become underscores), and values are converted to strings before reaching the native layer:

(train dtrain
       #:params '((objective  . "reg:squarederror")
                  (max-depth  . 3)
                  (eta        . 0.1)
                  (tree-method . "hist")))

Here 'max-depth reaches XGBoost as "max_depth", and the numeric 3 becomes "3".
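The normalization rule can be pictured with a small stand-alone sketch. normalize-key and normalize-params are hypothetical names used only for illustration; the package's own conversion may differ in detail (for example in how it formats numbers), and this version handles association lists only.

```racket
#lang racket/base
(require racket/string)

;; normalize-key: hypothetical helper (not part of xgboost).
;; Strings pass through; symbols and keywords become strings with
;; hyphens replaced by underscores.
(define (normalize-key k)
  (cond
    [(string? k) k]
    [(symbol? k) (string-replace (symbol->string k) "-" "_")]
    [(keyword? k) (string-replace (keyword->string k) "-" "_")]
    [else (raise-argument-error 'normalize-key
                                "(or/c string? symbol? keyword?)" k)]))

;; normalize-params: turn an association list into string/string pairs.
(define (normalize-params params)
  (for/list ([p (in-list params)])
    (cons (normalize-key (car p)) (format "~a" (cdr p)))))

(define normalized
  (normalize-params '((objective . "reg:squarederror")
                      (max-depth . 3)
                      (tree-method . "hist"))))

normalized
; => '(("objective" . "reg:squarederror") ("max_depth" . "3") ("tree_method" . "hist"))
```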

1.4.2 Keyword conveniences

The most common parameters also have dedicated keywords on train: #:objective, #:eta, #:max-depth, #:num-class, #:eval-metric, and #:verbosity. They are applied after #:params, so a keyword overrides any same-named entry in the bundle:

(train dtrain
       #:objective "binary:logistic"
       #:max-depth 3
       #:eta 0.3
       #:verbosity 0
       #:rounds 30)

1.4.3 Evaluation sets

To watch performance on held-out data during training, pass #:evals — a list of (cons name dmatrix) pairs. XGBoost reports each named metric per round, and the names you choose appear in the metric lines:

(train dtrain
       #:evals (list (cons "train" dtrain)
                     (cons "eval"  deval))
       #:objective "reg:squarederror"
       #:eval-metric "rmse"
       #:rounds 30)

Driving the evaluation loop yourself — to log metrics or stop early — is covered in Training.

1.4.4 Updating a booster’s parameters

A parameter can also be set on an existing booster with booster-set-param!, which uses the same key/value coercion as #:params:

(booster-set-param! booster "tree_method" "hist")

1.5 Training

With a DMatrix and parameters in hand, train fits a booster. The #:rounds keyword sets the number of boosting iterations; the result is an opaque booster value:

(require xgboost)

(define dtrain
  (make-dmatrix features #:nrow 8 #:ncol 3 #:labels labels))

(define booster
  (train dtrain
         #:objective "reg:squarederror"
         #:max-depth 3
         #:eta 0.1
         #:verbosity 0
         #:rounds 50))

The returned booster keeps the training DMatrix (and any #:evals DMatrices) reachable, so their native buffers stay alive for as long as the booster does.

1.5.1 Saving and loading models

save-model writes the trained tree ensemble to a file; XGBoost picks the format from the extension (use ".json" or ".ubj"). load-model reads it back into a fresh booster:

(save-model booster "model.json")
(define reloaded (load-model "model.json"))

To keep the model in memory instead of on disk, serialize to bytes. save-model-to-bytes defaults to compact UBJSON; pass #:format "json" for the textual form. load-model-from-bytes reverses either:

(define blob (save-model-to-bytes booster)) ; UBJSON
(define json (save-model-to-bytes booster #:format "json"))
(define restored (load-model-from-bytes blob))

A model loaded by any of these routes produces the same predictions as the original booster.

save-model persists only the trained trees. To checkpoint mid-training and resume per-iteration updates in lockstep, use the full-state snapshots described in Global Configuration, Snapshots, and GPU.

1.5.2 Early stopping

XGBoost’s Python package offers a built-in early-stopping callback. The Racket API has no callback mechanism; instead you drive the boosting loop yourself and decide when to stop. Build an untrained booster with #:rounds 0 (which still binds dtrain and the #:evals DMatrices into its cache), then step it one round at a time with booster-update-one-iter!. After each round, eval-one-iter returns XGBoost’s metric line, which parse-eval-line turns into a hash:

(define booster
  (train dtrain
         #:evals (list (cons "eval" deval))
         #:objective "reg:squarederror"
         #:max-depth 3
         #:eta 0.1
         #:verbosity 0
         #:rounds 0))

(define eval-set (list (cons "train" dtrain) (cons "eval" deval)))

(for ([iter (in-range 30)])
  (booster-update-one-iter! booster iter dtrain)
  (define metrics (parse-eval-line (eval-one-iter booster iter eval-set)))
  (printf "round ~a  eval-rmse ~a\n"
          iter (hash-ref metrics "eval-rmse")))

Because you own the loop, early stopping is just ordinary Racket control flow: track the best metric so far and break out (or stop updating) once it fails to improve for a chosen number of rounds. The booster after N calls to booster-update-one-iter! is exactly an N-round model, and booster-slice can later extract the best-iteration prefix if you trained past it.
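The patience logic described above might look like the following sketch. run-with-patience is a hypothetical helper, not part of the package. It takes a step! procedure that trains one round and returns the held-out metric (lower is better); in real use that procedure would wrap booster-update-one-iter! and eval-one-iter as in the loop above. Here it is driven by made-up metric values so that the block runs without a booster.

```racket
#lang racket/base

;; run-with-patience: hypothetical helper (not part of xgboost).
;; Calls (step! iter) once per round and stops when the metric has
;; not improved for `patience` consecutive rounds, or after
;; `max-rounds` rounds. Returns the best round and its metric.
(define (run-with-patience step! max-rounds patience)
  (let loop ([iter 0] [best +inf.0] [best-iter -1])
    ;; completed rounds since the last improvement
    (define stale (- iter 1 best-iter))
    (cond
      [(or (>= iter max-rounds) (>= stale patience))
       (values best-iter best)]
      [else
       (define metric (step! iter))
       (if (< metric best)
           (loop (add1 iter) metric iter)
           (loop (add1 iter) best best-iter))])))

;; Illustrative, made-up metric values standing in for eval-rmse.
(define metrics (vector 1.0 0.8 0.7 0.72 0.71 0.9 0.6))

(define-values (best-iter best-metric)
  (run-with-patience (lambda (i) (vector-ref metrics i))
                     (vector-length metrics)
                     2))

(printf "best round ~a, metric ~a\n" best-iter best-metric)
; prints: best round 2, metric 0.7
```

With a patience of 2 the loop stops after rounds 3 and 4 fail to beat round 2, so the later improvement at round 6 is never seen. That is the usual trade-off when choosing the patience value.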

1.6 Prediction

predict runs a trained booster over a DMatrix. By default it returns a list of predictions; pass #:as 'f32vector to get the raw f32vector? copied from XGBoost instead:

(require xgboost)

(define preds (predict booster dtrain)) ; list of reals
(define vec (predict booster dtrain #:as 'f32vector)) ; f32vector

For a "binary:logistic" model the values are class-1 probabilities; for "multi:softprob" the output has nrow×num-class entries (row-major), while "multi:softmax" yields one predicted class index per row.
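Because "multi:softprob" output is flat and row-major, it usually needs regrouping before use. The sketch below is plain Racket over a made-up probability list; softprob->rows and row-argmax are hypothetical helper names, not part of the package.

```racket
#lang racket/base
(require racket/list)

;; softprob->rows: hypothetical helper (not part of xgboost).
;; Regroups a flat row-major list into one list of num-class
;; probabilities per row.
(define (softprob->rows flat num-class)
  (let loop ([xs flat] [acc '()])
    (if (null? xs)
        (reverse acc)
        (let-values ([(row rest) (split-at xs num-class)])
          (loop rest (cons row acc))))))

;; row-argmax: index of the largest probability in one row.
(define (row-argmax row)
  (for/fold ([best-i 0] [best-p (car row)] #:result best-i)
            ([p (in-list (cdr row))] [i (in-naturals 1)])
    (if (> p best-p) (values i p) (values best-i best-p))))

;; Illustrative values: 2 rows, 3 classes.
(define flat '(0.1 0.7 0.2
               0.8 0.1 0.1))

(define rows (softprob->rows flat 3))
(define classes (map row-argmax rows))

rows    ; => '((0.1 0.7 0.2) (0.8 0.1 0.1))
classes ; => '(1 0)
```

Taking the argmax of each row gives the same kind of per-row class index that "multi:softmax" returns directly.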

1.6.1 Choosing what to predict

#:output selects the kind of output. 'value (the default) is the final prediction; 'margin is the raw score before the logistic or softmax transform; 'leaf gives per-tree leaf indices; and 'contribs / 'approx-contribs / 'interactions / 'approx-interactions return SHAP-style feature attributions:

(predict booster dtrain #:output 'margin)
(predict booster dtrain #:output 'contribs)

#:iteration-end limits prediction to the first N boosting rounds (0, the default, means all of them) — useful for evaluating an early-stopping prefix without re-slicing the booster.

1.6.2 In-place prediction

For serving, you often have a single batch of raw features and no reason to build a DMatrix first. The predict-from-dense, predict-from-csr, and predict-from-columnar variants predict directly from Racket data and accept the same #:output, #:iteration-end, and #:as keywords as predict:

; dense rows, shaped like make-dmatrix's data argument
(predict-from-dense booster features
                    #:nrow 8 #:ncol 3
                    #:missing -1.0
                    #:as 'f32vector)

; CSR arrays (indptr, indices, values) followed by ncol
(predict-from-csr booster indptr indices values 3
                  #:missing -1.0)

; one f32vector per column
(predict-from-columnar booster
                       (list (f32vector 1.0 2.0 3.0)
                             (f32vector 2.0 1.0 0.5)
                             (f32vector 0.5 1.5 0.0))
                       #:missing -1.0)

In-place prediction returns the same numbers as building a DMatrix and calling predict; it just skips the intermediate allocation.

1.7 Learning to Rank

This chapter is the Racket counterpart of XGBoost’s Learning to Rank tutorial. Ranking (LambdaMART) differs from regression and classification in one structural way: rows are partitioned into query groups, and the model learns to order the documents within each group rather than to predict an absolute target.

1.7.1 Declaring query groups

A ranking DMatrix carries the usual features and per-row relevance #:labels, plus a group layout: how many consecutive rows belong to each query. Set it with dmatrix-set-group!, passing one row count per query (the rows must already be laid out query-by-query):

(require xgboost ffi/vector)

; 12 rows = three queries of four documents each
(define dranking
  (make-dmatrix features #:nrow 12 #:ncol 3
                #:labels '(0 1 2 3 0 1 2 3 0 1 2 3)))

(dmatrix-set-group! dranking '(4 4 4))

XGBoost stores the cumulative offsets internally; read them back with dmatrix-group-ptr, which returns '(0 4 8 12) for the example above (one more entry than the number of queries).
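The relationship between per-query row counts and the stored offsets is a running sum. As a minimal plain-Racket sketch (group-sizes->offsets is a hypothetical name used for illustration, not a package procedure):

```racket
#lang racket/base

;; group-sizes->offsets: hypothetical helper (not part of xgboost).
;; Turns per-query row counts into cumulative offsets that start at 0.
(define (group-sizes->offsets sizes)
  (for/fold ([acc '(0)] #:result (reverse acc))
            ([n (in-list sizes)])
    (cons (+ (car acc) n) acc)))

(define offsets (group-sizes->offsets '(4 4 4)))

offsets ; => '(0 4 8 12)
```

Query i then covers the rows from offset i up to, but not including, offset i+1.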

Relevance labels are typically small non-negative integers — graded relevance, where a larger label means a more relevant document.

1.7.2 Training a ranker

Train with one of the "rank:*" objectives. "rank:ndcg" (LambdaMART optimizing nDCG) is the usual default; "rank:pairwise" optimizes the pairwise loss directly. Pass an "eval_metric" such as "ndcg@N", which scores only the top N documents of each query, to watch ranking quality during training:

(define ranker
  (train dranking
         #:params '((objective   . "rank:ndcg")
                    (eval_metric . "ndcg@8"))
         #:max-depth 4 #:eta 0.1 #:verbosity 0 #:rounds 50))

"rank:map" optimizes mean average precision and therefore requires binary (0/1) relevance labels; graded labels raise an error. "rank:ndcg" and "rank:pairwise" accept graded relevance.

1.7.3 Predicting and scoring

predict returns one score per row. The scores are only meaningful relative to other documents in the same query: sort each query’s documents by descending score to get the ranking.

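(define scores (predict ranker dranking))

; scores of the first query (rows 0-3)
(define query-0 (for/list ([i (in-range 4)]) (list-ref scores i)))

; document indices of that query, best first
(define ranking-0
  (sort (build-list 4 values) >
        #:key (lambda (i) (list-ref query-0 i))))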

To measure quality, compute a ranking metric such as nDCG per query and average across queries. The Learning to rank example is a fully worked version — synthesizing graded-relevance queries, training "rank:ndcg", and asserting a high held-out nDCG.
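For reference, per-query nDCG can be computed in a few lines of plain Racket. The sketch below assumes the common exponential-gain definition, with gain 2^rel - 1 and discount log2(position + 1); the exact variant a given evaluation uses may differ. dcg and ndcg are hypothetical helper names, and the scores are made up for illustration.

```racket
#lang racket/base

;; dcg: discounted cumulative gain of relevance labels listed in
;; ranked order (exponential gain, log2 position discount).
(define (dcg rels)
  (for/sum ([rel (in-list rels)] [i (in-naturals)])
    (/ (- (expt 2.0 rel) 1.0)
       (/ (log (+ i 2)) (log 2)))))

;; ndcg: hypothetical helper (not part of xgboost).
;; Ranks one query's documents by descending score and normalizes
;; the resulting DCG by the DCG of the ideal ordering.
(define (ndcg scores labels)
  (define order
    (sort (build-list (length scores) values) >
          #:key (lambda (i) (list-ref scores i))))
  (define ranked (map (lambda (i) (list-ref labels i)) order))
  (define ideal (dcg (sort labels >)))
  (if (zero? ideal) 0.0 (/ (dcg ranked) ideal)))

;; Illustrative scores for one query with labels 0..3.
(define perfect (ndcg '(0.1 0.4 0.9 2.0) '(0 1 2 3)))
(define flawed  (ndcg '(2.0 0.1 0.4 0.9) '(0 1 2 3)))

perfect ; => 1.0, the scores order the documents exactly by relevance
flawed  ; below 1.0, the least relevant document is ranked first
```

Averaging ndcg over all queries, using the group offsets to slice the score list, gives the dataset-level figure.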

1.8 Custom Objectives

Beyond the built-in objectives named by strings (such as "reg:squarederror" or "binary:logistic"), you can supply your own loss by writing a Racket function that returns its gradient and Hessian.

1.8.1 The #:objective-fn keyword

Pass #:objective-fn to train. Each round, train computes the current margin predictions and calls your function with those predictions and the training DMatrix; it must return (values grad hess) as sequences (lists, vectors, or f32vector? values), one entry per row:

(require xgboost ffi/vector)

; Squared error: grad = pred - label, hess = 1.
(define (squared-error preds dtrain)
  (define ys (dmatrix-label dtrain))
  (define n  (f32vector-length preds))
  (define grad (make-f32vector n))
  (define hess (make-f32vector n 1.0))
  (for ([i (in-range n)]
        [y (in-list ys)])
    (f32vector-set! grad i (- (f32vector-ref preds i) y)))
  (values grad hess))

(define booster
  (train dtrain
         #:objective-fn squared-error
         #:max-depth 3
         #:eta 0.2
         #:verbosity 0
         #:rounds 20))

The predictions passed to your function are raw margins, so the gradient and Hessian should be expressed with respect to the margin — exactly as a built-in objective would compute them internally.
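Binary logistic loss shows what "with respect to the margin" means in practice: the probability is p = sigmoid(margin), the gradient is p - y, and the Hessian is p(1 - p). The sketch below is a stand-alone illustration. logistic-grad-hess is a hypothetical name, and it takes the labels as a plain list rather than a DMatrix so that it runs without the package; to use it with #:objective-fn you would fetch the labels with dmatrix-label as squared-error does above.

```racket
#lang racket/base
(require ffi/vector)

(define (sigmoid x) (/ 1.0 (+ 1.0 (exp (- x)))))

;; logistic-grad-hess: hypothetical helper (not part of xgboost).
;; margins: f32vector of raw scores; labels: list of 0/1 values.
;; Returns the gradient and Hessian of the logistic loss with
;; respect to each margin.
(define (logistic-grad-hess margins labels)
  (define n (f32vector-length margins))
  (define grad (make-f32vector n))
  (define hess (make-f32vector n))
  (for ([i (in-range n)]
        [y (in-list labels)])
    (define p (sigmoid (f32vector-ref margins i)))
    (f32vector-set! grad i (- p y))
    (f32vector-set! hess i (* p (- 1.0 p))))
  (values grad hess))

;; A margin of 0 means p = 0.5.
(define-values (grad hess)
  (logistic-grad-hess (f32vector 0.0 0.0) '(1 0)))

(f32vector->list grad) ; => '(-0.5 0.5)
(f32vector->list hess) ; => '(0.25 0.25)
```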

1.8.2 Stepping a custom objective by hand

For full control over the loop, use booster-train-one-iter!, which runs a single round from gradient and Hessian vectors you supply directly:

(for ([iter (in-range 20)])
  (define preds (predict booster iter-dtrain #:output 'margin #:as 'f32vector))
  (define-values (grad hess) (squared-error preds iter-dtrain))
  (booster-train-one-iter! booster iter iter-dtrain grad hess))

This is the building block #:objective-fn uses under the hood; reach for it when you need to interleave custom bookkeeping between rounds. The xgboost examples include several worked custom objectives (robust, quantile, Poisson, and AFT survival regression).

1.9 Parameter Recipes

Several XGBoost features that have their own tutorials upstream are, from the API’s point of view, just parameter settings on an otherwise ordinary train call. This chapter collects them as short recipes. Each passes its settings through #:params (see Setting Parameters); the Parameter recipes example walks all four as a runnable, assertion-backed program.

1.9.1 DART booster

DART drops a random subset of existing trees on each boosting round to regularize the ensemble. Select it by setting "booster" to "dart", then tune the drop rates:

(train dtrain
       #:params '((booster   . "dart")
                  (objective . "reg:squarederror")
                  (rate_drop . 0.1)
                  (skip_drop . 0.5))
       #:max-depth 3 #:eta 0.1 #:rounds 30)

Prediction works exactly as for the default "gbtree" booster.

1.9.2 Monotonic constraints

Monotonic constraints force the model’s response to be non-decreasing (1) or non-increasing (-1) in chosen features, with 0 for unconstrained ones. The constraint vector has one entry per feature:

; non-decreasing in feature 0, unconstrained in features 1 and 2
(train dtrain
       #:params '((objective            . "reg:squarederror")
                  (monotone_constraints . "(1,0,0)"))
       #:max-depth 3 #:eta 0.1 #:rounds 30)

The constraint is a hard guarantee: holding the other features fixed and sweeping feature 0 upward, the prediction never decreases. The example’s test verifies exactly this property.
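When the constraint vector is computed rather than typed by hand, the "(1,0,0)" string can be assembled from a list of signs, and the guarantee can be checked on any sweep of predictions. Both helpers below are hypothetical names in plain Racket, not package procedures, and the sweep values are made up.

```racket
#lang racket/base
(require racket/string)

;; monotone-constraints-string: hypothetical helper (not part of xgboost).
;; Formats a list of -1/0/1 signs as the "(a,b,c)" string shown above.
(define (monotone-constraints-string signs)
  (for ([s (in-list signs)])
    (unless (memv s '(-1 0 1))
      (raise-argument-error 'monotone-constraints-string
                            "(listof (or/c -1 0 1))" signs)))
  (string-append "(" (string-join (map number->string signs) ",") ")"))

;; non-decreasing?: true when each value is at least the one before it.
(define (non-decreasing? xs)
  (or (null? xs) (apply <= xs)))

(define constraint (monotone-constraints-string '(1 0 0)))

constraint                            ; => "(1,0,0)"
(non-decreasing? '(0.2 0.2 0.5 0.9))  ; => #t
(non-decreasing? '(0.2 0.5 0.4))      ; => #f
```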

1.9.3 Feature interaction constraints

Interaction constraints restrict which features may appear together on a single root-to-leaf path. Pass groups as a JSON list of lists:

; features 0 and 1 may interact; feature 2 stays on its own
(train dtrain
       #:params '((objective                . "reg:squarederror")
                  (interaction_constraints . "[[0,1],[2]]"))
       #:max-depth 3 #:eta 0.1 #:rounds 30)

1.9.4 Random forests

A random forest is the degenerate case of boosting: a single round (#:rounds 1) that grows many parallel, subsampled trees. Set "num_parallel_tree" and the subsampling ratios, and use a full-size learning rate:

(train dtrain
       #:params '((objective         . "reg:squarederror")
                  (num_parallel_tree . 20)
                  (subsample         . 0.8)
                  (colsample_bynode  . 0.8))
       #:max-depth 4 #:eta 1.0 #:rounds 1)

You can also combine a forest of "num_parallel_tree" trees with several boosting rounds to get a boosted forest.

1.10 Inspecting a Model

The Python package’s Plotting section renders feature importance and tree diagrams with matplotlib and graphviz. The Racket bindings don’t ship a plotting layer; instead they expose the underlying data — model dumps and importance scores — that you can render with whatever tooling you prefer.

Future work: a native rendering built on the Racket plot library (feature-importance bar charts, and possibly tree diagrams) would be a natural companion to the booster-dump / booster-feature-score data shown here.

1.10.1 Dumping trees

booster-dump returns one string per tree. #:format selects "text" (the default), "json", or "dot". The "dot" form is graphviz source you can pipe straight to dot to draw the tree:

(require xgboost)

(booster-dump booster) ; list of text trees
(booster-dump booster #:format "json")
(define dots (booster-dump booster #:format "dot"))

Passing #:feature-names (and #:feature-types) substitutes readable names into the dump in place of f0, f1, … :

(booster-dump booster
              #:format "text"
              #:feature-names '("x0" "x1" "x2")
              #:feature-types '("q" "q" "q"))

1.10.2 Feature importance

booster-feature-score computes per-feature importance. Choose the mode with #:importance-type: "weight" (the default, split counts), "gain", "cover", "total_gain", or "total_cover". The result is a hash with 'features, 'scores (an f32vector?), and 'shape:

(define scores
  (booster-feature-score booster
                         #:importance-type "weight"
                         #:feature-names '("x0" "x1" "x2")))

(hash-ref scores 'features)   ; feature names in score order
(hash-ref scores 'scores)     ; f32vector of importances

To set readable feature names once so every dump and score uses them, call booster-set-feature-names! (and booster-set-feature-types!) on the booster after training.

1.10.3 Other inspection

booster-num-feature and booster-boosted-rounds report the model’s width and how many rounds it has been trained for. booster-config returns XGBoost’s full configuration as a JSON string (treat it as opaque; round-trip it with booster-set-config!), and booster-attr / booster-set-attr! read and write arbitrary string attributes you want to travel with the model.

1.11 Global Configuration, Snapshots, and GPU

1.11.1 Version and build information

xgboost-version returns the linked XGBoost version string, and xgboost-build-info returns the build configuration as JSON (compiler flags, CUDA support, and so on):

(require xgboost)

(xgboost-version) ; e.g. "2.1.0"
(xgboost-build-info) ; JSON string

1.11.2 Process-global configuration

XGBoost keeps some settings at process scope. Read and write them as JSON with xgboost-get-global-config and xgboost-set-global-config!. A common pattern is to save, change, then restore around a region of code:

(define saved (xgboost-get-global-config))
(dynamic-wind
  void
  (lambda ()
    (xgboost-set-global-config! "{\"verbosity\":0}")
    ; ... quiet work ...
    (void))
  (lambda ()
    (xgboost-set-global-config! saved)))

xgboost-register-log-callback! installs a process-global callback that receives XGBoost’s log messages as strings. Because it is process-global, treat it as shared mutable state.

1.11.3 Full-state snapshots

save-model persists only the trained trees. A snapshot additionally captures XGBoost’s internal training caches, so a restored booster can resume per-iteration updates in lockstep with the original. booster->bytes serializes that full state and bytes->booster reconstructs it:

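; train-rounds! is not defined in this guide. A minimal local definition,
; assuming its last two arguments are the first iteration index and the
; number of rounds to run:
(define (train-rounds! booster dtrain start count)
  (for ([iter (in-range start (+ start count))])
    (booster-update-one-iter! booster iter dtrain)))

(train-rounds! booster dtrain 0 5) ; train 5 rounds

(define snapshot (booster->bytes booster))
(define restored (bytes->booster snapshot))

; both continue identically from round 5
(train-rounds! booster dtrain 5 5)
(train-rounds! restored dtrain 5 5)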

A booster restored from a snapshot has an empty DMatrix cache, so pass the training data explicitly when you resume with booster-update-one-iter!.

1.11.4 GPU training

When the native library is built with CUDA, you can train on the GPU by setting "device" to "cuda" and "tree_method" to "hist". Check availability first with cuda-available? so that code degrades gracefully on CPU-only builds:

(when (cuda-available?)
  (train dtrain
         #:params '((device      . "cuda")
                    (tree-method . "hist"))
         #:objective "reg:squarederror"
         #:rounds 50))

Whether cuda-available? is #t depends on the native build: the package prefers a CUDA-enabled library on Linux when one is present and falls back to the CPU build otherwise.
