5 Training


With a DMatrix and parameters in hand, train fits a booster. The #:rounds keyword sets the number of boosting iterations; the result is an opaque booster value:

(require xgboost)

(define dtrain
  (make-dmatrix features #:nrow 8 #:ncol 3 #:labels labels))

(define booster
  (train dtrain
         #:objective "reg:squarederror"
         #:max-depth 3
         #:eta 0.1
         #:verbosity 0
         #:rounds 50))

The returned booster keeps the training DMatrix (and any #:evals DMatrices) reachable, so their native buffers stay alive for as long as the booster does.

5.1 Saving and loading models

save-model writes the trained tree ensemble to a file; XGBoost picks the format from the extension (use ".json" or ".ubj"). load-model reads it back into a fresh booster:

(save-model booster "model.json")
(define reloaded (load-model "model.json"))

To keep the model in memory instead of on disk, serialize it to bytes. save-model-to-bytes defaults to compact UBJSON; pass #:format "json" for the textual form. load-model-from-bytes reads either format back into a booster:

(define blob (save-model-to-bytes booster)) ; UBJSON
(define json (save-model-to-bytes booster #:format "json"))
(define restored (load-model-from-bytes blob))

A model loaded by any of these routes produces the same predictions as the original booster.

save-model persists only the trained trees, not the rest of the training state. To checkpoint mid-training and later resume per-iteration updates in lockstep with the original run, use the full-state snapshots described in Section 9, Global Configuration, Snapshots, and GPU.

5.2 Early stopping

XGBoost’s Python package offers a built-in early-stopping callback. The Racket API has no callback mechanism; instead you drive the boosting loop yourself and decide when to stop. Build an untrained booster with #:rounds 0 (which still binds dtrain and the #:evals DMatrices into its cache), then step it one round at a time with booster-update-one-iter!. After each round, eval-one-iter returns XGBoost’s metric line, which parse-eval-line turns into a hash:

(define booster
  (train dtrain
         #:evals (list (cons "eval" deval))
         #:objective "reg:squarederror"
         #:max-depth 3
         #:eta 0.1
         #:verbosity 0
         #:rounds 0))

(define eval-set (list (cons "train" dtrain) (cons "eval" deval)))

(for ([iter (in-range 30)])
  (booster-update-one-iter! booster iter dtrain)
  (define metrics (parse-eval-line (eval-one-iter booster iter eval-set)))
  (printf "round ~a  eval-rmse ~a\n"
          iter (hash-ref metrics "eval-rmse")))

Because you own the loop, early stopping is ordinary Racket control flow: track the best metric seen so far and leave the loop (for example with a #:break clause in for) once it has failed to improve for a chosen number of rounds. The booster after N calls to booster-update-one-iter! is exactly an N-round model, and booster-slice can later extract the best-iteration prefix if you trained past it.
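The patience rule described above can be sketched in plain Racket. This is a minimal illustration, not part of the xgboost package: run-with-patience and its step! argument are hypothetical names, and the metric is assumed to be one where lower is better (such as RMSE). The step procedure is passed in as a closure, so the sketch runs on its own with a replayed metric sequence standing in for real training.

```racket
#lang racket/base

;; Hypothetical helper; not part of the xgboost package.
;; step! : natural -> real  runs one boosting round and returns the
;; validation metric for that round (lower is better, e.g. RMSE).
;; Returns the best round index (#f if no round scored below +inf.0)
;; and the score of that round.
(define (run-with-patience step! #:max-rounds max-rounds #:patience patience)
  (let loop ([iter 0] [best-iter #f] [best-score +inf.0])
    (cond
      ;; Stop at the round budget, or once `patience` rounds have
      ;; passed since the best round without a new best score.
      [(or (>= iter max-rounds)
           (and best-iter (>= (- iter best-iter 1) patience)))
       (values best-iter best-score)]
      [else
       (define score (step! iter))
       (if (< score best-score)
           (loop (add1 iter) iter score)
           (loop (add1 iter) best-iter best-score))])))

;; Stand-in for a real training step: replay a fixed metric sequence.
(define demo-scores (vector 1.0 0.8 0.7 0.75 0.72 0.71 0.6))
(define-values (demo-best-iter demo-best-score)
  (run-with-patience (lambda (iter) (vector-ref demo-scores iter))
                     #:max-rounds (vector-length demo-scores)
                     #:patience 2))
;; demo-best-iter is 2 and demo-best-score is 0.7; rounds 5 and 6 never run.

;; With the booster from this section, step! could be the following
;; (assuming parse-eval-line yields numeric values):
;;   (lambda (iter)
;;     (booster-update-one-iter! booster iter dtrain)
;;     (hash-ref (parse-eval-line (eval-one-iter booster iter eval-set))
;;               "eval-rmse"))
```

When the loop stops on patience, the booster has been updated for the patience rounds that followed the best one, which is the case where extracting the best-iteration prefix with booster-slice applies.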
