2.2 Training a regressor
With a DMatrix in hand, training is one call. This example fits a gradient-boosted regression model end to end: build a small synthetic dataset, train a booster, predict on the same rows, and compare the predictions to the labels.
The labels were chosen to follow 2·x₀ + x₁ − x₂. Reading the features row by row, all eight labels below match that formula exactly, so this dataset carries no added noise.
(require ffi/vector xgboost)
(provide run-example)
The data. Eight rows of three features, with the regression target in labels:
(define features
  (f32vector 1.0 2.0 0.5
             2.0 1.0 1.5
             3.0 0.5 0.0
             0.5 3.0 2.0
             4.0 2.0 1.0
             1.5 1.5 0.5
             2.5 3.5 1.5
             0.0 1.0 0.0))
(define labels
  (f32vector 3.5 3.5 6.5 2.0 9.0 4.0 7.0 1.0))
(define dtrain
  (make-dmatrix features #:nrow 8 #:ncol 3 #:labels labels))
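As a sanity check on the data, the sketch below recomputes 2·x₀ + x₁ − x₂ for each row and subtracts it from the label. It uses only ffi/vector, not the xgboost library, and it assumes the feature vector is laid out row by row (three consecutive values per row); the helper name formula-value is hypothetical.

```racket
#lang racket/base
(require ffi/vector)

;; Same numbers as the example above; row-major layout is an assumption.
(define features
  (f32vector 1.0 2.0 0.5
             2.0 1.0 1.5
             3.0 0.5 0.0
             0.5 3.0 2.0
             4.0 2.0 1.0
             1.5 1.5 0.5
             2.5 3.5 1.5
             0.0 1.0 0.0))
(define labels
  (f32vector 3.5 3.5 6.5 2.0 9.0 4.0 7.0 1.0))

;; Value of 2*x0 + x1 - x2 for row i of a 3-column row-major matrix.
(define (formula-value i)
  (define base (* 3 i))
  (+ (* 2.0 (f32vector-ref features base))
     (f32vector-ref features (+ base 1))
     (- (f32vector-ref features (+ base 2)))))

;; Residual (label minus formula value) for each of the 8 rows.
(define residuals
  (for/list ([i (in-range 8)])
    (- (f32vector-ref labels i) (formula-value i))))

(for ([i (in-range 8)]
      [r (in-list residuals)])
  (printf "row ~a: residual ~a\n" i r))
```

Every value involved is a multiple of 0.5, so the arithmetic is exact in floating point and each residual comes out as zero under the row-major reading.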
Training. train runs the boosting loop. The objective "reg:squarederror" is regression with squared-error loss; #:eta is the learning rate, #:max-depth caps the depth of each tree, and #:rounds sets the number of boosting rounds. We keep #:verbosity at 0 so the run is quiet:
(define booster
  (train dtrain
         #:objective "reg:squarederror"
         #:max-depth 3
         #:eta 0.1
         #:verbosity 0
         #:rounds 50))
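To get a feel for how the learning rate and the round count interact, here is an idealized back-of-the-envelope sketch in plain Racket. It assumes every tree fits the current residual perfectly and that its output is then scaled by eta, so the residual shrinks by a factor of (1 − eta) per round. Real boosting with depth limits and regularization will not behave this cleanly, and residual-fraction is a hypothetical helper, not part of the xgboost library.

```racket
#lang racket/base

;; Fraction of the initial residual left after `rounds` rounds, in the
;; idealized case where each tree fits the current residual exactly and
;; its contribution is scaled by eta.
(define (residual-fraction eta rounds)
  (for/fold ([frac 1.0])
            ([i (in-range rounds)])
    (* frac (- 1.0 eta))))

;; The settings used in the example: eta 0.1 for 50 rounds.
(define remaining (residual-fraction 0.1 50))
(printf "~a\n" remaining)
```

Under these assumptions roughly half a percent of the initial residual remains after 50 rounds at eta 0.1, which is one way to see why a small learning rate needs a correspondingly larger number of rounds.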
Prediction. predict with #:as 'f32vector returns the per-row predictions as an f32vector, bound to preds in this example.
run-example returns the booster, the training matrix, and those predictions. The companion harness "test/01-train-regression.rkt" prints a predictions-vs-labels table and reports the training MSE; its test submodule asserts that the model fits (run it with raco test). After 50 rounds the predictions track the labels closely:
; i  label   pred
; 0  3.5000  3.5012
; ...
; training MSE ≈ 0.000…
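The training MSE reported by the harness is the mean of the squared differences between predictions and labels. The harness source is not shown here, so the following is only a minimal sketch of that computation over two f32vectors; the function name mse is hypothetical and the code uses ffi/vector alone.

```racket
#lang racket/base
(require ffi/vector)

;; Mean squared error between two non-empty f32vectors of equal length.
(define (mse preds labels)
  (define n (f32vector-length labels))
  (unless (= n (f32vector-length preds))
    (error 'mse "length mismatch"))
  (/ (for/sum ([p (in-f32vector preds)]
               [y (in-f32vector labels)])
       (define d (- p y))
       (* d d))
     n))
```

With illustrative inputs, (mse (f32vector 1.0 2.0) (f32vector 1.0 4.0)) evaluates to 2.0: the squared errors are 0 and 4, and their mean is 2. Applied to preds and labels from the example, the same function would give the training MSE.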
(define (run-example)
  <r01-data>
  <r01-train>
  <r01-predict>
  (values booster dtrain preds))