2.2 Training a regressor

With a DMatrix in hand, training is one call. This example fits a gradient-boosted regression model end to end: build a small synthetic dataset, train a booster, predict on the same rows, and compare the predictions to the labels.

The labels follow 2·x₀ + x₁ − x₂ exactly, with no added noise: a simple relationship that a tree booster can fit closely on eight training rows.

<r01-require> ::=

(require ffi/vector
         xgboost)

<r01-provide> ::=

(provide run-example)

The data. Eight rows of three features, stored row-major in one flat f32vector (one row per line below), with the regression targets in labels:

<r01-data> ::=

(define features
  (f32vector 1.0 2.0 0.5
             2.0 1.0 1.5
             3.0 0.5 0.0
             0.5 3.0 2.0
             4.0 2.0 1.0
             1.5 1.5 0.5
             2.5 3.5 1.5
             0.0 1.0 0.0))
(define labels (f32vector 3.5 3.5 6.5 2.0 9.0 4.0 7.0 1.0))
(define dtrain
  (make-dmatrix features #:nrow 8 #:ncol 3 #:labels labels))
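The label relationship is easy to confirm directly. The sketch below is plain Racket and does not use the xgboost bindings: it copies the data above, recomputes 2·x₀ + x₁ − x₂ for every row of the row-major vector, and collects the indices of rows whose label differs. For this data that list is empty. The names ncol, implied-target, and mismatches are illustrative only.

```racket
#lang racket
;; Sanity check (illustrative, not part of the example's module):
;; recompute 2*x0 + x1 - x2 for each row and compare with the labels.
(require ffi/vector)

(define features
  (f32vector 1.0 2.0 0.5
             2.0 1.0 1.5
             3.0 0.5 0.0
             0.5 3.0 2.0
             4.0 2.0 1.0
             1.5 1.5 0.5
             2.5 3.5 1.5
             0.0 1.0 0.0))
(define labels (f32vector 3.5 3.5 6.5 2.0 9.0 4.0 7.0 1.0))
(define ncol 3)

;; Target implied by row i of the row-major feature vector.
(define (implied-target i)
  (define (x j) (f32vector-ref features (+ (* i ncol) j)))
  (+ (* 2 (x 0)) (x 1) (- (x 2))))

;; Indices of rows whose label differs from the implied target.
(define mismatches
  (for/list ([i (in-range (f32vector-length labels))]
             #:unless (= (implied-target i) (f32vector-ref labels i)))
    i))

(printf "rows that deviate from 2*x0 + x1 - x2: ~a\n" mismatches)
```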

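Training. train runs the boosting loop. The objective "reg:squarederror" is ordinary least-squares regression; #:eta is the learning rate (the factor that shrinks each new tree's contribution), #:max-depth caps the depth of each tree, and #:rounds sets the number of boosting rounds. We keep #:verbosity at 0 so the run is quiet: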

<r01-train> ::=

(define booster
  (train dtrain
         #:objective "reg:squarederror"
         #:max-depth 3
         #:eta 0.1
         #:verbosity 0
         #:rounds 50))
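For intuition about what #:eta and #:rounds control, here is a deliberately simplified booster in plain Racket on the same data. Each round fits a depth-1 regression tree (a "stump") to the current residuals, which are the negative gradients of the squared-error loss, and adds eta times the stump's output to the running prediction. This is not how XGBoost builds trees: XGBoost uses gradient and Hessian statistics, L2 regularization on leaf weights, and trees up to #:max-depth deep. The names fit-stump, boost, mean, and mse are hypothetical, and the block does not call the xgboost bindings.

```racket
#lang racket
;; Simplified gradient boosting for squared error with depth-1 trees.
;; A sketch for intuition only; NOT XGBoost's tree-building algorithm.

;; Same eight rows as <r01-data>, as plain Racket vectors.
(define xs
  (vector #(1.0 2.0 0.5) #(2.0 1.0 1.5) #(3.0 0.5 0.0) #(0.5 3.0 2.0)
          #(4.0 2.0 1.0) #(1.5 1.5 0.5) #(2.5 3.5 1.5) #(0.0 1.0 0.0)))
(define ys (vector 3.5 3.5 6.5 2.0 9.0 4.0 7.0 1.0))

(define (mean lst) (/ (apply + lst) (length lst)))

;; Mean squared error between two equal-length vectors.
(define (mse preds targets)
  (/ (for/sum ([p (in-vector preds)] [y (in-vector targets)])
       (sqr (- p y)))
     (vector-length targets)))

;; Try every feature j and every observed value t as the split
;; "x_j <= t"; each side predicts its mean residual. Keep the split
;; with the smallest sum of squared errors, returned as a procedure
;; from a feature row to a number.
(define (fit-stump xs rs)
  (define n (vector-length xs))
  (define best-sse +inf.0)
  (define best (let ([m (mean (vector->list rs))]) (λ (row) m)))
  (for* ([j (in-range (vector-length (vector-ref xs 0)))]
         [t (in-vector (vector-map (λ (row) (vector-ref row j)) xs))])
    (define-values (left right)
      (partition (λ (i) (<= (vector-ref (vector-ref xs i) j) t))
                 (range n)))
    (unless (or (null? left) (null? right))
      (let* ([lm (mean (map (λ (i) (vector-ref rs i)) left))]
             [rm (mean (map (λ (i) (vector-ref rs i)) right))]
             [stump (λ (row) (if (<= (vector-ref row j) t) lm rm))]
             [sse (for/sum ([row (in-vector xs)] [r (in-vector rs)])
                    (sqr (- r (stump row))))])
        (when (< sse best-sse)
          (set! best-sse sse)
          (set! best stump)))))
  best)

;; Start every prediction at the mean target, then run the rounds,
;; each adding eta times a stump fitted to the current residuals.
(define (boost xs ys #:eta eta #:rounds rounds)
  (define preds (make-vector (vector-length ys) (mean (vector->list ys))))
  (for ([_ (in-range rounds)])
    (define rs (for/vector ([y (in-vector ys)] [p (in-vector preds)])
                 (- y p)))
    (define stump (fit-stump xs rs))
    (for ([i (in-range (vector-length xs))])
      (vector-set! preds i (+ (vector-ref preds i)
                              (* eta (stump (vector-ref xs i)))))))
  preds)

(define preds-5 (boost xs ys #:eta 0.1 #:rounds 5))
(define preds-50 (boost xs ys #:eta 0.1 #:rounds 50))
(printf "training MSE after 5 rounds:  ~a\n" (mse preds-5 ys))
(printf "training MSE after 50 rounds: ~a\n" (mse preds-50 ys))
```

Because each stump is a least-squares fit to the residuals and eta is small, every round can only lower the training error in this sketch; a smaller eta makes each correction smaller, so more rounds are needed to reach the same error. That is the same trade-off the #:eta and #:rounds arguments express in the train call.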

Prediction. predict with #:as 'f32vector returns the per-row predictions as an f32vector:

<r01-predict> ::=

(define preds (predict booster dtrain #:as 'f32vector))

run-example returns three values: the booster, the training matrix, and the predictions. The companion harness "test/01-train-regression.rkt" prints a table of predictions against labels and reports the training MSE; its test submodule, run with raco test, asserts that the model fits the training data. After 50 rounds the predictions track the labels closely:

; i label pred
; 0 3.5000 3.5012
; ...
; training MSE ≈ 0.000…
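The harness itself is not reproduced in this section. As a minimal sketch of the two quantities it reports, assuming preds and labels are equal-length f32vectors like the ones run-example produces, the hypothetical helpers training-mse and print-fit-report below compute the mean squared error and print a table in the format shown above. They use only ffi/vector and ~r from racket/format; nothing here is taken from the harness.

```racket
#lang racket
;; Hypothetical helpers in the spirit of the companion harness.
(require ffi/vector) ; f32vector, f32vector-ref, f32vector-length

;; Mean squared error between two equal-length f32vectors.
(define (training-mse preds labels)
  (define n (f32vector-length labels))
  (unless (= n (f32vector-length preds))
    (raise-argument-error 'training-mse
                          "f32vector of the same length as labels" preds))
  (/ (for/sum ([i (in-range n)])
       (sqr (- (f32vector-ref preds i) (f32vector-ref labels i))))
     n))

;; One line per row: index, label, prediction, to 4 decimal places,
;; followed by the overall training MSE.
(define (print-fit-report preds labels)
  (printf "; i label pred\n")
  (for ([i (in-range (f32vector-length labels))])
    (printf "; ~a ~a ~a\n" i
            (~r (f32vector-ref labels i) #:precision '(= 4))
            (~r (f32vector-ref preds i) #:precision '(= 4))))
  (printf "; training MSE ≈ ~a\n"
          (~r (training-mse preds labels) #:precision '(= 6))))

;; Toy inputs for illustration only (not output from the real model).
(print-fit-report (f32vector 1.0 2.5) (f32vector 1.0 2.0))
```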

<r01-run> ::=

(define (run-example)
  <r01-data>
  <r01-train>
  <r01-predict>
  (values booster dtrain preds))

<*> ::=

<r01-require>
<r01-provide>
<r01-run>
