2.7 Get Started


This is the Racket counterpart of XGBoost’s Python quickstart. The upstream snippet is:

from xgboost import XGBClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data['data'], data['target'], test_size=.2)
bst = XGBClassifier(n_estimators=2, max_depth=2, learning_rate=1)
bst.fit(X_train, y_train)
preds = bst.predict(X_test)

The binding has no scikit-style XGBClassifier estimator, so instead of .fit/.predict we build a DMatrix and call train and predict, the same flow the upstream R, Julia, and Scala quickstarts use. load-iris and train-test-split come from xgboost/private/demo-utils (load-iris downloads the UCI dataset and falls back to a bundled copy when offline). Iris has three classes, so we use "multi:softmax" with #:num-class 3. With that objective, predict returns one class index per row, just as sklearn’s .predict returns one label per row.
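To make the "class indices" point concrete, here is a minimal pure-Racket sketch of what a softmax objective conceptually does for one row: turn per-class scores into probabilities and report the index of the largest. The names softmax and argmax-index and the scores are illustrative assumptions; none of this is part of the binding, and XGBoost does the equivalent internally.

```racket
#lang racket/base
(require racket/list)

;; Turn raw per-class scores into probabilities.
;; Subtracting the maximum first keeps exp from overflowing.
(define (softmax scores)
  (define m (apply max scores))
  (define exps (map (λ (s) (exp (- s m))) scores))
  (define total (apply + exps))
  (map (λ (e) (/ e total)) exps))

;; Index of the largest entry; ties go to the earliest index.
(define (argmax-index xs)
  (argmax (λ (i) (list-ref xs i)) (range (length xs))))

;; Made-up scores for one row and three classes.
(define probs (softmax '(0.5 2.0 -1.0)))
(argmax-index probs) ; class index 1 has the largest score
```

Only the index is reported for each row, which is why the accuracy helper later compares predictions with integer labels.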

<r27-require> ::=

(require xgboost
         xgboost/private/demo-utils)

<r27-provide> ::=

(provide run-example)

Accuracy. A small helper, since unlike sklearn we score by hand:

<r27-accuracy> ::=

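;; Fraction of predictions that match their labels, as an exact rational.
(define (accuracy preds labels)
  (/ (for/sum ([p (in-list preds)] [y (in-list labels)])
       ;; predictions may be floats, so round to an exact integer first
       (if (= (inexact->exact (round p)) y) 1 0))
     (length labels)))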
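A quick worked example shows what the helper returns. The predictions and labels below are made up for illustration, and the helper is repeated so the snippet runs on its own. Note that the result is an exact rational; convert it with exact->inexact when a decimal is wanted.

```racket
#lang racket/base

;; Same helper as in the chunk above, repeated for a standalone run.
(define (accuracy preds labels)
  (/ (for/sum ([p (in-list preds)] [y (in-list labels)])
       (if (= (inexact->exact (round p)) y) 1 0))
     (length labels)))

;; Made-up float predictions and integer labels: three of four match.
(define acc (accuracy '(0.0 1.0 2.0 1.0) '(0 1 2 2)))
acc                  ; exact rational 3/4
(exact->inexact acc) ; 0.75
```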

Load and split. load-iris returns the 150 × 4 feature rows and integer labels; train-test-split holds out 20% with a fixed seed for reproducibility:

<r27-data> ::=

(define-values (X y) (load-iris))
(define-values (X-train X-test y-train y-test)
  (train-test-split X y #:test-size 0.2 #:seed 42))
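For intuition about the seeded hold-out, here is a sketch of how such a split could be done on row indices in plain Racket. split-indices is a hypothetical helper written for this illustration; it is not the implementation of train-test-split in demo-utils, which may shuffle or round differently.

```racket
#lang racket/base
(require racket/list)

;; Hypothetical helper: shuffle 0..n-1 with a fixed seed, then cut.
;; Returns two values: the training indices and the test indices.
(define (split-indices n test-size seed)
  (define shuffled
    ;; A fresh generator keeps the seeding local to this call.
    (parameterize ([current-pseudo-random-generator
                    (make-pseudo-random-generator)])
      (random-seed seed)
      (shuffle (range n))))
  (define n-test (inexact->exact (round (* n test-size))))
  (split-at shuffled (- n n-test)))

(define-values (train-idx test-idx) (split-indices 150 0.2 42))
(length train-idx) ; 120
(length test-idx)  ; 30
```

Because the generator is seeded inside the call, repeating the call with the same seed yields the same partition, which is the property the fixed #:seed is meant to provide.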

Fit and predict. make-dmatrix accepts the row lists directly; two shallow rounds with #:eta 1.0 are plenty for iris:

<r27-fit> ::=

(define bst
  (train (make-dmatrix X-train #:labels y-train)
         #:num-class 3
         #:objective "multi:softmax"
         #:max-depth 2
         #:eta 1.0
         #:verbosity 0
         #:rounds 2))
(define preds (predict bst (make-dmatrix X-test)))

run-example returns the training size, the predictions, and the test accuracy. The harness "test/27-get-started.rkt" prints a one-line summary and asserts iris is comfortably classified:

; get-started: 120 train / 30 test, test accuracy 0.967

<r27-run> ::=

(define (run-example)
  <r27-data>
  <r27-fit>
  (values (length X-train) preds (accuracy preds y-test)))
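Since run-example returns three values, a caller needs define-values (or call-with-values) to receive them. The sketch below shows one way a harness might format the summary line. summary-line and fake-run-example are hypothetical names, and the stand-in values are chosen only to reproduce the sample line above; the real harness in "test/27-get-started.rkt" may be written differently.

```racket
#lang racket/base
(require racket/format)

;; Hypothetical formatter; accuracy is shown with exactly three decimals.
(define (summary-line n-train n-test acc)
  (format "get-started: ~a train / ~a test, test accuracy ~a"
          n-train n-test (~r acc #:precision '(= 3))))

;; Stand-in for (run-example): made-up values in the same shape,
;; i.e. training size, list of predictions, exact-rational accuracy.
(define (fake-run-example)
  (values 120 (build-list 30 (λ (i) 0.0)) 29/30))

(define-values (n-train preds acc) (fake-run-example))
(displayln (summary-line n-train (length preds) acc))
```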

<*> ::=

<r27-require>
<r27-provide>
<r27-accuracy>
<r27-run>
