2.7 Get Started
This is the Racket counterpart of XGBoost’s Python quickstart. The upstream snippet is:
from xgboost import XGBClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data['data'], data['target'], test_size=.2)
bst = XGBClassifier(n_estimators=2, max_depth=2, learning_rate=1)
bst.fit(X_train, y_train)
preds = bst.predict(X_test)
The binding has no scikit-learn-style XGBClassifier estimator, so instead of .fit/.predict we build a DMatrix and call train and predict:
(require xgboost xgboost/private/demo-utils)
(provide run-example)
Accuracy. A small helper, since unlike sklearn we score by hand:
(define (accuracy preds labels)
  (/ (for/sum ([p (in-list preds)]
               [y (in-list labels)])
       (if (= (inexact->exact (round p)) y) 1 0))
     (length labels)))
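As a quick check of the helper, the sketch below repeats its definition so that it runs on its own and scores four made-up predictions. The float inputs reflect that XGBoost's multi:softmax objective reports class indices as floating-point numbers, which is why the helper rounds before comparing. The sample values are illustrative, not iris output.

```racket
#lang racket/base

;; Same helper as above, repeated so this snippet is self-contained.
(define (accuracy preds labels)
  (/ (for/sum ([p (in-list preds)]
               [y (in-list labels)])
       (if (= (inexact->exact (round p)) y) 1 0))
     (length labels)))

;; Made-up predictions: three of the four match their labels.
(define demo-acc (accuracy '(0.0 1.0 2.0 1.0) '(0 1 2 2)))

demo-acc                   ; exact rational 3/4
(exact->inexact demo-acc)  ; 0.75
```

The result is an exact rational, so convert it with exact->inexact before printing a decimal.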
Load and split. load-iris returns the 150 × 4 feature rows and integer labels; train-test-split holds out 20% with a fixed seed for reproducibility:
(define-values (X y) (load-iris))
(define-values (X-train X-test y-train y-test)
  (train-test-split X y #:test-size 0.2 #:seed 42))
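train-test-split comes from the binding's private demo utilities, and its implementation is not shown here. The following is a minimal sketch of what a seeded hold-out split can look like in plain Racket. The name split-indices is hypothetical, and the shuffle-then-cut strategy is an assumption rather than the binding's actual method.

```racket
#lang racket/base
(require racket/list)

;; Hypothetical helper: returns (values train-indices test-indices).
;; A fresh generator is seeded locally, so the global random state
;; is left untouched and the same seed always gives the same split.
(define (split-indices n test-size seed)
  (define n-test (inexact->exact (round (* n test-size))))
  (define shuffled
    (parameterize ([current-pseudo-random-generator
                    (make-pseudo-random-generator)])
      (random-seed seed)
      (shuffle (range n))))
  ;; The first n-test shuffled indices are held out for testing.
  (values (drop shuffled n-test) (take shuffled n-test)))

(define-values (train-idx test-idx) (split-indices 150 0.2 42))
(length train-idx)  ; 120
(length test-idx)   ; 30
```

Splitting indices rather than rows keeps features and labels aligned: the same index lists can select from both X and y.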
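Fit and predict. make-dmatrix accepts the row lists directly. #:rounds 2, #:max-depth 2 and #:eta 1.0 mirror the upstream n_estimators=2, max_depth=2 and learning_rate=1 (eta is XGBoost's alias for learning_rate); two shallow rounds are plenty for iris. With no classifier wrapper to infer them from the labels, the objective and class count are passed explicitly: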
(define bst
  (train (make-dmatrix X-train #:labels y-train)
         #:num-class 3
         #:objective "multi:softmax"
         #:max-depth 2
         #:eta 1.0
         #:verbosity 0
         #:rounds 2))
(define preds (predict bst (make-dmatrix X-test)))
run-example returns the training size, the predictions, and the test accuracy. The harness "test/27-get-started.rkt" prints a one-line summary and asserts iris is comfortably classified:
; get-started: 120 train / 30 test, test accuracy 0.967
(define (run-example)
  <r27-data>
  <r27-fit>
  (values (length X-train) preds (accuracy preds y-test)))
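The harness itself is not reproduced in this section. As a rough sketch of how a summary line like the one shown above could be built from the values run-example returns, the snippet below uses a hypothetical summary-line helper. With 30 test rows, the printed 0.967 corresponds to 29 correct predictions, since 29/30 rounds to 0.967.

```racket
#lang racket/base
(require racket/format)

;; Hypothetical helper, not part of the binding or its test harness.
;; acc may be an exact rational, as returned by accuracy.
(define (summary-line n-train n-test acc)
  (format "get-started: ~a train / ~a test, test accuracy ~a"
          n-train
          n-test
          (~r acc #:precision '(= 3))))

(define demo-line (summary-line 120 30 29/30))
demo-line
;; "get-started: 120 train / 30 test, test accuracy 0.967"
```

Formatting with ~r keeps the accuracy exact until the last step, so the rounding happens only in the printed string.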