2.7 Get Started

This is the Racket counterpart of XGBoost’s Python quickstart. The upstream snippet is:

from xgboost import XGBClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data['data'], data['target'], test_size=.2)
bst = XGBClassifier(n_estimators=2, max_depth=2, learning_rate=1)
bst.fit(X_train, y_train)
preds = bst.predict(X_test)

The binding has no scikit-style XGBClassifier estimator, so instead of .fit/.predict we build a DMatrix and call train / predict, the same flow the upstream R, Julia, and Scala quickstarts use. load-iris and train-test-split come from xgboost/private/demo-utils (load-iris downloads the UCI dataset, falling back to a bundled copy when offline). Iris has three classes, so we use "multi:softmax" with #:num-class 3; predict then returns class indices, just as sklearn’s .predict returns labels.

(require xgboost
         xgboost/private/demo-utils)

(provide run-example)
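
To make the "multi:softmax" choice concrete: XGBoost's documentation describes this objective as returning the predicted class, while "multi:softprob" returns per-class probabilities. The sketch below imitates that final step in plain Racket on made-up scores. The helpers softmax and argmax are illustrative names only and are not part of the binding.

```racket
#lang racket/base

;; Illustrative helper (not from the binding): turn raw per-class scores
;; into probabilities. Subtracting the maximum keeps exp from overflowing.
(define (softmax scores)
  (define m (apply max scores))
  (define exps (for/list ([s (in-list scores)]) (exp (- s m))))
  (define total (apply + exps))
  (for/list ([e (in-list exps)]) (/ e total)))

;; Illustrative helper: index of the largest element of a non-empty list.
(define (argmax xs)
  (define-values (best best-score)
    (for/fold ([best 0] [best-score (car xs)])
              ([x (in-list (cdr xs))] [i (in-naturals 1)])
      (if (> x best-score) (values i x) (values best best-score))))
  best)

;; Made-up scores for one row and three classes.
(define toy-scores '(0.1 2.0 -1.0))

(argmax (softmax toy-scores)) ; => 1, the class index a softmax-style objective would report
```

A class index is all the accuracy check needs, which is presumably why the quickstart prefers "multi:softmax" to a probability output.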

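Accuracy. A small helper, since unlike sklearn we score by hand. Each prediction is rounded to an exact integer before it is compared with its label, and the result is the exact fraction of correct predictions: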

(define (accuracy preds labels)
  (/ (for/sum ([p (in-list preds)] [y (in-list labels)])
       (if (= (inexact->exact (round p)) y) 1 0))
     (length labels)))
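
As a quick sanity check, the helper can be exercised on a toy input. The predictions and labels below are made up, and the float predictions are an assumption suggested by the rounding in the helper rather than a statement about what predict returns.

```racket
#lang racket/base

;; Same helper as above, repeated so the snippet runs on its own.
(define (accuracy preds labels)
  (/ (for/sum ([p (in-list preds)] [y (in-list labels)])
       (if (= (inexact->exact (round p)) y) 1 0))
     (length labels)))

;; Made-up predictions and labels: three of the four agree.
(define toy-preds  '(0.0 1.0 2.0 1.0))
(define toy-labels '(0 1 2 2))

(accuracy toy-preds toy-labels)                  ; => 3/4
(exact->inexact (accuracy toy-preds toy-labels)) ; => 0.75
```

Because the result is an exact rational, it has to be converted or formatted before it appears as a decimal.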

Load and split. load-iris returns the 150 × 4 feature rows and integer labels; train-test-split holds out 20% with a fixed seed for reproducibility:

(define-values (X y) (load-iris))
(define-values (X-train X-test y-train y-test)
  (train-test-split X y #:test-size 0.2 #:seed 42))
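
The implementation of train-test-split is not shown on this page. As a rough idea of what a seeded hold-out split involves, here is a minimal sketch in plain Racket. split-rows is a hypothetical name, and the real helper in demo-utils may shuffle or round differently.

```racket
#lang racket/base
(require racket/list)

;; Hypothetical stand-in for a seeded hold-out split; not the demo-utils code.
(define (split-rows X y #:test-size [test-size 0.2] #:seed [seed 42])
  (define n (length X))
  (define n-test (inexact->exact (round (* n test-size))))
  ;; Shuffle the row indices with a private, seeded generator so that the
  ;; same seed always produces the same split.
  (define idx
    (parameterize ([current-pseudo-random-generator (make-pseudo-random-generator)])
      (random-seed seed)
      (shuffle (range n))))
  (define-values (test-idx train-idx) (split-at idx n-test))
  (define (pick lst is) (for/list ([i (in-list is)]) (list-ref lst i)))
  (values (pick X train-idx) (pick X test-idx)
          (pick y train-idx) (pick y test-idx)))

;; Made-up data: ten rows with two features each, labelled 0..9.
(define toy-X (for/list ([i (in-range 10)]) (list i (* 2 i))))
(define toy-y (range 10))

(define-values (toy-X-train toy-X-test toy-y-train toy-y-test)
  (split-rows toy-X toy-y #:test-size 0.2 #:seed 42))

(list (length toy-X-train) (length toy-X-test)) ; => '(8 2)
```

Shuffling indices rather than rows keeps features and labels aligned, which is the property any such split has to preserve.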

Fit and predict. make-dmatrix accepts the row lists directly; two shallow rounds with #:eta 1.0 are plenty for iris:

(define bst
  (train (make-dmatrix X-train #:labels y-train)
         #:num-class 3
         #:objective "multi:softmax"
         #:max-depth 2
         #:eta 1.0
         #:verbosity 0
         #:rounds 2))
(define preds (predict bst (make-dmatrix X-test)))

run-example returns the training size, the predictions, and the test accuracy. The harness "test/27-get-started.rkt" prints a one-line summary and asserts iris is comfortably classified:

; get-started: 120 train / 30 test, test accuracy 0.967

(define (run-example)
  <r27-data>
  <r27-fit>
  (values (length X-train) preds (accuracy preds y-test)))
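
For reference, a one-line summary in the shape shown above could be produced from those three values roughly as follows. summary-line is a hypothetical formatter, not the code in the harness, and 29/30 is used only as sample input because it is the fraction of 30 test rows that rounds to 0.967.

```racket
#lang racket/base

;; Hypothetical formatter; the real harness may build its summary differently.
(define (summary-line n-train n-test acc)
  (format "; get-started: ~a train / ~a test, test accuracy ~a"
          n-train n-test (real->decimal-string acc 3)))

(summary-line 120 30 29/30)
;; => "; get-started: 120 train / 30 test, test accuracy 0.967"
```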

<*> ::=