
2.3 Binary classification

Switching from regression to classification is a one-parameter change: the objective. This example fits "binary:logistic", whose predictions are class-1 probabilities in [0, 1] rather than estimates of a raw target. We threshold at 0.5 to recover a hard class and report accuracy.
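
To make the probability and threshold steps concrete, here is a minimal standalone sketch that does not use the xgboost bindings. With a logistic objective, the model's raw score (the margin) is passed through the logistic function to give a probability, so thresholding the probability at 0.5 is the same as thresholding the margin at 0. The names sigmoid and hard-class are illustrative, not part of the library, and sending a probability of exactly 0.5 to class 0 is an assumption.

```racket
#lang racket/base

;; Logistic (sigmoid) link: maps a raw margin to a probability in (0, 1).
(define (sigmoid margin)
  (/ 1.0 (+ 1.0 (exp (- margin)))))

;; Hard class at the 0.5 threshold.
;; Assumption: a probability of exactly 0.5 is assigned to class 0.
(define (hard-class p)
  (if (> p 0.5) 1 0))

;; Negative margins fall below 0.5, positive margins above it.
(for ([m (in-list '(-4.0 0.0 4.0))])
  (define p (sigmoid m))
  (printf "margin ~a -> p(1) = ~a -> class ~a\n" m p (hard-class p)))
```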

The data is two well-separated clusters in 4-D space: every feature of a class 0 row lies in [0.0, 0.5], and every feature of a class 1 row lies in [4.0, 6.5]. The rows are interleaved so the labels alternate.

(require ffi/vector
         xgboost)

(provide run-example)

The data. Ten rows of four features, alternating class 0 / class 1:

(define features
  (f32vector 0.1 0.2 0.1 0.0
             5.0 4.0 5.5 6.0
             0.3 0.5 0.1 0.2
             4.5 5.0 4.0 5.5
             0.0 0.1 0.2 0.0
             6.0 5.5 6.5 5.0
             0.4 0.3 0.2 0.5
             5.5 6.0 4.5 5.0
             0.2 0.1 0.3 0.1
             4.0 4.5 5.0 4.0))
(define labels (f32vector 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0 0.0 1.0))
(define dtrain
  (make-dmatrix features #:nrow 10 #:ncol 4 #:labels labels))
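
The features are passed as one flat f32vector, with #:nrow and #:ncol describing its shape. The sketch below assumes the buffer is read row by row (row-major), as the layout of the literal suggests. It shows how a (row, column) pair maps to a flat index and how the row count follows from the vector length. The helper cell is hypothetical and exists only for illustration.

```racket
#lang racket/base
(require ffi/vector)

;; First two rows of the example data, stored flat.
(define ncol 4)
(define sample
  (f32vector 0.1 0.2 0.1 0.0
             5.0 4.0 5.5 6.0))

;; Hypothetical helper: element at row i, column j,
;; assuming row-major order (row i starts at index i * ncol).
(define (cell vec i j)
  (f32vector-ref vec (+ (* i ncol) j)))

;; The row count implied by the buffer length; it should agree
;; with the value passed as #:nrow.
(define nrow (quotient (f32vector-length sample) ncol))

(printf "rows: ~a, row 1 col 2: ~a\n" nrow (cell sample 1 2))
```

A quick length check of this kind (length equals nrow times ncol) is a cheap way to catch a miscounted literal before it reaches the matrix constructor.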

Training. The only change from "Training a regressor" is #:objective, now "binary:logistic":

(define booster
  (train dtrain
         #:objective "binary:logistic"
         #:max-depth 3
         #:eta 0.3
         #:verbosity 0
         #:rounds 30))

Prediction. For "binary:logistic", predict returns the probability of class 1 for each row:

(define probs (predict booster dtrain #:as 'f32vector))

run-example returns the booster, the training matrix, and those probabilities. The harness "test/02-train-classifier.rkt" prints a truth/probability/prediction table, reports accuracy at the 0.5 threshold, and asserts the clusters are classified perfectly:

; i truth p(1) pred
; 0 0 0.0123 0
; 1 1 0.9881 1
; accuracy: 10/10 (100.0%)

(define (run-example)
  <r02-data>
  <r02-train>
  <r02-predict>
  (values booster dtrain probs))
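
The harness itself is not shown on this page. As a rough sketch of the accuracy step it describes, the code below thresholds each probability at 0.5 and counts matches against the labels. It is standalone and does not call the xgboost bindings: sample-truth and sample-probs are made-up stand-ins for the labels and for the probabilities returned by run-example, and hard-class and count-correct are hypothetical helper names.

```racket
#lang racket/base
(require ffi/vector)

;; Stand-in data for illustration only; these are NOT real model outputs.
(define sample-truth (f32vector 0.0 1.0 0.0 1.0))
(define sample-probs (f32vector 0.02 0.97 0.10 0.88))

;; Hard class at the 0.5 threshold (a tie goes to class 0, an assumption).
(define (hard-class p)
  (if (> p 0.5) 1 0))

;; Count rows whose thresholded prediction equals the label.
(define (count-correct truth probs)
  (for/sum ([i (in-range (f32vector-length truth))])
    (if (= (hard-class (f32vector-ref probs i))
           (f32vector-ref truth i))
        1
        0)))

(define n (f32vector-length sample-truth))
(define correct (count-correct sample-truth sample-probs))
(printf "accuracy: ~a/~a\n" correct n)
```

With real output, the same function would be applied to the labels vector and the third value returned by run-example.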

<*> ::=