2.18 Quantile cuts

The "hist" tree method buckets each feature into quantile bins before training. dmatrix-quantile-cut exposes those cut points once a model has been trained on the matrix: it returns (values indptr data), a CSR-style pair in which consecutive entries of indptr delimit each feature’s slice of the flat data vector of cut values. The example below trains a one-round "hist" model and inspects its cuts.

(require ffi/vector
         xgboost)

(provide run-example)

(define (run-example)
  (define dtrain
    (make-dmatrix (f32vector 1.0 2.0  3.0 4.0  5.0 6.0  7.0 8.0)
                  #:nrow 4 #:ncol 2 #:missing -1.0
                  #:labels (f32vector 1.0 3.0 5.0 7.0)))
  (define booster
    (train dtrain #:objective "reg:squarederror"
           #:params '(("tree_method" . "hist"))
           #:max-depth 2 #:verbosity 0 #:rounds 1))
  (define-values (indptr data) (dmatrix-quantile-cut dtrain))
  (hash 'indptr indptr 'data data
        'indptr-length (length indptr)
        'data-length (f32vector-length data)
        'prediction-count (f32vector-length (predict booster dtrain #:as 'f32vector))))
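
To make the CSR-style layout concrete, here is a minimal sketch of pulling one feature's cut values out of the pair. The helper feature-cuts is hypothetical and not part of the xgboost bindings. It assumes indptr is a list of exact nonnegative integers (as the use of length above suggests) and data is an f32vector. The sample values are hand-made stand-ins, not real output of dmatrix-quantile-cut.

```racket
#lang racket/base
(require ffi/vector)

;; Hypothetical helper, not part of the xgboost bindings.
;; Assumes indptr is a list of exact nonnegative integers and data is an
;; f32vector. Feature j owns the half-open slice
;; [indptr[j], indptr[j+1]) of data.
(define (feature-cuts indptr data j)
  (define start (list-ref indptr j))
  (define end (list-ref indptr (add1 j)))
  (for/list ([i (in-range start end)])
    (f32vector-ref data i)))

;; Hand-made stand-in values for two features, not real library output.
(define sample-indptr '(0 3 5))
(define sample-data (f32vector 1.0 3.0 5.0 2.0 6.0))

(feature-cuts sample-indptr sample-data 0) ; => '(1.0 3.0 5.0)
(feature-cuts sample-indptr sample-data 1) ; => '(2.0 6.0)
```

Under these assumptions, a matrix with n features yields an indptr of n + 1 entries, so the last valid feature index is two less than the length of indptr.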

The harness "test/16-quantile-cuts.rkt" prints the cut-vector lengths and asserts the CSR invariants (indptr starts at 0 and its last entry equals the data length).
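
The two invariants named above can be sketched as a single predicate. This is not the code of the harness, whose contents are not shown here. The name csr-invariants-hold? is hypothetical, and the sketch again assumes indptr is a list of exact integers and data is an f32vector.

```racket
#lang racket/base
(require ffi/vector)

;; Hypothetical checker, not taken from the test harness.
;; Assumes indptr is a list of exact integers and data is an f32vector.
(define (csr-invariants-hold? indptr data)
  (and (pair? indptr)                        ; at least one offset
       (zero? (car indptr))                  ; indptr starts at 0
       (= (list-ref indptr (sub1 (length indptr)))
          (f32vector-length data))))         ; last entry equals data length

;; Hand-made stand-in values, not real library output.
(define sample-data (f32vector 1.0 3.0 5.0 2.0 6.0))

(csr-invariants-hold? '(0 3 5) sample-data) ; => #t
(csr-invariants-hold? '(0 3 4) sample-data) ; => #f, last offset is not 5
```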

<*> ::=