3.6 Multinomial classification (K classes)

The multinomial family generalizes the binomial classifier to K > 2 classes. It uses the same glmnet lognet solver with the number of classes set above 1, so the elastic-net knobs (α, λ) are unchanged. The response is now an integer class label in {0, …, K−1}, and the fit returns K intercepts and K coefficient vectors. Class probabilities are the softmax over the K linear predictors ηₖ = a₀ₖ + x·βₖ, that is pₖ = exp(ηₖ) / Σⱼ exp(ηⱼ), and the predicted class is the argmax.
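The softmax-and-argmax step can be sketched in plain Racket, independent of the glmnet package. The helper names (linear-predictor, softmax, argmax-index, predict-proba, predict-class) and the intercepts and coefficients are made up for illustration; they are not the library's API and not the output of any fit.

```racket
#lang racket/base
(require racket/list)

;; Linear predictor for one class: eta_k = a0_k + x . beta_k
(define (linear-predictor a0 beta x)
  (+ a0 (for/sum ([b (in-list beta)] [xi (in-list x)]) (* b xi))))

;; Softmax, shifted by the maximum for numerical stability.
(define (softmax etas)
  (define m (apply max etas))
  (define exps (map (lambda (e) (exp (- e m))) etas))
  (define total (apply + exps))
  (map (lambda (e) (/ e total)) exps))

;; Index of the largest element (the first one on ties).
(define (argmax-index xs)
  (index-of xs (apply max xs)))

;; ASSUMPTION: hand-picked values for K = 3 classes and 2 features,
;; chosen only to illustrate the arithmetic.
(define intercepts '(3.0 -3.0 -2.0))
(define betas '((-1.0 0.0) (1.0 0.0) (0.0 1.0)))

;; Per-class probabilities for one observation x.
(define (predict-proba x)
  (softmax (map (lambda (a0 beta) (linear-predictor a0 beta x))
                intercepts betas)))

;; Predicted class label in {0, ..., K-1}.
(define (predict-class x)
  (argmax-index (predict-proba x)))
```

With these illustrative values, (predict-class '(1.0 1.0)) evaluates to 0 because the three linear predictors are 2.0, -2.0 and -1.0. Subtracting the maximum before exponentiating leaves the probabilities unchanged and avoids overflow for large predictors.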

The synthetic data below consists of three separable clusters: class 0 at low x₁, class 1 at high x₁, and class 2 at high x₂. A lasso-penalized fit gives each class a sparse coefficient vector that picks out its discriminating feature, and the argmax classifies every training point correctly.

<require> ::=

(require glmnet)

<provide> ::=

(provide run-example)

<data> ::=

(define X '((1.0 1.0) (2.0 1.0) (1.0 2.0) (2.0 2.0)
            (5.0 1.0) (6.0 1.0) (5.0 2.0) (6.0 2.0)
            (3.0 5.0) (4.0 5.0) (3.0 6.0) (4.0 6.0)))
(define y '(0 0 0 0 1 1 1 1 2 2 2 2))

multinomial-fit takes integer class labels and returns a multinomial-result whose multinomial-result-coefficients is a vector of K per-class coefficient vectors. Here each class keeps only the feature that distinguishes it (classes 0 and 1 split on x₁, class 2 on x₂); the other coefficients come back exactly 0.0.

<fit> ::=

(define result (multinomial-fit X y #:lambda 0.01))
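One way to inspect the sparsity pattern is to list the nonzero coefficient indices for each class. This is a minimal sketch that assumes the coefficients arrive as a Racket vector of vectors with one entry per feature and no intercept slot. The helpers active-features and active-features-by-class are hypothetical, and the numbers are stand-ins rather than values from the fit.

```racket
#lang racket/base

;; Indices of the nonzero entries in one coefficient vector.
(define (active-features beta)
  (for/list ([b (in-vector beta)] [j (in-naturals)] #:unless (zero? b))
    j))

;; One list of active feature indices per class.
(define (active-features-by-class coefficients)
  (for/list ([beta (in-vector coefficients)])
    (active-features beta)))

;; ASSUMPTION: made-up coefficients in the assumed vector-of-vectors
;; layout, not output from multinomial-fit.
(define example-coefficients
  (vector (vector -1.2 0.0) (vector 0.9 0.0) (vector 0.0 1.4)))

(active-features-by-class example-coefficients)
; => '((0) (0) (1))
```

If the layout assumption holds, the same helper could be applied to (multinomial-result-coefficients result); adjust in-vector to in-list if the inner containers turn out to be lists.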

multinomial-predict-proba gives the per-class softmax probabilities for each row (they sum to 1), and multinomial-predict takes the argmax:

(multinomial-predict-proba result X)
; => per-row probabilities, each summing to 1
(multinomial-predict result X)
; => (0 0 0 0 1 1 1 1 2 2 2 2) -- every training point correct
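The two properties mentioned here, rows summing to 1 and training accuracy, can be checked with a few lines of plain Racket. The sketch below assumes predictions come back as a list of labels and probabilities as a list of per-row lists, which the source does not confirm. The helpers accuracy and rows-sum-to-one? are hypothetical, and the sample values are stand-ins, not output from the fit.

```racket
#lang racket/base

;; Fraction of predictions that match the true labels (exact rational).
(define (accuracy predicted actual)
  (define hits
    (for/sum ([p (in-list predicted)] [a (in-list actual)])
      (if (= p a) 1 0)))
  (/ hits (length actual)))

;; True when every probability row sums to 1 within a tolerance.
(define (rows-sum-to-one? rows [tol 1e-9])
  (for/and ([row (in-list rows)])
    (< (abs (- (apply + row) 1.0)) tol)))

;; ASSUMPTION: stand-in values for illustration only.
(define y-true '(0 0 1 1 2 2))
(define y-pred '(0 0 1 2 2 2))
(define probs '((0.8 0.1 0.1) (0.2 0.5 0.3)))

(accuracy y-pred y-true)   ; => 5/6
(rows-sum-to-one? probs)   ; => #t
```

A tolerance is used rather than an exact comparison because floating-point sums of probabilities rarely equal 1.0 exactly.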

<run-example> ::=

(define (run-example)
  <data>
  <fit>
  result)

<*> ::=

<require>
<provide>
<run-example>
