3.6 Multinomial classification (K classes)
The multinomial family generalizes the binomial classifier to K > 2 classes. It uses the same glmnet lognet solver with the number of classes set above 1, so the elastic-net knobs (α, λ) are unchanged. What changes is the response, which is now an integer class label in {0, …, K−1}, and the fit, which returns K intercepts and K coefficient vectors. Class probabilities are the softmax over the K linear predictors η_k = a0_k + x·β_k, that is, p_k = exp(η_k) / Σ_j exp(η_j), and the predicted class is the argmax over k.
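The softmax-then-argmax step can be sketched in plain Racket, independent of the glmnet bindings. This is only an illustration: linear-predictor, softmax, predict-class, and class-probabilities are hypothetical helper names, and the intercepts and coefficients are made-up values, not output from a real fit.

```racket
#lang racket/base
(require racket/list) ; argmax, range

;; Hypothetical helper: linear predictor eta_k = a0_k + x . beta_k.
(define (linear-predictor a0 beta x)
  (+ a0 (for/sum ([b (in-list beta)] [xi (in-list x)]) (* b xi))))

;; Softmax over a list of linear predictors. Subtracting the maximum
;; first keeps exp from overflowing and does not change the result.
(define (softmax etas)
  (define m (apply max etas))
  (define exps (map (lambda (e) (exp (- e m))) etas))
  (define total (apply + exps))
  (map (lambda (e) (/ e total)) exps))

;; Index of the largest probability (the predicted class label).
(define (predict-class probs)
  (argmax (lambda (k) (list-ref probs k)) (range (length probs))))

;; Made-up parameters for K = 3 classes and 2 features (an assumption
;; for illustration only, not coefficients from multinomial-fit).
(define a0s '(2.0 -3.0 -1.0))
(define betas '((-1.0 0.0) (1.0 0.0) (0.0 1.0)))

(define (class-probabilities x)
  (softmax (map (lambda (a0 beta) (linear-predictor a0 beta x)) a0s betas)))

(define probs (class-probabilities '(6.0 1.0)))
(predict-class probs) ; => 1, since eta = (-4.0 3.0 0.0)
```

Because the softmax is monotone in each η_k, the argmax of the probabilities is the same as the argmax of the linear predictors; the probabilities matter only when a calibrated score is needed.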
The synthetic data below is three separable clusters: class 0 at low x₁, class 1 at high x₁, and class 2 at high x₂. A lasso-penalized fit gives each class a sparse coefficient vector that picks out its discriminating feature, and the argmax classifies every training point correctly.
(require glmnet)
(provide run-example)
(define X '((1.0 1.0) (2.0 1.0) (1.0 2.0) (2.0 2.0)
            (5.0 1.0) (6.0 1.0) (5.0 2.0) (6.0 2.0)
            (3.0 5.0) (4.0 5.0) (3.0 6.0) (4.0 6.0)))
(define y '(0 0 0 0 1 1 1 1 2 2 2 2))
multinomial-fit takes integer class labels and returns a multinomial-result whose multinomial-result-coefficients field is a vector of K per-class coefficient vectors. Here each class keeps only the feature that distinguishes it (classes 0 and 1 split on x₁, class 2 on x₂); the other coefficients come back exactly 0.0.
(define result (multinomial-fit X y #:lambda 0.01))
multinomial-predict-proba gives the per-class softmax probabilities for each row (they sum to 1), and multinomial-predict takes the argmax:
(multinomial-predict-proba result X) ; => per-row probabilities, each summing to 1
(multinomial-predict result X) ; => (0 0 0 0 1 1 1 1 2 2 2 2) -- every training point correct
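Both claims in the comments can be checked mechanically. The sketch below is plain Racket and does not call glmnet. The helpers accuracy and rows-sum-to-one? are hypothetical names, the predicted list is a stand-in copied from the output shown above, and the probability rows are made-up values. It also assumes predictions come back as a list of integers and probabilities as a list of per-row lists, which this section does not specify.

```racket
#lang racket/base

;; Hypothetical helper: fraction of predicted labels that match the
;; true labels. Returns an exact rational.
(define (accuracy predicted actual)
  (define hits
    (for/sum ([p (in-list predicted)] [a (in-list actual)])
      (if (= p a) 1 0)))
  (/ hits (length actual)))

;; Hypothetical helper: #t when every row of probabilities sums to 1
;; within a small floating-point tolerance.
(define (rows-sum-to-one? rows [tol 1e-9])
  (for/and ([row (in-list rows)])
    (< (abs (- (apply + row) 1.0)) tol)))

(define y '(0 0 0 0 1 1 1 1 2 2 2 2))

;; Stand-in for (multinomial-predict result X), copied from the
;; documented output rather than computed here.
(define predicted '(0 0 0 0 1 1 1 1 2 2 2 2))

;; Made-up probability rows, for illustration only.
(define example-rows '((0.9 0.05 0.05) (0.1 0.8 0.1)))

(accuracy predicted y)          ; => 1
(rows-sum-to-one? example-rows) ; => #t
```

With a real fit, the two stand-in values would be replaced by the results of multinomial-predict and multinomial-predict-proba, converted to lists first if the library returns vectors.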