3.4 Elastic net (0 < α < 1)

The elastic net blends the two penalties: with 0 < α < 1 the objective mixes the lasso’s L1 term and the ridge’s L2 term, and a larger α makes the fit more lasso-like. It keeps the lasso’s ability to select variables while borrowing the ridge’s stability. In particular, it tends to keep or drop groups of correlated predictors together, where the pure lasso tends to keep one member of the group and drop the rest more or less arbitrarily. Setting α = 0 recovers ridge regression (section 3.2) exactly and α = 1 recovers the lasso (section 3.3); the interesting models live in between.
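For reference, the blend can be written out explicitly. The form below uses the parameterization of the original glmnet software; that this library scales the terms identically (the 1/(2N) factor on the loss and the 1/2 on the L2 term) is an assumption, not something this section states.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
\;+\;\lambda\Bigl[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2
                 + \alpha\,\lVert\beta\rVert_1\Bigr]
```

At α = 0 only the squared L2 term survives (ridge); at α = 1 only the L1 term survives (lasso).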

On the running fixture, an α = 0.5 fit at a moderate λ both shrinks its coefficients (ridge-like) and drives at least one exactly to zero (lasso-like). At a shared λ, the number of coefficients it sets to zero lies between that of ridge (which zeros none) and that of lasso (which zeros the most): the characteristic in-between behaviour of the blend.
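Why a single fit can both shrink and zero coefficients is easiest to see in the one-coordinate update that coordinate-descent solvers typically use. The sketch below assumes standardized predictors and a glmnet-style penalty λ[(1 − α)/2 · ‖β‖² + α · ‖β‖₁]. The names soft-threshold and enet-update are illustrative only; they are not part of this library’s API, and nothing here claims to be how the library is implemented.

```racket
#lang racket/base

;; Soft-thresholding: S(z, g) = sign(z) * max(|z| - g, 0).
;; Any z within g of zero is mapped to exactly 0.0 (the lasso-like part).
(define (soft-threshold z g)
  (cond [(> z g) (- z g)]
        [(< z (- g)) (+ z g)]
        [else 0.0]))

;; One elastic-net coordinate update for a standardized predictor.
;; z     : unpenalized least-squares value for this coordinate (assumed given)
;; lam   : overall penalty strength λ
;; alpha : mixing parameter α
;; The L1 part thresholds by λα; the L2 part divides by 1 + λ(1 − α),
;; which shrinks the result without ever making it zero (the ridge-like part).
(define (enet-update z lam alpha)
  (/ (soft-threshold z (* lam alpha))
     (+ 1.0 (* lam (- 1.0 alpha)))))
```

With these definitions, (enet-update 0.2 0.5 0.5) is exactly 0.0 because 0.2 falls inside the threshold λα = 0.25, whereas (enet-update 0.2 0.5 0.0) is shrunk but stays nonzero, since α = 0 leaves no threshold at all.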

<require> ::=

(require glmnet)

<provide> ::=

(provide run-example)

<data> ::=

(define X '((1.0 2.0  1.0)
            (2.0 1.0  4.0)
            (3.0 4.0  9.0)
            (4.0 3.0 16.0)
            (5.0 6.0 25.0)
            (6.0 5.0 36.0)))
(define y '(1.0 4.0 3.0 6.0 5.0 8.0))

elastic-net takes an explicit #:alpha (here 0.5) in addition to #:lambda. Compare its fit with the ridge and lasso fits at the same λ to see it land between them.

<fit> ::=

(define result (elastic-net X y #:alpha 0.5 #:lambda 0.5))

<run-example> ::=

(define (run-example)
  <data>
  <fit>
  result)

<*> ::=

<require>
<provide>
<run-example>
