3.2 Ridge regression (L2, α = 0)

Ridge regression is the elastic net at α = 0: a pure L2 penalty. Unlike the lasso, it generally does not set any coefficient exactly to zero at finite λ. Instead it shrinks every coefficient smoothly toward zero, more strongly as λ grows; in the limit λ → ∞ all coefficients tend to zero and only the intercept (the mean of the response) remains.
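The shrinkage can be seen in closed form for a single centered predictor with no intercept. The sketch below is not how glmnet fits the model; ridge-1d is a hypothetical helper, and the objective scaling (1/2n)·RSS + (λ/2)·β² is the one used by R's glmnet at α = 0, which this library is assumed, not confirmed, to share. Under that assumption the minimiser is β(λ) = (Σxᵢyᵢ/n) / (Σxᵢ²/n + λ).

```racket
#lang racket/base

;; Hypothetical helper, not part of glmnet: closed-form ridge slope for ONE
;; centered predictor without an intercept. It minimises
;;   (1/2n) * sum (y_i - beta * x_i)^2  +  (lam/2) * beta^2
;; (assumed scaling), whose solution is
;;   beta(lam) = (sum x_i*y_i / n) / (sum x_i^2 / n + lam).
(define (ridge-1d xs ys lam)
  (define n (length xs))
  (define sxy (/ (for/sum ([x xs] [y ys]) (* x y)) n))
  (define sxx (/ (for/sum ([x xs]) (* x x)) n))
  (/ sxy (+ sxx lam)))

;; Centered toy data whose least-squares slope is 2.
(define xs '(-1.0 0.0 1.0))
(define ys '(-2.0 0.0 2.0))

;; Print the slope for increasing lambda: it decreases toward zero
;; but stays positive for every finite lambda.
(for ([lam '(0.0 0.1 1.0 10.0 1000.0)])
  (printf "lambda = ~a  beta = ~a\n" lam (ridge-1d xs ys lam)))
```

At λ = 0 the helper reproduces the least-squares slope. The denominator grows with λ while the numerator is fixed, which is why the coefficient shrinks but never reaches zero.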

We reuse the OLS fixture but add an irrelevant third predictor x₃ = x₁². The response is still exactly y = 1 + 2x₁ − x₂, so x₃ carries no signal of its own, and ordinary least squares gives it coefficient 0. Ridge instead keeps every coefficient nonzero but shrunken, including a small value on the irrelevant predictor: x₃ is correlated with x₁, so the L2 penalty spreads some of x₁'s weight onto it. That is the characteristic ridge behaviour: shrink, never select.

<require> ::=
(require glmnet)

<provide> ::=
(provide run-example)

<data> ::=
(define X '((1.0 2.0  1.0)
            (2.0 1.0  4.0)
            (3.0 4.0  9.0)
            (4.0 3.0 16.0)
            (5.0 6.0 25.0)
            (6.0 5.0 36.0)))
(define y '(1.0 4.0 3.0 6.0 5.0 8.0))
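As a quick sanity check on the fixture, the following standalone sketch confirms that every row satisfies y = 1 + 2x₁ − x₂ and x₃ = x₁². It repeats the data so that it runs on its own; fixture-ok? is an illustrative name and not part of the example module.

```racket
#lang racket/base

;; Same fixture as the <data> chunk, repeated so this check is self-contained.
(define X '((1.0 2.0  1.0)
            (2.0 1.0  4.0)
            (3.0 4.0  9.0)
            (4.0 3.0 16.0)
            (5.0 6.0 25.0)
            (6.0 5.0 36.0)))
(define y '(1.0 4.0 3.0 6.0 5.0 8.0))

;; Each row is (x1 x2 x3). The response must equal 1 + 2*x1 - x2 exactly,
;; and the third column must be the square of the first.
(define fixture-ok?
  (for/and ([row X] [yi y])
    (define x1 (car row))
    (define x2 (cadr row))
    (define x3 (caddr row))
    (and (= yi (- (+ 1.0 (* 2.0 x1)) x2))
         (= x3 (* x1 x1)))))

(printf "fixture consistent: ~a\n" fixture-ok?)
```

All values are small integers stored as floats, so the exact comparisons with = are safe here; noisy data would need a tolerance instead.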

ridge is elnet-fit with #:alpha 0.0. With a modest λ (here 0.1), the coefficient on x₁ shrinks from its OLS value of 2.0 while staying nonzero.

<fit> ::=

(define result (ridge X y #:lambda 0.1))

<run-example> ::=
(define (run-example)
  <data>
  <fit>
  result)

<*> ::=