Ridge regression is least-squares regression with L2-regularization.
The regularized loss function on a dataset $S$ is thus:

$$\mathcal{L}_{\text{ridge}}(S, \vec{w}) = \mathcal{L}_{\text{MSE}}(S, \vec{w}) + \lambda \lVert \vec{w} \rVert_2^2$$

where $\lambda$ is a hyper-parameter that controls the importance of the regularization.
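As a minimal sketch, the regularized loss above can be computed directly with NumPy. The function name `ridge_loss` and the argument `lam` are illustrative choices, not part of the original text:

```python
import numpy as np

def ridge_loss(X, y, w, lam):
    """Ridge loss: mean squared error plus lambda times the squared
    L2 norm of the parameter vector w."""
    residuals = y - X @ w          # prediction errors on the dataset
    mse = np.mean(residuals ** 2)  # L_MSE(S, w)
    return mse + lam * np.sum(w ** 2)  # add the L2 penalty
```

With `lam = 0` this reduces to the plain MSE; increasing `lam` trades data fit against small weights.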
For a complete discussion of the effect of L2-regularization on the parameters of the model, see our dedicated article: L2-regularization.
Common mistake: normalize the features before applying regularization. Since the L2 penalty weighs all parameters equally, features on different scales would otherwise be regularized inconsistently: a feature measured in large units gets a small weight and is therefore barely penalized.
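A minimal sketch of the normalization step, using a hypothetical toy design matrix whose two columns live on very different scales:

```python
import numpy as np

# Hypothetical design matrix: second feature is ~1000x the scale of the first.
X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 2000.0]])

# Standardize each column to zero mean and unit variance so the L2 penalty
# treats the corresponding weights comparably.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```

In practice the means and standard deviations must be computed on the training set only and reused to transform validation and test data.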
Analytical solution
Let $S_{\text{train}}$ be the training set and denote by $X$ and $\vec{y}$ the corresponding design matrix and output vector.
We can compute the value of the parameter vector that minimizes the regularized loss using differentiation.
$$\nabla_{\vec{w}} \, \mathcal{L}_{\text{MSE}}(S_{\text{train}}, \vec{w}) = -\frac{1}{|S_{\text{train}}|} \, X^\top (\vec{y} - X\vec{w})$$

$$\nabla_{\vec{w}} \, \lambda \lVert \vec{w} \rVert_2^2 = 2\lambda \vec{w}$$
Setting the gradient of the regularized loss to zero, we get:
$$\hat{\vec{w}}_{S_{\text{train}}} = \left[ X^\top X + 2\lambda \, |S_{\text{train}}| \, I \right]^{-1} X^\top \vec{y}$$