
Lasso and elastic net


In addition to ridge regression, two other regularisation techniques are commonly applied to linear regression: (i) Lasso, which stands for Least Absolute Shrinkage and Selection Operator, and (ii) elastic net.

Both methods attempt to discourage large values for the weights of the model, but they differ in how they define the penalty applied in the loss function.

Lasso

Lasso attempts to keep the $\ell_1$-norm of the weight vector as small as possible. The corresponding loss function is:

\begin{equation} L(\mathbf{w}) = (\mathbf{y} - \mathbf{X}\mathbf{w})^T (\mathbf{y} - \mathbf{X}\mathbf{w}) + \lambda ||\mathbf{w}||_1, \end{equation}

where $||\mathbf{w}||_1 = \sum_{i=1}^N |w_i|$ is the $\ell_1$-norm of the vector $\mathbf{w}$. Note that the loss is no longer differentiable everywhere, because of the absolute value in $||\mathbf{w}||_1$. Thus, a closed-form solution for the parameter estimates is generally not available, and we must instead rely on numerical optimisation to arrive at the Lasso estimates. In practice, this is often done using an optimisation algorithm called coordinate descent, which updates one weight at a time while holding the others fixed.
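To make this concrete, here is a minimal sketch of coordinate descent for the Lasso loss above. It is a pedagogical illustration rather than the implementation used by any particular library: it assumes the data are centred so that no intercept is needed, and it cycles through the coordinates for a fixed number of passes rather than testing for convergence.

```python
import numpy as np

def soft_threshold(rho, lam):
    """Soft-thresholding operator: the closed-form minimiser of the
    one-dimensional Lasso problem for a single coordinate."""
    if rho < -lam:
        return rho + lam
    elif rho > lam:
        return rho - lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Minimise (y - Xw)^T (y - Xw) + lam * ||w||_1 by cycling
    through the coordinates of w, updating one weight at a time."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_iters):
        for j in range(n_features):
            # Residual with feature j's current contribution removed
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j
            z = X[:, j] @ X[:, j]
            # The threshold is lam / 2 because the squared-error term
            # in our loss is not halved; its derivative carries a factor of 2.
            w[j] = soft_threshold(rho, lam / 2) / z
    return w

# Illustrative usage on synthetic data with a sparse true weight vector
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
w_true = np.array([3.0, -2.0] + [0.0] * 8)
y = X @ w_true + 0.1 * rng.standard_normal(100)
print(lasso_coordinate_descent(X, y, lam=10.0).round(2))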

Elastic net

Elastic net imposes a combination of $\ell_1$ and $\ell_2$ penalties, as follows:

\begin{equation} L(\mathbf{w}) = (\mathbf{y} - \mathbf{X}\mathbf{w})^T (\mathbf{y} - \mathbf{X}\mathbf{w}) + \lambda_1 ||\mathbf{w}||_1 + \lambda_2 ||\mathbf{w}||_2^2. \end{equation}

Instead of a single regularisation parameter, we now have two parameters, $\lambda_1$ and $\lambda_2$, which control the extent to which we penalise the $\ell_1$- and $\ell_2$-norms, respectively. As with Lasso, we rely on numerical optimisation algorithms to estimate the parameters $\mathbf{w}$ that minimise the loss.
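As a quick illustration, the sketch below fits an elastic net with scikit-learn. Note that scikit-learn's ElasticNet uses a different parameterisation from the loss above: a single strength alpha and a mixing weight l1_ratio, with the squared-error term scaled by $1/(2n)$, so the pair (alpha, l1_ratio) plays the role of $\lambda_1$ and $\lambda_2$ only up to that rescaling.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Synthetic data with a sparse true weight vector
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
w_true = np.array([3.0, -2.0] + [0.0] * 8)
y = X @ w_true + 0.1 * rng.standard_normal(100)

# alpha sets the overall penalty strength; l1_ratio splits it between
# the l1 part (l1_ratio) and the l2 part (1 - l1_ratio).
model = ElasticNet(alpha=0.1, l1_ratio=0.5, fit_intercept=False)
model.fit(X, y)
print(model.coef_.round(2))
```

With a moderate l1_ratio, the fitted coefficients typically show the characteristic elastic-net behaviour: irrelevant weights are driven to (or near) zero by the $\ell_1$ part, while the $\ell_2$ part shrinks the remaining weights smoothly.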