
Regularisation


As we mentioned previously, it is often important to regularise a logistic regression model. When the classes are linearly separable, the unregularised likelihood can always be improved by scaling the weights up, so they grow without bound; a penalty term prevents this.

Fortunately, applying regularisation to logistic regression models is fairly straightforward: by adding a penalty term to the loss, we also add one corresponding term to the gradient and one to the Hessian. For instance, in the binary logistic regression case, the loss, gradient and Hessian under ridge regularisation become:

$$
\begin{align}
L(\mathbf{w}) &= - \sum_{i=1}^m \left[ y_i\log \left(\sigma(\mathbf{w}^T\mathbf{x_i})\right) + \left(1-y_i\right)\log\left( 1-\sigma(\mathbf{w}^T\mathbf{x_i}) \right) \right] + \lambda\mathbf{w}^T\mathbf{w}\\[3ex]
\nabla_w L(\mathbf{w}) &= \mathbf{X}^T \left(\sigma(\mathbf{X}\mathbf{w}) - \mathbf{y}\right) + 2\lambda\mathbf{w}\\[3ex]
\mathbf{H}(\mathbf{w}) &= \mathbf{X}^T \text{diag}\left(\sigma(\mathbf{X}\mathbf{w})\left(1 - \sigma(\mathbf{X}\mathbf{w})\right)\right) \mathbf{X} + 2\lambda\mathbf{I}
\end{align}
$$
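To make these quantities concrete, here is a minimal NumPy sketch implementing the three expressions above; the helper names (`ridge_loss`, `ridge_grad`, `ridge_hessian`) and the variable `lam` are ours, chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ridge_loss(w, X, y, lam):
    # Negative log-likelihood plus the ridge penalty lambda * w^T w
    p = sigmoid(X @ w)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)) + lam * w @ w

def ridge_grad(w, X, y, lam):
    # Gradient: X^T (sigma(Xw) - y) + 2 * lambda * w
    return X.T @ (sigmoid(X @ w) - y) + 2 * lam * w

def ridge_hessian(w, X, y, lam):
    # Hessian: X^T diag(sigma(Xw) * (1 - sigma(Xw))) X + 2 * lambda * I
    p = sigmoid(X @ w)
    return (X.T * (p * (1 - p))) @ X + 2 * lam * np.eye(X.shape[1])
```

With these in hand, a single Newton update is `w -= np.linalg.solve(ridge_hessian(w, X, y, lam), ridge_grad(w, X, y, lam))`; the `2 * lam * np.eye(...)` term also keeps the Hessian well-conditioned, which is another practical benefit of the penalty.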

As with linear regression, there are typically three forms of regularisation to pick from: Ridge (L2), Lasso (L1) and Elastic Net (a mixture of the two). All three are available as regularisation options in common ML libraries such as scikit-learn, as sketched below.
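For instance, scikit-learn's `LogisticRegression` exposes these through its `penalty` parameter; note that its regularisation strength `C` is the *inverse* of the λ used above, so larger `C` means weaker regularisation.

```python
from sklearn.linear_model import LogisticRegression

# Ridge (L2) is the default penalty
ridge = LogisticRegression(penalty="l2", C=1.0)

# Lasso (L1) requires a solver that supports it, e.g. liblinear or saga
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)

# Elastic Net mixes the two; l1_ratio interpolates between L2 (0.0) and L1 (1.0)
elastic = LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5, C=1.0)
```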