Over the last few sections, we have looked at how the parameters of probability distributions affect their overall properties. In this section, we are going to introduce a generic approach for estimating the parameters of a given distribution from data.
The basic idea is as follows. Suppose we have some data that we want to model using a specific probability distribution (e.g. a multivariate normal) with parameters $\theta$. Our goal is to choose the parameters so that the data is as likely as possible under the resulting distribution. This is called Maximum Likelihood Estimation (MLE).
Let's walk through a simple example. Imagine that your dataset is a collection of indoor temperature measurements (in Celsius) in a supermarket. Assuming that a normal distribution is an appropriate choice, let's look at how suitable different parameters are. Let's start with $\mu = 0$ and $\sigma = 1$: how likely is it that a normal distribution with $\mu = 0$ and $\sigma = 1$ generated the temperature data we observed? Not very likely at all, since such a normal distribution has most of its probability density between -1 and 1, which would be a very low value for an indoor temperature. Now consider another normal distribution with $\mu = 20$ and $\sigma = 1$: this distribution looks a lot more plausible, since most of its probability density lies in the interval $[18, 22]$, a much more realistic range for indoor temperatures. The latter choice of parameters makes the data much more likely.
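To make this concrete, here is a minimal sketch that compares the two candidate distributions by their log-likelihood. The temperature values are made up purely for illustration:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical indoor temperature readings in Celsius (illustrative values)
temps = np.array([19.2, 20.1, 20.8, 19.7, 21.3, 20.5])

# Log-likelihood of the data under the two candidate normal distributions
ll_standard = norm(loc=0, scale=1).logpdf(temps).sum()   # N(0, 1)
ll_indoor   = norm(loc=20, scale=1).logpdf(temps).sum()  # N(20, 1)

print(f"log-likelihood under N(0, 1):  {ll_standard:.1f}")   # very negative
print(f"log-likelihood under N(20, 1): {ll_indoor:.1f}")     # much higher
```

The second distribution assigns a far higher (log-)likelihood to the data, matching the intuition above.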
What we want to know is: what are the parameters that maximise the likelihood of the data? There are different ways to estimate parameters using MLE: in some cases, the estimates can be derived analytically in closed form; in other cases, they require iterative optimisation methods such as gradient descent. In any case, the departure point for Maximum Likelihood Estimation can be written as:

$$\hat{\theta} = \underset{\theta}{\operatorname{argmax}} \; p(\mathcal{D} \mid \theta)$$

where $\mathcal{D}$ denotes the observed data and $p(\mathcal{D} \mid \theta)$ is its likelihood under parameters $\theta$.
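As an illustration of the iterative route, here is a small sketch that fits a univariate normal to the made-up temperature readings from above by numerically minimising the negative log-likelihood (the data values and the use of `scipy.optimize.minimize` are assumptions for illustration, not part of the original example):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

temps = np.array([19.2, 20.1, 20.8, 19.7, 21.3, 20.5])  # same illustrative data

# Negative log-likelihood of a normal distribution; minimising it
# is equivalent to maximising the likelihood
def nll(params):
    mu, log_sigma = params  # optimise log(sigma) to keep sigma positive
    return -norm(loc=mu, scale=np.exp(log_sigma)).logpdf(temps).sum()

result = minimize(nll, x0=[0.0, 0.0])
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(f"MLE: mu = {mu_hat:.2f}, sigma = {sigma_hat:.2f}")
```

The optimiser recovers a mean close to the sample average, as we would expect.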
Now let's look at a simple example of MLE. We won't cover the proof here, but it turns out that the MLE parameters for a multivariate normal distribution can be computed in closed form from the data: the empirical mean and the empirical covariance matrix are the maximum likelihood estimates. Let's see this in action using the Iris dataset once again:
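The following is a minimal sketch, assuming the Iris data is loaded via scikit-learn:

```python
import numpy as np
from sklearn.datasets import load_iris

# Load the Iris measurements (150 samples, 4 features)
X = load_iris().data

# MLE for a multivariate normal: the empirical mean and covariance.
# bias=True gives the MLE covariance (dividing by N rather than N - 1).
mu_hat = X.mean(axis=0)
sigma_hat = np.cov(X, rowvar=False, bias=True)

print("mean:", mu_hat)
print("covariance:\n", sigma_hat)
```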
Now let's plot the distribution with the parameters above along with the data to make sure we have a sensible fit:
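Since we can't visualise the full four-dimensional fit, the sketch below restricts the fit to the first two Iris features (sepal length and width) and overlays the fitted density on the data; the feature choice and plotting details are assumptions for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal
from sklearn.datasets import load_iris

# Use the first two Iris features so we can plot in 2D
X2 = load_iris().data[:, :2]
mu_hat = X2.mean(axis=0)
sigma_hat = np.cov(X2, rowvar=False, bias=True)

# Evaluate the fitted density on a grid covering the data
xs = np.linspace(X2[:, 0].min() - 1, X2[:, 0].max() + 1, 200)
ys = np.linspace(X2[:, 1].min() - 1, X2[:, 1].max() + 1, 200)
xx, yy = np.meshgrid(xs, ys)
density = multivariate_normal(mean=mu_hat, cov=sigma_hat).pdf(np.dstack([xx, yy]))

# Density contours with the data points on top
plt.contourf(xx, yy, density, levels=20, cmap="viridis")
plt.scatter(X2[:, 0], X2[:, 1], c="white", edgecolors="black", s=20)
plt.xlabel("sepal length (cm)")
plt.ylabel("sepal width (cm)")
plt.title("Multivariate normal MLE fit to Iris data")
plt.show()
```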
As expected, we find pretty good agreement between the distribution and the data, with most of the data points concentrated in areas of high probability density.