
Fitting a Gaussian distribution


Over the last few sections, we have looked at how the parameters of probability distributions affect their overall properties. In this section, we introduce a generic approach to estimating the parameters of a given distribution from data.

The basic idea is as follows. Suppose we have some data $X$ that we want to model using a specific probability distribution (e.g. a multivariate normal) with parameters $\boldsymbol\theta$. Our goal is to choose the parameters so that the data is as likely as possible under the resulting distribution. This is called Maximum Likelihood Estimation (MLE).

Let's walk through a simple example. Imagine that your dataset is a collection of indoor temperature measurements (in degrees Celsius) from a supermarket. Assuming that a normal distribution is an appropriate choice, let's look at how suitable different parameters are. Let's start with $\mu=0$ and $\sigma=1$: how likely is it that a normal distribution with these parameters generated the temperature data we observed? Not very likely, since such a distribution has most of its probability density between -1 and 1, which would be a very low range for an indoor temperature. Now consider another normal distribution with $\mu=18$ and $\sigma=1$: this looks far more plausible, since most of its probability density lies in the interval $[17, 19]$, a much more typical range for indoor temperatures. The latter choice of parameters makes the data much more likely.
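To make this concrete, here is a minimal sketch comparing the log-likelihood of the data under the two candidate distributions. The temperature readings below are made up purely for illustration:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical supermarket temperature readings (degrees Celsius)
temperatures = np.array([17.8, 18.4, 19.1, 17.5, 18.9])

# Total log-likelihood of the data under each candidate normal distribution
ll_bad = norm.logpdf(temperatures, loc=0, scale=1).sum()
ll_good = norm.logpdf(temperatures, loc=18, scale=1).sum()

print(f"log-likelihood with mu=0,  sigma=1: {ll_bad:.1f}")   # very negative
print(f"log-likelihood with mu=18, sigma=1: {ll_good:.1f}")  # much higher
```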

What we want to know is: what are the parameters that maximise the likelihood of the data? There are different ways to estimate parameters using MLE: in some cases the estimates can be derived in closed form; in others, finding them requires iterative optimisation methods such as gradient descent. In any case, the starting point for Maximum Likelihood Estimation can be written as:

\begin{equation} \hat{\boldsymbol\theta}_{\text{MLE}} = \underset{\boldsymbol\theta}{\arg\max} \, P(X \mid \boldsymbol\theta) \end{equation}
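Note that when the data points $\mathbf{x}_1, \dots, \mathbf{x}_N$ are assumed to be independent and identically distributed, the likelihood factorises into a product over observations. Since products of many small probabilities are numerically unstable, in practice one usually maximises the log-likelihood instead, which has the same maximiser:

\begin{equation} \hat{\boldsymbol\theta}_{\text{MLE}} = \underset{\boldsymbol\theta}{\arg\max} \, \sum_{i=1}^{N} \log P(\mathbf{x}_i \mid \boldsymbol\theta) \end{equation}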

MLE for a multivariate normal distribution

Now let's look at a simple example of MLE. We won't cover the proof here, but it turns out that the MLE parameters for a multivariate normal distribution can be computed in closed form from the data: the empirical mean and the empirical covariance matrix are the maximum likelihood estimates. Let's see this in action using the Iris dataset once again:
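A minimal sketch of this computation, assuming the Iris data is loaded via scikit-learn and that we fit a 2D Gaussian to the first two features so the result is easy to visualise:

```python
import numpy as np
from sklearn.datasets import load_iris

X = load_iris().data[:, :2]  # sepal length and sepal width

# Closed-form MLE for a multivariate normal: the empirical mean and
# empirical covariance. The MLE covariance uses the 1/N normalisation,
# hence bias=True (np.cov defaults to the unbiased 1/(N-1) estimator).
mu_hat = X.mean(axis=0)
sigma_hat = np.cov(X, rowvar=False, bias=True)

print("MLE mean:\n", mu_hat)
print("MLE covariance:\n", sigma_hat)
```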

Now let's plot the distribution with the parameters above along with the data to make sure we have a sensible fit:
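Continuing from the sketch above, the overlay might be produced like this, using scipy.stats.multivariate_normal to evaluate the fitted density on a grid:

```python
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

# Evaluate the fitted density on a grid covering the data
xs = np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200)
ys = np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200)
grid_x, grid_y = np.meshgrid(xs, ys)
grid = np.dstack((grid_x, grid_y))
density = multivariate_normal(mu_hat, sigma_hat).pdf(grid)

# Density contours of the fitted Gaussian over the data points
plt.contour(grid_x, grid_y, density, levels=10)
plt.scatter(X[:, 0], X[:, 1], s=10, alpha=0.6)
plt.xlabel("sepal length (cm)")
plt.ylabel("sepal width (cm)")
plt.title("MLE Gaussian fit to the Iris data")
plt.show()
```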

As we expected, we find pretty good agreement between the distribution and the data, with most of the data points concentrated in areas of high probability density.