In this section we will focus on the multivariate normal (or multivariate Gaussian) distribution. This distribution is used in a variety of ML techniques, such as Linear Discriminant Analysis, Mixture Models and Gaussian Processes.
The multivariate normal distribution has the following PDF:

$$
p(\mathbf{x}) = \frac{1}{(2\pi)^{D/2}\,\lvert\Sigma\rvert^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{\top} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right) \tag{1}
$$

where $\mathbf{x}$ is a $D$-dimensional vector representing the random variables, $\boldsymbol{\mu}$ is a $D$-dimensional mean vector (containing the mean values for each random variable) and $\Sigma$ is the $D \times D$ covariance matrix for the variables at hand.
This might look a bit complicated, so let's look at equation (1) and assume $D = 1$. If you recall that the diagonal of the covariance matrix is composed of the variances of each variable, then you will conclude that when $D = 1$ the covariance matrix becomes the variance $\sigma^2$ of the only variable at hand. Equally, the mean vector becomes a scalar $\mu$ with the value equal to the mean of the variable. Thus, the equation above becomes the PDF for the univariate normal distribution.
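Concretely, substituting $D = 1$, $\Sigma = \sigma^2$ and $\boldsymbol{\mu} = \mu$ into equation (1) gives the familiar form:

$$
p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)
$$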
Let's visualise the distribution for $D = 2$:
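The original plotting cell isn't shown here, but a minimal sketch along these lines would produce such a figure, assuming NumPy, SciPy and Matplotlib are available (the mean and covariance values are illustrative placeholders):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

# Illustrative parameters: zero mean, unit variances, no correlation.
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.0],
                [0.0, 1.0]])

# Evaluate the PDF on a grid of (x1, x2) values.
x1, x2 = np.meshgrid(np.linspace(-3, 3, 100), np.linspace(-3, 3, 100))
grid = np.dstack((x1, x2))
density = multivariate_normal(mean, cov).pdf(grid)

# Plot the density as a 3D surface.
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(x1, x2, density, cmap="viridis")
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
ax.set_zlabel("density")
plt.show()
```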
In the plot above, the $x$ and $y$ axes represent different values of the variables $x_1$ and $x_2$, while the $z$ axis represents the corresponding probability density. It is also common to visualise this type of data using contour plots, in which the probability density is encoded by a color map. Here is the same distribution visualised using a contour plot.
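A sketch of the contour version, reusing `x1`, `x2` and `density` from the previous cell:

```python
# Same distribution, shown as filled contours with a colour bar.
fig, ax = plt.subplots()
contour = ax.contourf(x1, x2, density, levels=20, cmap="viridis")
fig.colorbar(contour, label="density")
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
plt.show()
```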
You can play with the values for the mean and for the covariance matrix (remember to keep the matrix symmetric) and run the two previous cells to check what happens to the shape of the distribution. In particular, you should play with the values of the off-diagonal elements of the covariance matrix. For instance, compare the shape of the distribution if you set the off-diagonal elements to 0.5 or -0.5. You should see that the shape of the distribution changes from indicating a positive relationship between $x_1$ and $x_2$ to indicating a negative one.
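As a shortcut, here is a small sketch (again reusing the grid from above) that draws the two cases side by side:

```python
# Compare off-diagonal entries of +0.5 and -0.5 in the covariance matrix.
fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, off_diag in zip(axes, [0.5, -0.5]):
    cov_k = np.array([[1.0, off_diag],
                      [off_diag, 1.0]])
    dens_k = multivariate_normal(mean, cov_k).pdf(grid)
    ax.contourf(x1, x2, dens_k, levels=20, cmap="viridis")
    ax.set_title(f"off-diagonal = {off_diag}")
    ax.set_xlabel("$x_1$")
axes[0].set_ylabel("$x_2$")
plt.show()
```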
Another thing we would like to note is that if you take a slice along any vertical or horizontal line in the plot above, you get a curve with the shape of a univariate normal distribution (it differs from a proper normal PDF only by a normalising constant). Let's verify that by plotting slices of the distribution at a few different locations:
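One possible sketch of such a slicing cell, fixing $x_1$ at a few values and plotting the joint density as a function of $x_2$ (variable names carried over from the sketches above):

```python
# Slices of the joint density along vertical lines x1 = c.
fig, ax = plt.subplots()
x2_line = np.linspace(-3, 3, 100)
for c in [-1.0, 0.0, 1.0]:
    points = np.column_stack((np.full_like(x2_line, c), x2_line))
    ax.plot(x2_line, multivariate_normal(mean, cov).pdf(points),
            label=f"$x_1 = {c}$")
ax.set_xlabel("$x_2$")
ax.set_ylabel("density")
ax.legend()
plt.show()
```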
The process of looking at how some variables are distributed while the remaining variables are held at fixed values is called conditioning. It is something we will come back to in future modules: for now, keep in mind that if you condition a multivariate normal distribution, the resulting distribution is also a normal distribution.
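For reference (we won't derive it in this module), in the two-dimensional case the standard result reads

$$
x_1 \mid x_2 = a \;\sim\; \mathcal{N}\!\left(\mu_1 + \frac{\Sigma_{12}}{\Sigma_{22}}\,(a - \mu_2),\;\; \Sigma_{11} - \frac{\Sigma_{12}^2}{\Sigma_{22}}\right)
$$

where $\Sigma_{ij}$ denotes the entries of the covariance matrix.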
Now let's assume that we would like to know how one variable is distributed regardless of the values taken by the other variables. This is called marginalisation and involves integrating out the other variables. For instance, in the 2-dimensional case above, the marginal density $p(x_1)$ can be defined as

$$
p(x_1) = \int_{-\infty}^{\infty} p(x_1, x_2)\, \mathrm{d}x_2
$$
We can get a sense of what the marginal looks like by discretising the integral (i.e. replacing it by a sum over one of the variables).
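A sketch of that discretisation, again reusing the grid and `density` from the earlier cells (a Riemann-sum approximation, summing over the $x_2$ axis):

```python
# Approximate the marginal p(x1) by replacing the integral over x2
# with a sum over the grid values, weighted by the grid spacing.
dx2 = x2[1, 0] - x2[0, 0]                # grid spacing along x2
marginal_x1 = density.sum(axis=0) * dx2  # sum over the x2 axis

fig, ax = plt.subplots()
ax.plot(x1[0, :], marginal_x1)
ax.set_xlabel("$x_1$")
ax.set_ylabel("$p(x_1)$ (approx.)")
plt.show()
```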
As you are probably guessing, it turns out that the marginals of a multivariate normal are themselves normally distributed.
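In the notation used above, the marginal of $x_1$ is simply the univariate normal built from the corresponding entries of the mean vector and covariance matrix (a standard result, stated here without proof):

$$
p(x_1) = \mathcal{N}\!\left(x_1 \mid \mu_1,\, \Sigma_{11}\right)
$$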
Even though we introduced the concepts of conditioning and marginalisation using the multivariate normal, keep in mind that they apply to multivariate probability distributions in general. Note that it often isn't possible to compute these analytically, so we have to resort to numerical approaches instead.