
Continuous random variables


In the previous section, we looked at examples of random variables that can take discrete values. While these are useful in many cases, they are not sufficient to represent the variability of quantities that are inherently continuous.

For instance, say that we would like to model the height of a particular population. Height is a continuous variable since it can take on an infinite number of values in a given range. As a consequence, we now have an infinite number of possible values for this random variable, so how do we assign a probability to each of them?

With discrete variables, every discrete value has a probability mass associated with it, which describes how likely it is to be observed; this is given by the probability mass function (PMF). When working with continuous random variables, we need to think about probability density instead: density expresses how much probability is packed into a particular range of values. Accordingly, this is expressed by a Probability Density Function (PDF).

A consequence of working with probability densities is that the probability of observing any exact value with infinite precision is zero, since the width of the corresponding range of values is also zero. To derive actual probabilities from a PDF, we need to integrate it over the range of values we are interested in. Thus, for a continuous random variable $X$, the probability of observing a value in the interval $(a, b)$ is given by:

\begin{equation} p(X \in (a, b)) = \int_a^b f(x) \,\text{d}x \end{equation}
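To make equation (1) concrete, here is a minimal sketch, assuming SciPy is available, that recovers a probability by numerical integration; the standard normal density stands in for $f$ purely as an example:

```python
from scipy import integrate, stats

# Example density: the standard normal PDF (any valid PDF would do)
f = stats.norm(loc=0, scale=1).pdf

# p(X in (a, b)) is the integral of the PDF from a to b
a, b = -1.0, 1.0
prob, abs_err = integrate.quad(f, a, b)
print(f"p({a} < X < {b}) ~ {prob:.4f}")  # ~ 0.6827 for the standard normal
```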

The Cumulative Distribution Function (CDF) is a specific case of equation (1), where $a=-\infty$ and $b=x$:

\begin{equation} F(x) = p(X \le x) = \int_{-\infty}^x f(y) \,\text{d}y. \end{equation}

Note that we replaced $x$ with $y$ simply because $x$ is now used to parameterise the upper integration limit. Unlike the PDF, the CDF does return an actual probability, because it integrates the PDF over the range $(-\infty, x)$. In practice, however, the CDF is often computed by numerical approximation, as the PDF can be hard to integrate analytically.
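As a sketch of this relationship, again assuming SciPy, we can approximate the CDF by integrating the PDF numerically from a very negative lower limit (standing in for $-\infty$) and compare against SciPy's built-in CDF:

```python
from scipy import integrate, stats

f = stats.norm(loc=0, scale=1).pdf  # standard normal PDF

def cdf_numeric(x, lower=-10.0):
    # F(x) = integral of f from -infinity to x; -10 is far enough into
    # the left tail of the standard normal for the truncation error to
    # be negligible.
    value, _ = integrate.quad(f, lower, x)
    return value

for x in (-1.0, 0.0, 1.96):
    print(f"F({x}): numeric={cdf_numeric(x):.5f}, scipy={stats.norm.cdf(x):.5f}")
```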

Gaussian distribution

The Gaussian distribution, also known as the normal distribution, is arguably the most pervasive probability distribution of all, for a few different reasons. First, it often provides good approximations of phenomena observed in the natural, physical and social sciences. Second, it is relatively easy to work with (as we shall see later). Finally, even when a variable is not well modelled by a Gaussian distribution (i.e. is not normally distributed), the distribution of its average over different samples converges to a Gaussian distribution as the sample size grows. This is known as the Central Limit Theorem.

The probability density function for the Gaussian distribution is given by

\begin{equation} f(x|\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left( \frac{x-\mu}{\sigma} \right)^2} \end{equation}

where $\mu$ represents the mean of the distribution and $\sigma$ represents the standard deviation. When $\mu=0$ and $\sigma=1$, the distribution is called the standard normal distribution.
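As a quick sanity check, equation (3) can be transcribed directly into code and compared against SciPy's implementation (a sketch, assuming NumPy and SciPy; the parameter values are arbitrary):

```python
import numpy as np
from scipy import stats

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    # Direct transcription of equation (3)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-4.0, 4.0, 9)
print(np.allclose(gaussian_pdf(x, mu=1.0, sigma=2.0),
                  stats.norm.pdf(x, loc=1.0, scale=2.0)))  # True
```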

Let's visualise the distribution:
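One way to do this, as a minimal sketch assuming Matplotlib and SciPy are available (the values of mu and sigma below are just example choices):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

mu, sigma = 0.0, 1.0  # try changing these and re-running

# Evaluate the PDF on a grid spanning four standard deviations
# either side of the mean, then plot it.
x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 400)
plt.plot(x, stats.norm.pdf(x, loc=mu, scale=sigma))
plt.xlabel("x")
plt.ylabel("f(x)")
plt.title(f"Gaussian PDF (mu={mu}, sigma={sigma})")
plt.show()
```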

Due to its shape, the Gaussian distribution is often called a bell-shaped distribution. The parameter $\mu$ controls where the bell is located along the $x$ axis: in the example above, the bell is centred around $x=0$. The parameter $\sigma$ controls the width of the bell: the larger the value of $\sigma$, the wider the bell. Feel free to play with the parameters in the cell above to see how they affect the shape of the distribution.

Central limit theorem

Suppose that we take a sample of size $n$ from an arbitrarily distributed random variable $X$ and we compute its mean $\bar{X}_n$. Since $X$ is a random variable, $\bar{X}_n$ will itself be a random variable. So how is $\bar{X}_n$ distributed?

In its simplest form, the Central Limit Theorem states that, if $n$ is large enough, $\bar{X}_n$ is approximately normally distributed.

Let's check that with a quick simulation. Suppose that $X$ is Bernoulli distributed (see previous section) with $\theta=0.6$. First, we will generate a sample of size $n$ from $X$ and compute its mean. Then, we will repeat this many times and plot a histogram of the resulting means to verify that they look normally distributed.
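A sketch of such a simulation, assuming NumPy and Matplotlib (the sample size and repeat count are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng()

theta = 0.6         # Bernoulli parameter
n = 100             # sample size
n_repeats = 10_000  # number of sample means to collect

# Each row is one Bernoulli sample of size n; averaging along rows
# gives n_repeats realisations of the sample mean.
means = rng.binomial(1, theta, size=(n_repeats, n)).mean(axis=1)

plt.hist(means, bins=40, density=True)
plt.xlabel("sample mean")
plt.ylabel("density")
plt.show()
```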

If you run the cell above a few times (the data changes slightly due to randomness), you should see that the data looks approximately Gaussian, as predicted by the Central Limit Theorem.

In the example above, we looked at the mean across samples that were identically distributed (Bernoulli distributed), but under some conditions the same holds for samples from different distributions, as long as the samples are independent (we won't cover this here, but you can look up Lindeberg's and Lyapunov's conditions if you are interested in knowing more).