
Discrete random variables


In this section, we will start looking at how to model discrete data using probability theory. The first thing we need to do is to introduce the concept of a random variable.

You can think of a random variable (or stochastic variable) as a function that, given a set of possible outcomes, produces a value which is the result of a random phenomenon (the formal definition is a bit more involved, so we will stick to this informal description). For instance, a temperature measurement is a random variable whose set of possible outcomes is the set of real numbers. The term random means that the values of this variable have some randomness to them, for instance caused by measurement noise.

We will first look at discrete random variables, which are random variables with a space of possible outcomes which is discrete (as opposed to continuous). For instance, the number of parcels delivered to your house per day might be thought of as a discrete random variable since the set of possible outcomes is the set of non-negative integers.

Let's then define the random variable $X$ as the number of parcels delivered to your home per day. The probability that you receive 2 parcels per day is then denoted by $p(X=2)$. In the discrete case, $p$ is called the probability mass function and it returns probability values for all valid values of $X$. The probability mass function must obey the rule of total probability, which means that $\sum_x p(X=x) = 1$.
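For example, here is a minimal sketch in Python of a probability mass function stored as a dictionary (the probabilities are made up purely for illustration), checking that it satisfies this rule:

```python
# A made-up probability mass function for the number of parcels per day.
# The values are illustrative, not taken from real data.
pmf = {0: 0.60, 1: 0.25, 2: 0.10, 3: 0.05}

# The probabilities over all possible values of X must sum to 1
total = sum(pmf.values())
print(total)  # 1.0
```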

There are different families of probability mass functions and, as we will see later, they are suitable to model different types of data. For instance, the probability mass function used to model the number of parcels delivered per day would not be suitable to model the absence/presence of rain per day. Below we will look at some examples of commonly used probability mass functions.

Note: We will often use the term probability distribution. In the discrete case, this term is usually used interchangeably with probability mass function. In the continuous case, this term is often used to mean probability density function, a concept we will introduce later in this module.

Bernoulli distribution

The Bernoulli distribution is a discrete distribution used to model random variables with binary outcomes, such as the coin toss examples we looked at in previous sections. The probability mass function is given by:

\begin{equation} p(x|\theta) = \left\{ \begin{array}{ll} \theta & \textrm{if }\,x=1 \\ 1 - \theta & \textrm{if }\, x=0 \end{array} \right. \end{equation}

where $\theta$ is often referred to as the probability of success, where success can be arbitrarily defined. For instance, we can define success as a dry day in a particular city. Based on historical data, we may know that the probability of a dry day in that city is 0.42, in which case the probability mass function would be:

\begin{equation} p(x) = \left\{ \begin{array}{ll} 0.42 & \textrm{if }\,x=1 \\ 0.58 & \textrm{if }\, x=0 \end{array} \right. \end{equation}

We can visualise the probability mass function with a simple bar graph:
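A minimal matplotlib sketch (the plotting code here is an assumption, not the original cell) would be:

```python
import matplotlib.pyplot as plt

theta = 0.42  # probability of success (a dry day)

# Bernoulli PMF: p(1) = theta, p(0) = 1 - theta
outcomes = [0, 1]
probabilities = [1 - theta, theta]

plt.bar(outcomes, probabilities)
plt.xticks(outcomes)
plt.xlabel("x")
plt.ylabel("p(x)")
plt.title("Bernoulli PMF with theta = 0.42")
plt.show()
```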

Binomial distribution

Now let's consider the scenario where we repeat the Bernoulli experiment $n$ times, and then ask: what is the probability of not observing success at all? And what is the probability of observing success in all $n$ trials?

The Binomial distribution models the number of successes $x$ in a set of $n$ Bernoulli trials and has the form:

\begin{equation} p(x|n, \theta) = \frac{n!}{x!(n-x)!}\theta^x(1-\theta)^{n-x} \end{equation}

Using the binomial distribution, let's extend the example above to calculate the probability mass function for the number of dry days for a set of 30 randomly chosen days. In this case, $n=30$ and we know from the example above that $\theta=0.42$. Thus, the probability mass function becomes:

\begin{equation} p(x) = \frac{30!}{x!(30-x)!}0.42^x\,0.58^{30-x} \end{equation}

Let's plot the probability mass function:
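A sketch using scipy.stats.binom (one possible implementation, assuming the values above):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

n, theta = 30, 0.42  # number of trials, probability of a dry day

# Evaluate the binomial PMF at every possible number of successes 0..n
x = np.arange(n + 1)
pmf = binom.pmf(x, n, theta)

plt.bar(x, pmf)
plt.xlabel("Number of dry days, x")
plt.ylabel("p(x)")
plt.title("Binomial PMF with n = 30, theta = 0.42")
plt.show()
```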

Try varying the parameter $\theta$ to any value between 0 and 1 and see what happens to the shape of the distribution. Likewise, do the same for the number of trials $n$. In particular, note that the Bernoulli distribution above is a special case of the binomial distribution for $n=1$.

The mean (or expected value) and variance for the Binomial distribution are given by $n\theta$ and $n\theta(1-\theta)$, respectively.
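For the example above, that gives a mean of 30 × 0.42 = 12.6 dry days and a variance of 30 × 0.42 × 0.58 ≈ 7.31. As a quick check against scipy (a sketch):

```python
from scipy.stats import binom

n, theta = 30, 0.42

# scipy's moments should match the closed-form expressions
mean, var = binom.stats(n, theta, moments="mv")
print(mean, var)                           # 12.6 7.308
print(n * theta, n * theta * (1 - theta))  # same values from the formulas
```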

Multinomial distribution

The multinomial distribution generalises the binomial distribution to model experiments where trial outcomes are not binary, such as throwing a die. The probability mass function for the multinomial distribution is given by

\begin{equation} p(\mathbf{x}|n, \mathbf{\theta}) = \frac{n!}{\prod_{j=1}^{K}x_j!}\prod_{j=1}^{K} \theta_j^{x_j} \end{equation}

where $K$ is the number of outcomes associated with a single trial. Note that $\mathbf{x}$ is now a vector, where the $j^{th}$ element represents the number of times the outcome $j$ is observed.

To make this more concrete, let's look at the case of a fair 6-sided die rolled 12 times. In this case, $n=12$, $K=6$ and $\theta_1 = \theta_2 = \dots = \theta_6 = \frac{1}{6}$, so equation (5) becomes

\begin{equation} p(\mathbf{x}) = \frac{12!}{\prod_{j=1}^{6}x_j!}\prod_{j=1}^{6} \left(\frac{1}{6}\right)^{x_j} \end{equation}

Let's say we would like to compute the probability that each side was observed exactly 2 times, meaning that $\mathbf{x}=[2,2,2,2,2,2]$. According to (6), we would get:

\begin{equation} p(\mathbf{x}=[2,2,2,2,2,2]) = \frac{12!}{\prod_{j=1}^{6} 2!}\prod_{j=1}^{6} \left(\frac{1}{6}\right)^{2} = \frac{12!}{\left(2!\right)^6}\left(\frac{1}{6}\right)^{12} \approx 0.0034 \end{equation}

Again, we can simulate this to verify that we are correct.
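One possible version of that simulation (a sketch using numpy's multinomial sampler; the original cell is not reproduced here):

```python
import numpy as np
from scipy.stats import multinomial

rng = np.random.default_rng()

# Simulate 1,000,000 experiments of 12 rolls of a fair 6-sided die;
# each row holds the counts of the six faces in one experiment
n_sims = 1_000_000
samples = rng.multinomial(12, [1 / 6] * 6, size=n_sims)

# Fraction of experiments where every face came up exactly twice
estimate = np.all(samples == 2, axis=1).mean()

# Exact value for comparison
exact = multinomial.pmf([2] * 6, n=12, p=[1 / 6] * 6)
print(estimate, exact)  # both should be close to 0.0034
```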

The cell above does not always produce the same output due to randomness, but the result should be quite close to the one we obtained in equation (7).

Poisson distribution

The last discrete distribution we will cover is the Poisson distribution, which is typically used to model the number of events occurring during a given time interval. For instance, the number of parcels delivered to your home per day may be well modelled by a Poisson distribution.

The probability mass function for the Poisson is given by

\begin{equation} p(x|\lambda) = e^{-\lambda}\frac{\lambda^x}{x!} \end{equation}

where $\lambda>0$ is the rate at which events are expected to occur in a given time frame. The Poisson distribution has the interesting property that its mean (which is defined by the parameter $\lambda$) is equal to its variance:

\begin{equation} \lambda = \mathbb{E}(X) = \textrm{Var}(X) \end{equation}

Let's say that, on average, you receive 3.5 parcels per week, which corresponds to 0.5 parcels per day. The Poisson distribution that describes this process would then look like:

\begin{equation} p(x) = e^{-0.5}\frac{0.5^x}{x!} \end{equation}

Let's visualise the distribution:
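A sketch using scipy.stats.poisson (one possible implementation):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

lam = 0.5  # expected number of parcels per day

# The Poisson support is unbounded, so we plot only the first few values
x = np.arange(8)
pmf = poisson.pmf(x, lam)

plt.bar(x, pmf)
plt.xlabel("Number of parcels, x")
plt.ylabel("p(x)")
plt.title("Poisson PMF with lambda = 0.5")
plt.show()
```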

Again, feel free to play around with the parameter $\lambda$ to see how the distribution changes. To calculate the probability of receiving a given number of parcels on a single day, we can take equation (10) and substitute the desired number of parcels for $x$. For instance, the probability of receiving 2 parcels on a given day is given by:

\begin{equation} p(x=2) = e^{-0.5}\frac{0.5^2}{2!} \approx 0.076 \end{equation}
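The same value can be obtained with scipy as a sanity check (a sketch):

```python
from scipy.stats import poisson

# Probability of receiving exactly 2 parcels when lambda = 0.5
print(poisson.pmf(2, 0.5))  # ~0.0758
```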