In this section, we will start looking at how to model discrete data using probability theory. The first thing we need to do is to introduce the concept of a random variable.
You can think of a random variable (or stochastic variable) as a function that maps the possible outcomes of a random phenomenon to values (the formal definition is a bit more involved, so we will stick to this informal description). For instance, a temperature measurement is a random variable whose set of possible outcomes is the set of real numbers. The term random means that the values of this variable are subject to some randomness, for instance caused by measurement noise.
We will first look at discrete random variables, which are random variables whose space of possible outcomes is discrete (as opposed to continuous). For instance, the number of parcels delivered to your house per day can be thought of as a discrete random variable, since the set of possible outcomes is the set of non-negative integers.
Let's then define the random variable $X$ as the number of parcels delivered to your home per day. The probability that you receive 2 parcels per day is then denoted by $P(X = 2)$. In the discrete case, $P(X = x)$ is called the probability mass function and it returns probability values for all valid values of $X$. The probability mass function must obey the rule of total probability, which means that $\sum_{x} P(X = x) = 1$.
There are different families of probability mass functions and, as we will see later, they are suitable to model different types of data. For instance, the probability mass function used to model the number of parcels delivered per day would not be suitable to model the absence/presence of rain per day. Below we will look at some examples of commonly used probability mass functions.
Note: We will often use the term probability distribution. In the discrete case, this term is usually used interchangeably with probability mass function. In the continuous case, this term is often used to mean probability density function, a concept we will introduce later in this module.
The Bernoulli distribution is a discrete distribution used to model random variables with binary outcomes, such as the coin toss examples we looked at in previous sections. The probability mass function is given by:

$$P(X = x) = p^x (1 - p)^{1 - x}, \quad x \in \{0, 1\} \tag{1}$$
where $p$ is often referred to as the probability of success, where success can be arbitrarily defined. For instance, we can define success as a dry day in a particular city. Based on historical data, we may know that the probability of a dry day in that city is 0.42, in which case the probability mass function would be:

$$P(X = x) = 0.42^x \cdot 0.58^{1 - x} \tag{2}$$
We can visualise the probability mass function with a simple bar graph:
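A minimal sketch of such a bar graph (assuming matplotlib as the plotting library) might look like this:

```python
import matplotlib.pyplot as plt

p = 0.42  # probability of success (a dry day)

outcomes = [0, 1]           # 0 = wet day, 1 = dry day
probabilities = [1 - p, p]  # P(X=0) and P(X=1)

plt.bar(outcomes, probabilities)
plt.xticks(outcomes, ["wet (x=0)", "dry (x=1)"])
plt.xlabel("outcome")
plt.ylabel("P(X=x)")
plt.title("Bernoulli PMF with p = 0.42")
plt.show()
```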
Now let's consider the scenario where we repeat the Bernoulli experiment $n$ times. What is the probability of not observing a single success? And what is the probability of observing success in all $n$ trials? (These are $(1 - p)^n$ and $p^n$, respectively, and both turn out to be special cases of the following distribution.)
The Binomial distribution models the number of successes $k$ in a set of $n$ independent Bernoulli trials and has the form:

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} \tag{3}$$
Using the binomial distribution, let's extend the example above to calculate the probability mass function for the number of dry days for a set of 30 randomly chosen days. In this case, $n = 30$ and we know from the example above that $p = 0.42$. Thus, the probability mass function becomes:

$$P(X = k) = \binom{30}{k} \, 0.42^k \, 0.58^{30 - k} \tag{4}$$
Let's plot the probability mass function:
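A minimal sketch of this plot, assuming scipy.stats.binom for the PMF and matplotlib for plotting:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

n, p = 30, 0.42           # number of trials and probability of a dry day

k = np.arange(n + 1)      # all possible numbers of dry days: 0, 1, ..., 30
pmf = binom.pmf(k, n, p)  # P(X=k) for each k

plt.bar(k, pmf)
plt.xlabel("number of dry days, k")
plt.ylabel("P(X=k)")
plt.title("Binomial PMF with n = 30, p = 0.42")
plt.show()
```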
Try varying the parameter $p$ to any value between 0 and 1 and see what happens to the shape of the distribution. Likewise, do the same for the number of trials $n$. In particular, note that the Bernoulli distribution above is a special case of the binomial distribution for $n = 1$.
The mean (or expected value) and variance for the Binomial distribution are given by $\mathbb{E}[X] = np$ and $\mathrm{Var}[X] = np(1 - p)$, respectively.
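As a quick sanity check (a sketch assuming numpy; the sample size here is arbitrary), we can draw many binomial samples and compare the empirical mean and variance against these formulas:

```python
import numpy as np

rng = np.random.default_rng()
n, p = 30, 0.42

samples = rng.binomial(n, p, size=100_000)

print("theoretical mean:    ", n * p)            # 12.6
print("empirical mean:      ", samples.mean())
print("theoretical variance:", n * p * (1 - p))  # 7.308
print("empirical variance:  ", samples.var())
```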
The multinomial distribution generalizes the binomial distribution to model experiments where trial outcomes are not binary, such as throwing a die. The probability mass function for the multinomial distribution is given by

$$P(X_1 = x_1, \ldots, X_k = x_k) = \frac{n!}{x_1! \cdots x_k!} \, p_1^{x_1} \cdots p_k^{x_k} \tag{5}$$
where $k$ is the number of possible outcomes associated with a single trial and $p_i$ is the probability of outcome $i$. Note that $\mathbf{x} = (x_1, \ldots, x_k)$ is now a vector, where the element $x_i$ represents the number of times the outcome $i$ is observed. These counts must sum to the total number of trials, $\sum_{i=1}^{k} x_i = n$.
To make this more concrete, let's look at the case of a fair 6-sided die rolled 12 times. In this case, $n = 12$, $k = 6$ and $p_1 = \cdots = p_6 = \frac{1}{6}$, so equation (5) becomes

$$P(X_1 = x_1, \ldots, X_6 = x_6) = \frac{12!}{x_1! \cdots x_6!} \left(\frac{1}{6}\right)^{12} \tag{6}$$
Let's say we would like to compute the probability that each side was observed exactly 2 times, meaning that $\mathbf{x} = (2, 2, 2, 2, 2, 2)$. According to (6), we would get:

$$P(X_1 = 2, \ldots, X_6 = 2) = \frac{12!}{(2!)^6} \left(\frac{1}{6}\right)^{12} \approx 0.0034 \tag{7}$$
Again, we can simulate this to verify that we are correct.
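A minimal simulation sketch, assuming numpy for sampling and scipy.stats.multinomial as an exact cross-check:

```python
import numpy as np
from scipy.stats import multinomial

rng = np.random.default_rng()
n_trials, n_sims = 12, 500_000
probs = np.full(6, 1 / 6)  # fair die

# Each row is one experiment: counts of the six faces over 12 rolls.
samples = rng.multinomial(n_trials, probs, size=n_sims)
hits = np.all(samples == 2, axis=1)  # rows where every face appeared exactly twice

print("simulated probability:", hits.mean())
print("exact probability:    ", multinomial.pmf([2] * 6, n=n_trials, p=probs))
```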
The cell above does not always produce the same output due to randomness, but the result should be quite close to the one we obtained in equation (7).
The last discrete distribution we will cover is the Poisson distribution, which is typically used to model the number of events occurring during a given time interval. For instance, the number of parcels delivered to your home per day may be well modelled by a Poisson distribution.
The probability mass function for the Poisson is given by

$$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots \tag{8}$$
where $\lambda$ is the rate at which events are expected to occur in a given time frame. The Poisson distribution has the interesting property that its mean (which is defined by the parameter $\lambda$) is equal to its variance:

$$\mathbb{E}[X] = \mathrm{Var}[X] = \lambda \tag{9}$$
Let's say that, on average, you receive 3.5 parcels per week, which corresponds to 0.5 parcels per day. The Poisson distribution that describes this process would then look like:

$$P(X = x) = \frac{0.5^x e^{-0.5}}{x!} \tag{10}$$
Let's visualise the distribution:
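A minimal sketch, analogous to the binomial plot above, assuming scipy.stats.poisson:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

lam = 0.5                 # expected parcels per day

x = np.arange(0, 8)       # first few counts; P(X=x) is negligible beyond this
pmf = poisson.pmf(x, lam)

plt.bar(x, pmf)
plt.xlabel("number of parcels, x")
plt.ylabel("P(X=x)")
plt.title(r"Poisson PMF with $\lambda = 0.5$")
plt.show()
```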
Again, feel free to play around with the parameter $\lambda$ to see how the distribution changes. To calculate the probability of receiving a given number of parcels on a single day, we can take equation (10) and substitute $x$ with the desired number of parcels. For instance, the probability of receiving 2 parcels on a given day is given by:

$$P(X = 2) = \frac{0.5^2 e^{-0.5}}{2!} \approx 0.076$$
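The same value can be cross-checked in code (a one-line sanity check, again assuming scipy):

```python
from scipy.stats import poisson

print(poisson.pmf(2, 0.5))  # ≈ 0.0758
```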