In this section, we will look at the fundamental rules of probability. They will allow us to work with more complex scenarios, such as expressing the probability of an event given that another event has happened.
In the context of an experiment with a sample space $\Omega$, the complement rule states that the probability of an event $A$ not occurring is equal to 1 minus the probability of $A$ occurring. If we define the event $\bar{A}$ as the complement of $A$, then we can write:

$$P(\bar{A}) = 1 - P(A) \tag{1}$$

with $\bar{A} = \Omega \setminus A$.
If $A$ and $B$ are two events, then we can write the probability of $A$ or $B$ occurring as

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \tag{2}$$

This is known as the probability of the union of $A$ and $B$. If $A$ and $B$ are disjoint, meaning that $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$.
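To make the union rule concrete, here is a small sketch in Python (the sample space and events below are illustrative choices, not taken from the text), representing events as sets of equally likely outcomes:

```python
from fractions import Fraction

# Illustrative sample space: the integers 1..10, all equally likely
omega = set(range(1, 11))
A = {x for x in omega if x % 2 == 0}  # event A: the outcome is even
B = {x for x in omega if x > 6}       # event B: the outcome is larger than 6

def prob(event):
    # Probability of an event under the uniform distribution on omega
    return Fraction(len(event), len(omega))

# The union rule: P(A or B) = P(A) + P(B) - P(A and B)
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs, rhs)  # both are 7/10
```

Subtracting $P(A \cap B)$ corrects for the outcomes counted twice, here $\{8, 10\}$, which belong to both events.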
The probability of $A$ and $B$ jointly occurring is defined as

$$P(A \cap B) = P(A \mid B)\,P(B) \tag{3}$$

where $P(A \mid B)$ is the conditional probability of event $A$ given that event $B$ occurred. This is known as the product rule. We will define the notion of conditional probability below.
Given the joint probability $P(A, B)$, the marginal probability $P(A)$ can be obtained by summing over all outcomes of $B$, i.e.

$$P(A) = \sum_{b} P(A, B = b) \tag{4}$$

This is known as the sum rule.
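As a quick sketch of the sum rule in action (the joint distribution below is made up purely for illustration), we can tabulate a joint $P(A, B)$ and recover a marginal by summing out $B$:

```python
# Hypothetical joint distribution P(A, B) over weather (A) and temperature (B)
joint = {
    ("rain", "cold"): 0.3,
    ("rain", "warm"): 0.1,
    ("sun",  "cold"): 0.2,
    ("sun",  "warm"): 0.4,
}

# Sum rule: P(A = "rain") is the sum over all b of P(A = "rain", B = b)
p_rain = sum(p for (a, b), p in joint.items() if a == "rain")
print(p_rain)  # 0.4
```

The value of $B$ is "summed out", which is exactly what it means to disregard it.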
If the joint probability can be expressed as the product of the marginals, i.e. $P(A, B) = P(A)\,P(B)$, then $A$ and $B$ are said to be marginally independent.
Conditional probabilities allow us to express the probability of an event happening given that another event has occurred. In that sense, the conditional is the opposite of the marginal: the marginal disregards the particular value of $B$, whereas the conditional pins $B$ to a particular value. Rearranging (3) we get

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \tag{5}$$
Bayes' rule expresses the relationship between the conditionals $P(A \mid B)$ and $P(B \mid A)$ as

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)} \tag{6}$$
We will later see that the probability quantities above have very specific meanings in the context of ML, and thus will have their own names.
We will now work through a simple example illustrating the rules above in action. Considering the roll of a fair six-sided die, let's define the following events:

- $A$: the result is an odd number
- $B$: the result is larger than 2
First, let's calculate the probability that the result is an odd number or that the result is larger than two, i.e. $P(A \cup B)$. According to (2), we can write:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \tag{7}$$

Since the die is fair, it is equally likely to produce an odd or even number, so $P(A) = 1/2$. Regarding $B$, there are 4 outcomes that satisfy the condition that the result is larger than 2, i.e. $\{3, 4, 5, 6\}$, thus $P(B) = 4/6 = 2/3$. Finally, there are two outcomes, 3 and 5, that satisfy both $A$ and $B$, so $P(A \cap B) = 2/6 = 1/3$. Substituting in (7) we get:

$$P(A \cup B) = \frac{1}{2} + \frac{2}{3} - \frac{1}{3} = \frac{5}{6}$$
Another way of arriving at the same result is to recognize that $P(\overline{A \cup B}) = 1/6$, because there is only one outcome, the roll of a 2, that satisfies neither $A$ nor $B$. By the complement rule, we can thus write

$$P(A \cup B) = 1 - P(\overline{A \cup B}) = 1 - \frac{1}{6} = \frac{5}{6}$$
Let's confirm that our calculations are correct with a simple simulation:
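A minimal version of such a simulation, sketched in Python (assuming a uniform six-sided die and simply counting how often $A \cup B$ occurs):

```python
import random

random.seed(0)  # fix the seed for reproducibility
n = 100_000

# Count the rolls where the result is odd (A) or larger than 2 (B)
hits = 0
for _ in range(n):
    roll = random.randint(1, 6)
    if roll % 2 == 1 or roll > 2:
        hits += 1

print(hits / n)  # close to 5/6 ≈ 0.833
```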
As expected, the simulation returns a value quite close to the true probability $5/6 \approx 0.83$.
Now let's look at an example of Bayes' rule in action. Let's say that we now know that the result was an odd number. What is then the probability that the result is larger than 2? What we are looking for here is the probability of $B$ occurring given that $A$ has occurred, i.e. $P(B \mid A)$, which we can compute according to equation (6). We already know that $P(A) = 1/2$ and $P(B) = 2/3$, so the piece we are missing is $P(A \mid B)$, which represents the probability that the result is an odd number given that it is larger than 2. We have 2 outcomes where this is true out of the 4 outcomes where $B$ is observed, so $P(A \mid B) = 2/4 = 1/2$. Substituting in (6), we get:

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)} = \frac{\frac{1}{2} \cdot \frac{2}{3}}{\frac{1}{2}} = \frac{2}{3}$$
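We can also sanity-check this conditional probability by simulation: conditioning on $A$ amounts to keeping only the rolls that came up odd, and measuring how often $B$ holds among them (a sketch, again assuming a uniform die):

```python
import random

random.seed(0)
n = 100_000

odd = 0          # rolls where A (odd result) occurred
odd_and_gt2 = 0  # rolls where both A and B (result larger than 2) occurred
for _ in range(n):
    roll = random.randint(1, 6)
    if roll % 2 == 1:
        odd += 1
        if roll > 2:
            odd_and_gt2 += 1

# Restricting to the odd rolls is exactly what "given A" means
print(odd_and_gt2 / odd)  # close to 2/3
```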
In this example, we could just as easily have computed $P(B \mid A)$ directly, by the same reasoning we used for $P(A \mid B)$, so applying Bayes' theorem here is arguably overkill. However, in some situations you may be able to easily compute one of the conditionals but not the other, and that's when Bayes' theorem can really come in handy.