manyspikes

Fundamental rules of probability


In this section, we will look at the fundamental rules of probability. They will allow us to work with more complex scenarios, such as expressing the probability of an event given that another event has happened.

Complement of A

In the context of an experiment with a sample space $S$, this property states that the probability of an event $A \in S$ not occurring is equal to 1 minus the probability of $A$ occurring. If we define the event $\overline{A}$ as the complement of $A$, then we can write:

\begin{equation} p(A) = 1 - p(\overline{A}) \end{equation}

with $A \cup \overline{A} = S$.

Probability of unions of events

If $A$ and $B$ are two events, then we can write the probability of $A$ or $B$ occurring as

\begin{equation} p(A \cup B) = p(A) + p(B) - p(A \cap B). \end{equation}

This is known as the probability of the union between $A$ and $B$. If $A$ and $B$ are disjoint, meaning that $p(A \cap B) = 0$, then $p(A \cup B) = p(A) + p(B)$.

Joint probability and the product rule

The probability of $A$ and $B$ jointly occurring is defined as

\begin{equation} p(A, B) = p(A|B)\,p(B) \end{equation}

where $p(A|B)$ is the conditional probability of event $A$ given that event $B$ occurred. This is known as the product rule. We will define the notion of conditional probability below.

Marginal probability

Given the joint probability $p(A,B)$, the marginal probability $p(A)$ can be obtained by summing over all outcomes of $B$, i.e.

\begin{equation} p(A) = \sum_b p(A, B=b) = \sum_b p(A|B=b)\,p(B=b) \end{equation}
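To make the sum rule concrete, here is a small Python sketch. The joint distribution over two binary events is made up for illustration (the numbers are not from the text), but the marginalisation step is exactly the sum rule above:

```python
# Hypothetical joint distribution p(A, B) over two binary events,
# stored as {(a, b): probability}. The values are illustrative only.
p_joint = {
    (True, True): 0.2,
    (True, False): 0.3,
    (False, True): 0.1,
    (False, False): 0.4,
}

# Sum rule: marginalise out B by summing over all of its outcomes
p_A = sum(p for (a, b), p in p_joint.items() if a)
print(p_A)  # 0.2 + 0.3 = 0.5
```

The same loop with the condition on `b` instead of `a` would give the marginal $p(B)$.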

This is known as the sum rule.

If the joint probability can be expressed as the product of the marginals, i.e. $p(A,B) = p(A)\,p(B)$, then $A$ and $B$ are said to be marginally independent.

Conditional probability

Conditional probabilities allow us to express the probability of an event $A$ happening given that another event $B$ has occurred. In that sense, the conditional $p(A|B)$ is the opposite of the marginal: the marginal disregards the particular value of $B$, whereas the conditional pins $B$ to a particular value. Rearranging (3) we get

\begin{equation} p(A|B) = \frac{p(A,B)}{p(B)} \end{equation}

Bayes' rule

Bayes' rule expresses the relationship between the conditionals $p(A|B)$ and $p(B|A)$ as

\begin{equation} p(A|B) = \frac{p(B|A)\,p(A)}{p(B)} \end{equation}

We will later see that the probability quantities above have very specific meanings in the context of ML, and thus will have their own names.

Example

We will now work through a simple example that illustrates the rules above in action. Considering the roll of a fair die, let's define the following events:

  • $A$: The result is an odd number
  • $B$: The result is larger than 2

First, let's calculate the probability that the result is an odd number or that the result is larger than two, i.e. $p(A \cup B)$. According to (2), we can write:

\begin{equation} p(A \cup B) = p(A) + p(B) - p(A \cap B) \end{equation}

Since the die is fair, it is equally likely to produce an odd or an even number, so $p(A) = 0.5$. Regarding $B$, there are 4 outcomes that satisfy the condition that the result is larger than 2, i.e. $\{3, 4, 5, 6\}$, thus $p(B) = \frac{4}{6}$. Finally, there are two outcomes, $\{3, 5\}$, that satisfy both $A$ and $B$, so $p(A \cap B) = \frac{2}{6}$. Substituting in (7) we get:

\begin{align} p(A \cup B) &= p(A) + p(B) - p(A \cap B) = 0.5 + \frac{4}{6} - \frac{2}{6} = \frac{5}{6} \end{align}

Another way of arriving at the same result is to recognise that $p\left(\overline{A \cup B}\right) = \frac{1}{6}$, because only one outcome, a roll of 2, satisfies neither $A$ nor $B$. By the complement rule, we can thus write

\begin{align} p(A \cup B) &= 1 - p\left(\overline{A \cup B}\right) = 1 - \frac{1}{6} = \frac{5}{6} \end{align}

Let's confirm that our calculations are correct with a simple simulation:
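A minimal Monte Carlo sketch in Python (the variable names such as `n_trials` are my own, not from the original notebook):

```python
import random

random.seed(0)
n_trials = 100_000

# Count the rolls where the result is odd (event A) or larger than 2 (event B)
hits = 0
for _ in range(n_trials):
    roll = random.randint(1, 6)  # fair six-sided die
    if roll % 2 == 1 or roll > 2:
        hits += 1

estimate = hits / n_trials
print(f"Estimated p(A ∪ B): {estimate:.3f}")  # close to 5/6 ≈ 0.833
```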

As expected, the simulation returns a value quite close to the true probability $\frac{5}{6} \approx 0.83$.

Now let's look at an example of Bayes' rule in action. Let's say that we now know that the result was an odd number. What is then the probability that the result is larger than 2? What we are looking for here is the probability of $B$ occurring given that $A$ has occurred, which we can compute according to equation (6). We already know that $p(A) = 0.5$ and $p(B) = \frac{4}{6}$, so the piece we are missing is $p(A|B)$, which represents the probability that the result is an odd number given that it is larger than 2. We have 2 outcomes where this is true out of the 4 outcomes where $B$ is observed, so $p(A|B) = 0.5$. Substituting in (6), we get:

\begin{equation} p(B|A) = \frac{p(A|B)\,p(B)}{p(A)} = \frac{0.5 \cdot \frac{4}{6}}{0.5} = \frac{4}{6} = \frac{2}{3} \end{equation}
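We can check this conditional with the same kind of simulation, this time keeping only the rolls where $A$ (an odd result) occurred and estimating $p(B|A)$ as a ratio of counts. Again, the variable names are my own:

```python
import random

random.seed(1)
n_trials = 100_000

odd_rolls = 0      # rolls where A occurred
odd_and_large = 0  # rolls where both A and B occurred
for _ in range(n_trials):
    roll = random.randint(1, 6)  # fair six-sided die
    if roll % 2 == 1:
        odd_rolls += 1
        if roll > 2:
            odd_and_large += 1

# Conditional probability as a ratio of counts: p(B|A) ≈ #(A and B) / #A
estimate = odd_and_large / odd_rolls
print(f"Estimated p(B|A): {estimate:.3f}")  # close to 2/3 ≈ 0.667
```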

In this example, we could have computed $p(B|A)$ directly by the same reasoning we used for $p(A|B)$, so applying Bayes' theorem here is arguably overkill. However, in some situations you may be able to easily compute one of the conditionals but not the other, and that's when Bayes' theorem really comes in handy.