In this section, we will look at the fundamental rules of probability. They will allow us to work with more complex scenarios, such as expressing the probability of an event given that another event has happened.
In the context of an experiment with a sample space $\Omega$, the complement rule states that the probability of an event $A$ not occurring is equal to 1 minus the probability of $A$ occurring. If we define the event $\bar{A}$ as the complement of $A$, then we can write:

$$P(\bar{A}) = 1 - P(A) \tag{1}$$

with $\bar{A} = \Omega \setminus A$.
If $A$ and $B$ are two events, then we can write the probability of $A$ or $B$ occurring as

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \tag{2}$$

This is known as the probability of the union of $A$ and $B$. If $A$ and $B$ are disjoint, meaning that $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$.
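To make the union rule concrete, here is a small sketch in Python (the sample space and events below are illustrative choices, not taken from the text), representing events as sets of equally likely outcomes:

```python
from fractions import Fraction

# Illustrative sample space: the integers 1..10, all equally likely
omega = set(range(1, 11))
A = {x for x in omega if x % 2 == 0}  # event A: the outcome is even
B = {x for x in omega if x > 6}       # event B: the outcome is larger than 6

def prob(event):
    # Probability of an event under the uniform distribution on omega
    return Fraction(len(event), len(omega))

# The union rule: P(A or B) = P(A) + P(B) - P(A and B)
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs, rhs)  # both are 7/10
```

Subtracting $P(A \cap B)$ corrects for the outcomes counted twice, here $\{8, 10\}$, which belong to both events.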
The probability of $A$ and $B$ jointly occurring is defined as

$$P(A \cap B) = P(A \mid B)\,P(B) \tag{3}$$

where $P(A \mid B)$ is the conditional probability of event $A$ given that event $B$ occurred. This is known as the product rule. We will define the notion of conditional probability below.
Given the joint probability $P(A, B)$, the marginal probability $P(A)$ can be obtained by summing over all outcomes of $B$, i.e.

$$P(A) = \sum_{b} P(A, B = b) \tag{4}$$

This is known as the sum rule.
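As a quick sketch of the sum rule in action (the joint distribution below is made up purely for illustration), we can tabulate a joint $P(A, B)$ and recover a marginal by summing out $B$:

```python
# Hypothetical joint distribution P(A, B) over weather (A) and temperature (B)
joint = {
    ("rain", "cold"): 0.3,
    ("rain", "warm"): 0.1,
    ("sun",  "cold"): 0.2,
    ("sun",  "warm"): 0.4,
}

# Sum rule: P(A = "rain") is the sum over all b of P(A = "rain", B = b)
p_rain = sum(p for (a, b), p in joint.items() if a == "rain")
print(p_rain)  # 0.4
```

The value of $B$ is "summed out", which is exactly what it means to disregard it.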
If the joint probability can be expressed as the product of the marginals, i.e. $P(A, B) = P(A)\,P(B)$, then $A$ and $B$ are said to be marginally independent.
Conditional probabilities allow us to express the probability of an event happening given that another event has occurred. In that sense, the conditional is the opposite of the marginal: the marginal disregards the particular value of $B$, whereas the conditional pins $B$ to a particular value. Rearranging (3) we get

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \tag{5}$$
Bayes' rule expresses the relationship between the conditionals $P(A \mid B)$ and $P(B \mid A)$ as

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)} \tag{6}$$
We will later see that the probability quantities above have very specific meanings in the context of ML, and thus will have their own names.
We will now work through a simple example illustrating the rules above in action. Considering the roll of a fair six-sided die, let's define the following events:

- $A$: the result is an odd number
- $B$: the result is larger than 2
First, let's calculate the probability that the result is an odd number or that the result is larger than two, i.e. $P(A \cup B)$. According to (2), we can write:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \tag{7}$$

Since the die is fair, it is equally likely to produce an odd or even number, so $P(A) = 1/2$. Regarding $B$, there are 4 outcomes that satisfy the condition that the result is larger than 2, i.e. $\{3, 4, 5, 6\}$, thus $P(B) = 4/6 = 2/3$. Finally, there are two outcomes, 3 and 5, that satisfy both $A$ and $B$, so $P(A \cap B) = 2/6 = 1/3$. Substituting in (7) we get:

$$P(A \cup B) = \frac{1}{2} + \frac{2}{3} - \frac{1}{3} = \frac{5}{6}$$
Another way of arriving at the same result is to recognize that $P(\overline{A \cup B}) = 1/6$, because there is only one outcome, the roll of a 2, that satisfies neither $A$ nor $B$. By the complement rule, we can thus write

$$P(A \cup B) = 1 - P(\overline{A \cup B}) = 1 - \frac{1}{6} = \frac{5}{6}$$
Let's confirm that our calculations are correct with a simple simulation:
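A minimal version of such a simulation, sketched in Python (assuming a uniform six-sided die and simply counting how often $A \cup B$ occurs):

```python
import random

random.seed(0)  # fix the seed for reproducibility
n = 100_000

# Count the rolls where the result is odd (A) or larger than 2 (B)
hits = 0
for _ in range(n):
    roll = random.randint(1, 6)
    if roll % 2 == 1 or roll > 2:
        hits += 1

print(hits / n)  # close to 5/6 ≈ 0.833
```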
As expected, the simulation returns a value quite close to the true probability $5/6 \approx 0.83$.
Now let's look at an example of Bayes' rule in action. Let's say that we now know that the result was an odd number. What is then the probability that the result is larger than 2? What we are looking for here is the probability of $B$ occurring given that $A$ has occurred, i.e. $P(B \mid A)$, which we can compute according to equation (6). We already know that $P(A) = 1/2$ and $P(B) = 2/3$, so the piece we are missing is $P(A \mid B)$, which represents the probability that the result is an odd number given that it is larger than 2. We have 2 outcomes where this is true out of the 4 outcomes where $B$ is observed, so $P(A \mid B) = 2/4 = 1/2$. Substituting in (6), we get:

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)} = \frac{\frac{1}{2} \cdot \frac{2}{3}}{\frac{1}{2}} = \frac{2}{3}$$
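We can also sanity-check this conditional probability by simulation: conditioning on $A$ amounts to keeping only the rolls that came up odd, and measuring how often $B$ holds among them (a sketch, again assuming a uniform die):

```python
import random

random.seed(0)
n = 100_000

odd = 0          # rolls where A (odd result) occurred
odd_and_gt2 = 0  # rolls where both A and B (result larger than 2) occurred
for _ in range(n):
    roll = random.randint(1, 6)
    if roll % 2 == 1:
        odd += 1
        if roll > 2:
            odd_and_gt2 += 1

# Restricting to the odd rolls is exactly what "given A" means
print(odd_and_gt2 / odd)  # close to 2/3
```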
In this example, we could just as easily have computed $P(B \mid A)$ directly, by the same reasoning we used for $P(A \mid B)$, so applying Bayes' theorem here is arguably overkill. However, in some situations you may be able to easily compute one of the conditionals but not the other, and that's when Bayes' theorem can really come in handy.