Welcome to the Introduction to Probability course!
Let's start with the question: why is probability important for understanding and building AI? The reason is that, ultimately, AI systems make or inform decisions. When we make a decision, we must be aware of the consequences of getting it wrong and how likely it is that we will get it wrong. The latter, which we can refer to as uncertainty, can only be examined through some form of probabilistic reasoning. Thus, a solid understanding of probability is critical to deploying AI systems successfully and safely.
In this section, we will start by introducing basic terminology and definitions. We will refer to sets quite often, so let's first define what a set is and touch on basic set operations.
A set is essentially a collection of unique objects. If the set $A$ contains an object $x$, then we say that $x$ is a member of $A$, which we denote $x \in A$. If a set has no members in it, then it is called the empty set and is usually denoted by $\emptyset$.
A set $A$ can contain subsets: smaller sets that contain one or more of the members of $A$. For instance, the set $\{1, 2\}$ is a subset of the set $\{1, 2, 3\}$. If $B$ is a subset of $A$, we write $B \subseteq A$.
The following are the most common set operations (a quick code illustration follows the list):

- **Union**: $A \cup B$ is the set of all objects that are members of $A$, of $B$, or of both.
- **Intersection**: $A \cap B$ is the set of all objects that are members of both $A$ and $B$.
- **Difference**: $A \setminus B$ is the set of all members of $A$ that are not members of $B$.
- **Complement**: $A^c$ is the set of all objects under consideration that are not members of $A$.
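These operations map directly onto Python's built-in `set` type. A minimal sketch (the element values here are arbitrary):

```python
A = {1, 2, 3}
B = {3, 4}

print(A | B)        # union: {1, 2, 3, 4}
print(A & B)        # intersection: {3}
print(A - B)        # difference: {1, 2}
print({1, 2} <= A)  # subset check: True
print(3 in A)       # membership check: True
```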
Now that we have briefly introduced sets and their notation, let's dive into the definitions that will allow us to formulate the idea of probabilities.
You can think of an experiment as a process that generates some observations, usually in a non-deterministic way. For instance, a coin toss is a type of experiment which can generate two observations: heads or tails. If you toss a coin a few times in a row, most likely you won't observe the same result every time, making this process non-deterministic.
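To see this non-determinism in action, here is a minimal sketch in Python (the number of tosses is arbitrary):

```python
import random

# Toss a coin ten times; repeated runs of this script will
# generally produce different sequences of observations.
tosses = [random.choice(["heads", "tails"]) for _ in range(10)]
print(tosses)
```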
The sample space (sometimes also referred to as state space) is the set of all possible outcomes of an experiment. For our coin toss example, the sample space is heads or tails, as these are the only possible outcomes. Mathematically, we can represent the sample space for this experiment as a set $\Omega = \{0, 1\}$, where we may define 0 to represent "The outcome is heads" and 1 to represent "The outcome is tails".
An event is a subset of the sample space. For instance, we can define the event $A = \{0\}$ as being "The outcome is heads" and the event $B = \{1\}$ as being "The outcome is tails". Note that both $A$ and $B$ are subsets of $\Omega$.
In another example, let's consider the throw of a die. In this case, the sample space is $\Omega = \{1, 2, 3, 4, 5, 6\}$. The event "The result is even" can be defined as the set $E = \{2, 4, 6\}$, with $E \subseteq \Omega$, which is to say that $E$ is a subset of $\Omega$.
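We can mirror these definitions directly with Python sets. A small sketch (the variable names are our own):

```python
# Sample space for a single die throw
sample_space = {1, 2, 3, 4, 5, 6}

# Event: "The result is even"
event_even = {2, 4, 6}

# An event is always a subset of the sample space
print(event_even.issubset(sample_space))  # True
```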
The probability of an event $E$ occurring is usually denoted by $P(E)$, with $0 \leq P(E) \leq 1$. If the event definitely occurs, then $P(E) = 1$. Conversely, if the event is impossible, then $P(E) = 0$.
For experiments with a finite number of outcomes, all equally likely, calculating the probability of an event $E$ amounts to dividing the number of outcomes favourable to $E$ by the total number of possible outcomes:

$$P(E) = \frac{|E|}{|\Omega|}$$
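For illustration, this ratio is straightforward to compute with Python sets. The `probability` helper below is a hypothetical sketch, valid only under the equally-likely assumption:

```python
def probability(event, sample_space):
    """P(E) = |E| / |sample space|, assuming finitely many
    equally likely outcomes."""
    return len(event & sample_space) / len(sample_space)

# Probability of throwing an even number with one die
print(probability({2, 4, 6}, {1, 2, 3, 4, 5, 6}))  # 0.5
```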
In experiments where the number of outcomes is infinite (for instance, if we are assessing the probability of observing a particular temperature on a given day), the equation above does not apply—we will see how to deal with these cases later.
Let's now see an example where we can follow the logic above. We would like to calculate the probability of throwing one die and observing an even number smaller than 3. The number of favourable outcomes is 1 (since 2 is the only even number smaller than 3) and the total number of outcomes is 6, one for each face of the die. Thus, the probability is $\frac{1}{6} \approx 0.167$. We can confirm this with a little simulation (sketched here in Python using the `random` module; the number of throws is arbitrary):
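```python
import random

# Simulate many die throws and count how often we observe
# an even number smaller than 3 (i.e. a 2).
n_throws = 100_000
favourable = 0
for _ in range(n_throws):
    result = random.randint(1, 6)        # one die throw
    if result % 2 == 0 and result < 3:   # even and smaller than 3
        favourable += 1

print(favourable / n_throws)  # should be close to 1/6 ≈ 0.167
```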
As expected, the fraction of favourable results is quite close to the true probability.
There are two main viewpoints when it comes to interpreting probabilities: the frequentist and the Bayesian. The frequentist viewpoint interprets a probability as the frequency with which an event occurs when the experiment is repeated many times. Conversely, the Bayesian interpretation considers probabilities as a measure of uncertainty, or belief, that a particular event will occur, taking into account prior knowledge we may already have about the experiment.
The Bayesian viewpoint has gained significant prominence in the ML space, and there is even a subfield of ML called Bayesian Machine Learning. We will cover some of the most common Bayesian ML algorithms in future modules.