Welcome to the Introduction to Probability course!
Let's start with the question: why is probability important for understanding and building AI? The reason is that, ultimately, AI systems make or inform decisions. When we make a decision, we must be aware of the consequences of getting it wrong and how likely it is that we will get it wrong. The latter, which we can refer to as uncertainty, can only be examined through some form of probabilistic reasoning. Thus, a solid understanding of probability is critical to deploying AI systems successfully and safely.
In this section, we will start by introducing basic terminology and definitions. We will refer to sets quite often, so let's first define what a set is and touch on basic set operations.
A set is essentially a collection of unique objects. If the set $A$ contains an object $x$, then we say that $x$ is a member of $A$, which we denote $x \in A$. If a set has no members in it, then it is called the empty set and is usually denoted by $\emptyset$.
A set $A$ can contain subsets: smaller sets that contain one or more of the members of $A$. For instance, the set $\{1, 2\}$ is a subset of the set $\{1, 2, 3\}$. If $B$ is a subset of $A$, we write $B \subseteq A$.
The following are the most common set operations (a quick code illustration follows the list):

- **Union**: $A \cup B$ is the set of all objects that are members of $A$, of $B$, or of both.
- **Intersection**: $A \cap B$ is the set of all objects that are members of both $A$ and $B$.
- **Difference**: $A \setminus B$ is the set of all members of $A$ that are not members of $B$.
- **Complement**: $A^c$ is the set of all objects under consideration that are not members of $A$.
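These operations map directly onto Python's built-in `set` type. A minimal sketch (the element values here are arbitrary):

```python
A = {1, 2, 3}
B = {3, 4}

print(A | B)        # union: {1, 2, 3, 4}
print(A & B)        # intersection: {3}
print(A - B)        # difference: {1, 2}
print({1, 2} <= A)  # subset check: True
print(3 in A)       # membership check: True
```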
Now that we have briefly introduced sets and their notation, let's dive into the definitions that will allow us to formulate the idea of probabilities.
You can think of an experiment as a process that generates some observations, usually in a non-deterministic way. For instance, a coin toss is a type of experiment which can generate two observations: heads or tails. If you toss a coin a few times in a row, most likely you won't observe the same result every time, making this process non-deterministic.
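To see this non-determinism in action, here is a minimal sketch in Python (the number of tosses is arbitrary):

```python
import random

# Toss a coin ten times; repeated runs of this script will
# generally produce different sequences of observations.
tosses = [random.choice(["heads", "tails"]) for _ in range(10)]
print(tosses)
```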
The sample space (sometimes also referred to as state space) is the set of all possible outcomes of an experiment. For our coin toss example, the sample space is heads or tails, as these are the only possible outcomes. Mathematically, we can represent the sample space for this experiment as a set $\Omega = \{0, 1\}$, where we may define 0 to represent "The outcome is heads" and 1 to represent "The outcome is tails".
An event is a subset of the sample space. For instance, we can define the event $A = \{0\}$ as being "The outcome is heads" and the event $B = \{1\}$ as being "The outcome is tails". Note that both $A$ and $B$ are subsets of $\Omega$.
In another example, let's consider the throw of a die. In this case, the sample space is $\Omega = \{1, 2, 3, 4, 5, 6\}$. The event "The result is even" can be defined as the set $E = \{2, 4, 6\}$, with $E \subseteq \Omega$, which is to say that $E$ is a subset of $\Omega$.
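We can mirror these definitions directly with Python sets. A small sketch (the variable names are our own):

```python
# Sample space for a single die throw
sample_space = {1, 2, 3, 4, 5, 6}

# Event: "The result is even"
event_even = {2, 4, 6}

# An event is always a subset of the sample space
print(event_even.issubset(sample_space))  # True
```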
The probability of an event $E$ occurring is usually denoted by $P(E)$, with $0 \leq P(E) \leq 1$. If the event definitely occurs, then $P(E) = 1$. Conversely, if the event is impossible, then $P(E) = 0$.
For experiments with a finite number of outcomes, all equally likely, calculating the probability of an event $E$ amounts to dividing the number of outcomes favourable to $E$ by the total number of possible outcomes:

$$P(E) = \frac{|E|}{|\Omega|}$$
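For illustration, this ratio is straightforward to compute with Python sets. The `probability` helper below is a hypothetical sketch, valid only under the equally-likely assumption:

```python
def probability(event, sample_space):
    """P(E) = |E| / |sample space|, assuming finitely many
    equally likely outcomes."""
    return len(event & sample_space) / len(sample_space)

# Probability of throwing an even number with one die
print(probability({2, 4, 6}, {1, 2, 3, 4, 5, 6}))  # 0.5
```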
In experiments where the number of outcomes is infinite (for instance, if we are assessing the probability of observing a particular temperature on a given day), the equation above does not apply—we will see how to deal with these cases later.
Let's now see an example where we can follow the logic above. We would like to calculate the probability of throwing one die and observing an even number smaller than 3. The number of favourable outcomes is 1 (since 2 is the only even number smaller than 3) and the total number of outcomes is 6, one for each face of the die. Thus, the probability is $\frac{1}{6} \approx 0.167$. We can confirm this with a little simulation (sketched here in Python using the `random` module; the number of throws is arbitrary):
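```python
import random

# Simulate many die throws and count how often we observe
# an even number smaller than 3 (i.e. a 2).
n_throws = 100_000
favourable = 0
for _ in range(n_throws):
    result = random.randint(1, 6)        # one die throw
    if result % 2 == 0 and result < 3:   # even and smaller than 3
        favourable += 1

print(favourable / n_throws)  # should be close to 1/6 ≈ 0.167
```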
As expected, the fraction of favourable results is quite close to the true probability.
There are two main viewpoints when it comes to interpreting probabilities: the frequentist and the Bayesian. The frequentist viewpoint interprets a probability as the frequency with which an event occurs when the experiment is repeated many times. Conversely, the Bayesian interpretation considers probabilities as a measure of uncertainty, or belief, that a particular event will occur, taking into account prior knowledge we may already have about the experiment.
The Bayesian viewpoint has gained significant prominence in the ML space, and there is even a subfield of ML called Bayesian Machine Learning. We will cover some of the most common Bayesian ML algorithms in future modules.