Basic definitions

Welcome to the Introduction to Probability course!

Let's start with the question: why is probability important for understanding and building AI? The reason is that, ultimately, AI systems make or inform decisions. When we make a decision, we must be aware of the consequences of getting it wrong and how likely it is that we will get it wrong. The latter, which we can refer to as uncertainty, can only be examined through some form of probabilistic reasoning. Thus, a solid understanding of probability is critical to deploying AI systems successfully and safely.

In this section, we will start by introducing basic terminology and definitions. We will refer to sets quite often, so let's first define what a set is and touch on basic set operations.

A set is essentially a collection of unique objects. If the set $S$ contains an object $x$, then we say that $x$ is a member of $S$, which we denote $x \in S$. If a set has no members, then it is called the empty set and is usually denoted by $\emptyset$.

A set $S$ can contain subsets: smaller sets whose members all belong to $S$. For instance, the set $T = \{1, 2\}$ is a subset of the set $S = \{1, 2, 3\}$. If $T$ is a subset of $S$, we write $T \subset S$.

The following are the most common set operations:

  • Union: Denoted by $A \cup B$, the union of $A$ and $B$ is the set of all objects that are members of $A$, members of $B$, or both.
  • Intersection: The intersection of $A$ and $B$, written as $A \cap B$, is the set of all objects that are simultaneously members of $A$ and members of $B$.
  • Complement: The complement of $A$ with respect to a given reference set $S$ is the set of all members of $S$ that are not members of $A$. It is denoted as $\overline{A}$.
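These operations map directly onto Python's built-in set type, which we can use as a quick sanity check (the particular sets below are just illustrative choices):

```python
# Illustrating the set operations above with Python's built-in set type
S = {1, 2, 3, 4, 5}   # reference set
A = {1, 2, 3}
B = {3, 4}

print(A | B)   # union of A and B: {1, 2, 3, 4}
print(A & B)   # intersection of A and B: {3}
print(S - A)   # complement of A with respect to S: {4, 5}
```

Python also offers named methods (`A.union(B)`, `A.intersection(B)`, `S.difference(A)`) that behave identically to the operators used here.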

Now that we have briefly introduced sets and their notation, let's dive into the definitions that will allow us to formulate the idea of probabilities.

Experiment

You can think of an experiment as a process that generates some observations, usually in a non-deterministic way. For instance, a coin toss is a type of experiment which can generate two observations: heads or tails. If you toss a coin a few times in a row, most likely you won't observe the same result every time, making this process non-deterministic.
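We can mimic this non-determinism in code. A minimal sketch, assuming Python's standard `random` module, might look like:

```python
import random

# A coin toss as a non-deterministic experiment: repeated runs of the
# experiment generally produce different observations.
random.seed(0)  # fixed seed so this particular run is reproducible
observations = [random.choice(["heads", "tails"]) for _ in range(10)]
print(observations)
```

Running this produces a sequence of ten observations; without a fixed seed, each execution would typically yield a different sequence.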

Sample space

The sample space (sometimes also referred to as state space) is the set of all possible outcomes of an experiment. For our coin toss example, the sample space is heads or tails, as these are the only possible outcomes. Mathematically, we can represent the sample space for this experiment as the set $S = \{0, 1\}$, where we may define 0 to represent "The outcome is heads" and 1 to represent "The outcome is tails".

Event

An event is a subset of the sample space. For instance, we can define the event $H$ as "The outcome is heads" and the event $T$ as "The outcome is tails". Note that both $H$ and $T$ are subsets of $S$.

In another example, let's consider the throw of a die. In this case, the sample space is $S = \{1, 2, 3, 4, 5, 6\}$. The event "The result is even" can be defined as the set $E = \{2, 4, 6\}$ with $E \subset S$, which is to say that $E$ is a subset of $S$.
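Since events are just sets, we can check the subset relationship above directly with Python sets:

```python
# Sample space and an event for a die throw, represented as sets
S = {1, 2, 3, 4, 5, 6}   # sample space of the die throw
E = {2, 4, 6}            # event: "The result is even"

print(E.issubset(S))     # True: E is a subset of S
```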

Probability

The probability of an event $A$ occurring is usually denoted by $p(A)$, with $0 \le p(A) \le 1$. If the event definitely occurs, then $p(A) = 1$. Conversely, if the event is impossible, then $p(A) = 0$.

Calculating probabilities

For experiments with a finite number of equally likely outcomes, calculating the probability of an event $A$ amounts to dividing the number of outcomes favourable to $A$ by the total number of possible outcomes:

$$p(A) = \frac{\textrm{Number of outcomes favourable to } A}{\textrm{Total number of outcomes}}$$

In experiments where the number of outcomes is infinite (for instance, if we are assessing the probability of observing a particular temperature on a given day), the equation above does not apply—we will see how to deal with these cases later.

Let's now see an example where we can follow the logic above. We would like to calculate the probability of throwing one die and observing an even number smaller than 3. The number of favourable outcomes is 1 (since 2 is the only even number smaller than 3) and the total number of outcomes is 6, one for each face of the die. Thus, the probability is $\frac{1}{6}$. We can confirm this with a little simulation:
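The original code cell is not reproduced here; a minimal sketch of such a simulation, assuming Python's standard `random` module, could be:

```python
import random

random.seed(42)  # fixed seed for a reproducible run

# Simulate many die throws and count how often the outcome is
# an even number smaller than 3 (i.e. the outcome is 2).
n_throws = 100_000
favourable = 0
for _ in range(n_throws):
    result = random.randint(1, 6)       # one die throw
    if result % 2 == 0 and result < 3:  # even and smaller than 3
        favourable += 1

print(favourable / n_throws)  # should be close to 1/6 ≈ 0.1667
```

The more throws we simulate, the closer the observed fraction tends to get to the true probability of $\frac{1}{6}$.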

As expected, the fraction of favourable results is quite close to the true probability.

Interpreting probabilities

There are two main viewpoints when it comes to interpreting probabilities: the frequentist and the Bayesian. The frequentist viewpoint interprets a probability as the relative frequency with which an event occurs when the experiment is repeated many times. Conversely, the Bayesian interpretation considers probabilities as a measure of uncertainty, or belief, that a particular event will occur, taking into account prior knowledge we may already have about the experiment.

The Bayesian viewpoint has gained significant prominence in the ML space, and there is even a subfield of ML known as Bayesian Machine Learning. We will cover some of the most common Bayesian ML algorithms in future modules.