Manyspikes

Welcome to the Introduction to Linear Algebra module. In this module, we will cover the basic linear algebra concepts that underpin even the most advanced machine learning models developed to date, such as deep neural networks. You will see how vectors are a useful way of representing data, and how matrices can be used to transform those vectors in a way that is useful for distinguishing between different patterns in data.

What is a vector?

Let's start with this question: what is a vector?

At school we are typically taught the geometric view of a vector. We learn to think of vectors as 2D or 3D arrows that can be drawn on a coordinate system according to their coordinates. For instance, the vector $\mathbf{v} = [178, 72]$ would be represented by an arrow that starts at the origin of the coordinate system, where $x = y = 0$, and ends at the point where $x = 178$ and $y = 72$.

This view is quite convenient for visualization purposes. Let's plot the vector $\mathbf{v}$:

However, for our purposes a vector is a mechanism to hold data. In the example above, it turns out that the values we used for the $x$ and $y$ coordinates actually represent the height and weight of a person. What we have done is use the vector $\mathbf{v}$ to store two attributes of a person: the first element stores the height and the second stores the weight. Given another person, we could create a new vector $\mathbf{w}$ to store their height and weight as well. Again, the vector would have two elements, the first being the height and the second the weight.
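As a minimal sketch of this idea, here is one way to hold these two people's data in Python, assuming plain lists as the vector representation. The values in `w` are hypothetical and made up for illustration, and the units (cm and kg) are an assumption, since the text only says "height" and "weight".

```python
# Each vector stores [height, weight] for one person.
# Units (cm, kg) are an assumption made for this sketch.
v = [178, 72]   # the person from the example above
w = [165, 58]   # hypothetical second person, values made up for illustration

# Fixed positions give each element its meaning.
HEIGHT, WEIGHT = 0, 1

print("v: height", v[HEIGHT], "weight", v[WEIGHT])
print("w: height", w[HEIGHT], "weight", w[WEIGHT])
```

What makes these lists usable as vectors is the shared convention: every vector puts the same attribute at the same position.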

Each vector has two elements, so we say that these vectors are two-dimensional. If we chose to also collect data about a person's age, we would have to add one element to each vector (to store that person's age), so the vectors would then be three-dimensional. And if we chose to collect data about any other number of attributes, say $N$, then the vectors would be $N$-dimensional. In a nutshell, a vector's dimensionality is the number of elements it contains.
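In code, a list's length plays the role of the dimensionality. The sketch below extends the two vectors with an age; the ages are hypothetical values made up for illustration, and `N` and `u` are illustrative names, not part of the original text.

```python
# Hypothetical: each vector now also stores an age (values made up).
v = [178, 72, 34]   # [height, weight, age]
w = [165, 58, 29]

# A vector's dimensionality is simply the number of elements it holds.
print(len(v))  # 3

# The same idea for N attributes gives an N-dimensional vector.
N = 10
u = [0.0] * N   # placeholder values
print(len(u))  # 10
```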

Now let's take this idea a little further. Say you have a grayscale picture with a resolution of 28 by 28 pixels. That image is effectively a set of cells (the pixels) arranged in a grid with 28 rows and 28 columns. The brightness of each pixel is given by a number between 0 and 1, with 0 being black, 1 being white, and any number in between being a shade of gray.

Let's plot the image below.

The image above is from a dataset called MNIST, which contains images of single digits and is a commonly used dataset in machine learning research. In this case, this is an image of a somewhat wonky number 2.

This image is just data (a collection of numbers between 0 and 1), so can we store it in a vector? Indeed, we can think of the image as a vector with $28 \times 28 = 784$ elements, one per pixel, which means that the dimensionality of the vector is $784$. Each element in the vector represents the brightness of the image at a particular position.
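Turning the 28 by 28 grid into a 784-dimensional vector amounts to laying its rows end to end. The sketch below assumes row-major order (row 0 first, then row 1, and so on), which is one common convention rather than something the text specifies. Because the actual MNIST pixel values are not reproduced here, `image` is a hypothetical stand-in that is black everywhere except one white pixel.

```python
ROWS, COLS = 28, 28

# Hypothetical stand-in for an MNIST image: a 28x28 grid of brightness
# values in [0, 1] (0 = black, 1 = white), all black except one pixel.
image = [[0.0] * COLS for _ in range(ROWS)]
image[5][10] = 1.0

# Flatten row by row (row-major order) into a single list of 784 values.
vector = [pixel for row in image for pixel in row]

print(len(vector))  # 784

# Under row-major order, pixel (r, c) lands at position r * COLS + c.
r, c = 5, 10
print(vector[r * COLS + c])  # 1.0
```

Any fixed ordering would work equally well, as long as every image is flattened the same way so that each position always refers to the same pixel.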

Formal definitions

To arrive at the formal definition of a vector, we will first need to define fields and vector spaces.

Field

A field is a set $\mathcal{F}$ together with two binary operations on $\mathcal{F}$: addition and multiplication. A binary operation on $\mathcal{F}$ is a mapping $\mathcal{F} \times \mathcal{F} \to \mathcal{F}$, i.e. the operation associates each ordered pair of elements of $\mathcal{F}$ with a unique element of $\mathcal{F}$.

An example of a field is the set of real numbers together with the usual addition and multiplication operations, which can be written as $\mathbb{R}(+, \cdot)$. Addition and multiplication map any two elements $a, b$ in $\mathcal{F}$ to a uniquely defined element in $\mathcal{F}$. Addition is denoted by $a + b$, while multiplication is denoted by $a \cdot b$ or simply $ab$.

For any $\alpha, \beta, \gamma$ in $\mathcal{F}$, the following axioms must be satisfied:

Addition and multiplication are associative: $\alpha + (\beta + \gamma) = (\alpha + \beta) + \gamma$, and $\alpha \cdot (\beta \cdot \gamma) = (\alpha \cdot \beta) \cdot \gamma$.
Addition and multiplication are commutative: $\alpha + \beta = \beta + \alpha$, and $\alpha \cdot \beta = \beta \cdot \alpha$.
Additive and multiplicative identity: there exist two distinct elements $0$ and $1$ in $\mathcal{F}$ such that $\alpha + 0 = \alpha$ and $\alpha \cdot 1 = \alpha$.
Additive inverses: for every $\alpha$ in $\mathcal{F}$, there exists an element in $\mathcal{F}$, denoted $-\alpha$ and called the additive inverse of $\alpha$, such that $\alpha + (-\alpha) = 0$.
Multiplicative inverses: for every $\alpha \neq 0$ in $\mathcal{F}$, there exists an element in $\mathcal{F}$, denoted $\frac{1}{\alpha}$ and called the multiplicative inverse of $\alpha$, such that $\alpha \cdot \frac{1}{\alpha} = 1$.
Distributivity of multiplication over addition: $\alpha \cdot (\beta + \gamma) = (\alpha \cdot \beta) + (\alpha \cdot \gamma)$.
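To make the axioms concrete, the sketch below spot-checks them on a handful of numbers. It uses the rational numbers via Python's `fractions.Fraction` as a stand-in for the reals, because `Fraction` arithmetic is exact while Python floats only approximate real numbers. The sample values are arbitrary choices, and passing on finitely many samples illustrates the axioms rather than proving them.

```python
from fractions import Fraction
from itertools import product

# Arbitrary sample rationals (exact arithmetic, no rounding).
samples = [Fraction(0), Fraction(1), Fraction(-3, 4), Fraction(5, 2), Fraction(7)]
zero, one = Fraction(0), Fraction(1)

assert zero != one  # the two identities must be distinct

for a, b, c in product(samples, repeat=3):
    assert a + (b + c) == (a + b) + c          # associativity of +
    assert a * (b * c) == (a * b) * c          # associativity of *
    assert a + b == b + a and a * b == b * a   # commutativity
    assert a * (b + c) == a * b + a * c        # distributivity

for a in samples:
    assert a + zero == a and a * one == a      # identities
    assert a + (-a) == zero                    # additive inverse
    if a != zero:
        assert a * (1 / a) == one              # multiplicative inverse

print("all sampled field axioms hold")
```

Floats are deliberately avoided here: because of rounding, `(0.1 + 0.2) + 0.3` and `0.1 + (0.2 + 0.3)` are not equal in Python, so floating-point addition is not exactly associative.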

Vector spaces

A vector space over a field $\mathcal{F}$ is a non-empty set $\mathcal{V}$ together with the operations of vector addition and scalar multiplication, such that the following axioms are satisfied. For every $\mathbf{u}$, $\mathbf{v}$ and $\mathbf{w}$ in $\mathcal{V}$ and every $\alpha$ and $\beta$ in $\mathcal{F}$:

Vector addition is associative: $\mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w}$
Vector addition is commutative: $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$
Additive identity: there exists an element $\mathbf{0}$ in $\mathcal{V}$ such that $\mathbf{v} + \mathbf{0} = \mathbf{v}$
Additive inverse: there exists an element $-\mathbf{v}$ in $\mathcal{V}$ such that $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$
Scalar multiplicative identity: the multiplicative identity $1$ of $\mathcal{F}$ satisfies $1\mathbf{v} = \mathbf{v}$
Scalar multiplication and field multiplication are compatible: $\alpha(\beta\mathbf{v}) = (\alpha\beta)\mathbf{v}$
Scalar multiplication is distributive over vector addition: $\alpha(\mathbf{u} + \mathbf{v}) = \alpha\mathbf{u} + \alpha\mathbf{v}$
Scalar multiplication is distributive over field addition: $(\alpha + \beta)\mathbf{v} = \alpha\mathbf{v} + \beta\mathbf{v}$
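The two operations can be written as small functions, and the axioms above can then be spot-checked on sample values. This is a minimal sketch under stated assumptions: vectors are tuples of equal length, the field is the rationals (`fractions.Fraction`, for exact arithmetic), and `add`, `scale` and the sample values are hypothetical names and choices for illustration. As with any finite set of samples, this illustrates the axioms rather than proving them.

```python
from fractions import Fraction
from itertools import product

def add(u, v):
    """Vector addition: element-wise sum (assumes equal-length tuples)."""
    return tuple(a + b for a, b in zip(u, v))

def scale(alpha, v):
    """Scalar multiplication: multiply every element by alpha."""
    return tuple(alpha * a for a in v)

zero_vec = (Fraction(0), Fraction(0))
vectors = [zero_vec, (Fraction(178), Fraction(72)), (Fraction(-1, 2), Fraction(3))]
scalars = [Fraction(0), Fraction(1), Fraction(-2), Fraction(3, 5)]

for u, v, w in product(vectors, repeat=3):
    assert add(u, add(v, w)) == add(add(u, v), w)        # associativity
    assert add(u, v) == add(v, u)                         # commutativity

for v in vectors:
    assert add(v, zero_vec) == v                          # additive identity
    assert add(v, scale(Fraction(-1), v)) == zero_vec     # additive inverse
    assert scale(Fraction(1), v) == v                     # scalar identity

for alpha, beta in product(scalars, repeat=2):
    for u, v in product(vectors, repeat=2):
        # compatibility with field multiplication
        assert scale(alpha, scale(beta, v)) == scale(alpha * beta, v)
        # distributivity over vector addition
        assert scale(alpha, add(u, v)) == add(scale(alpha, u), scale(alpha, v))
        # distributivity over field addition
        assert scale(alpha + beta, v) == add(scale(alpha, v), scale(beta, v))

print("all sampled vector space axioms hold")
```

Here $-\mathbf{v}$ is computed as $(-1)\mathbf{v}$, which is one convenient way to produce the additive inverse for tuples of numbers.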

The elements of a vector space are called vectors. Vectors can also be subject to other operations, such as the scalar (dot) product and the vector (cross) product, which we will cover later.

To summarise, these are the key takeaways from this section:

Vectors are objects that can be added together or scaled by a scalar to produce another vector.
Vector addition maps any two vectors $\mathbf{v}$ and $\mathbf{w}$ in $\mathcal{V}$ to a third vector in $\mathcal{V}$, commonly written as $\mathbf{v} + \mathbf{w}$.
Scalar multiplication maps any scalar $\alpha$ in $\mathcal{F}$ and vector $\mathbf{v}$ in $\mathcal{V}$ to another vector in $\mathcal{V}$, which is denoted $\alpha\mathbf{v}$.
In practice, for our purposes, we represent vectors as sequences of numeric values.

In the next section, we will look at what adding and scaling mean in practice, using an image as our example vector.