Welcome to the Introduction to Linear Algebra module. In this module, we will cover the basic linear algebra concepts that underpin even the most advanced machine learning models developed to date, such as deep neural networks. You will see how vectors are a useful way of representing data, and how matrices can be used to transform those vectors in a way that is useful for distinguishing between different patterns in data.
Let's start with this question: what is a vector?
At school we are typically taught the geometric view of a vector. We learn to think of vectors as 2D or 3D arrows that can be drawn on a coordinate system according to their coordinates. For instance, a 2D vector $\mathbf{v} = (v_1, v_2)$ would be represented by an arrow that starts at the origin of the coordinate system, where $x = 0$ and $y = 0$, and ends at the location where $x = v_1$ and $y = v_2$.
This view is quite convenient for visualization purposes. Let's plot this vector:
However, for our purposes a vector is a mechanism to hold data. In the example above, the values we used for the $x$ and $y$ coordinates actually represent the height and weight of a person. In other words, we used the vector to store two attributes of a person: the first element stores the height and the second stores the weight. Given another person, we could create a new vector to store their height and weight as well. Again the vector would have two elements, the first being the height and the second the weight.
Each vector has two elements, so we say that these vectors are two-dimensional. If we chose to collect data about a person's age, we would have to add one element to each vector (to store the age of each person), so the vectors would then be three-dimensional. And if we chose to collect data about any other number of attributes, say $n$, then the vectors would be $n$-dimensional. In a nutshell, a vector's dimensionality is the number of elements it contains.
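As a minimal sketch of this idea, each person can be held in a plain Python list. The numbers and units below (centimetres, kilograms, years) are invented for illustration, and the helper name `dimensionality` is hypothetical.

```python
# Each person is a vector: [height, weight].
# Values and units (cm, kg) are made up for illustration only.
person_a = [170.0, 65.0]
person_b = [182.0, 80.0]


def dimensionality(vector):
    """Return the number of elements the vector contains."""
    return len(vector)


# Collecting one more attribute (age, in years) adds one element per vector.
person_a_with_age = person_a + [34.0]
person_b_with_age = person_b + [51.0]

print(dimensionality(person_a))           # 2
print(dimensionality(person_a_with_age))  # 3
```

The position of each element carries its meaning, so every vector in a dataset has to list the attributes in the same order.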
Now let's take this idea a little bit further. Say you have a black-and-white picture which has a resolution of 28 by 28 pixels. That image is effectively a collection of cells (i.e. the pixels) arranged in a grid with 28 rows and 28 columns. The brightness of each pixel is given by a number between 0 and 1, with 0 being black, 1 being white, and any number in between being a shade of gray.
Let's plot the image below.
The image above is from a dataset called MNIST, which contains images of single digits and is a commonly used dataset in machine learning research. In this case, this is an image of a somewhat wonky number 2.
This image is just data (a bunch of numbers between 0 and 1), so can we store it in a vector? Indeed, we can think of that image as a vector with $28 \times 28 = 784$ pixels, which means that the dimensionality of the vector is $784$. Each element in the vector would represent the brightness of the image at a particular position (you can hover over the image above to see the values at different positions).
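One way to picture this is to flatten the grid row by row. The sketch below uses a synthetic 28 by 28 gradient rather than an actual MNIST digit, and the row-by-row ordering and the helper names `flatten` and `pixel_index` are assumptions made for illustration; any fixed ordering of the pixels would do.

```python
ROWS, COLS = 28, 28

# Stand-in image: a 28x28 grid of brightness values between 0 and 1.
# This is a synthetic diagonal gradient, not an MNIST digit.
image = [[(r + c) / (ROWS + COLS - 2) for c in range(COLS)] for r in range(ROWS)]


def flatten(grid):
    """Concatenate the rows of a 2D grid into one flat list (row by row)."""
    return [value for row in grid for value in row]


def pixel_index(row, col, n_cols=COLS):
    """Position in the flat vector of the pixel at (row, col)."""
    return row * n_cols + col


vector = flatten(image)
print(len(vector))  # 784
```

With this ordering, `vector[pixel_index(r, c)]` is the brightness of the pixel in row `r` and column `c`.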
To arrive at the formal definition of a vector, we will first need to define fields and vector spaces.
A field is a set $\mathcal{F}$ together with two binary operations on $\mathcal{F}$: addition and multiplication. A binary operation on $\mathcal{F}$ is a mapping $\mathcal{F} \times \mathcal{F} \to \mathcal{F}$, i.e. the operation associates each ordered pair of elements in $\mathcal{F}$ with a unique element of $\mathcal{F}$.
An example of a field is the field of real numbers together with the addition and multiplication operations, which can be represented as $(\mathbb{R}, +, \cdot)$. Addition and multiplication map any two elements in $\mathbb{R}$ to a uniquely defined element in $\mathbb{R}$. Addition is denoted by $a + b$ while multiplication is denoted by $a \cdot b$ or simply $ab$.
For any $a$, $b$ and $c$ in $\mathcal{F}$, the following axioms must be satisfied:
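Written out with $a$, $b$ and $c$ as assumed symbol names, the standard field axioms are:

$$
\begin{aligned}
&\text{Associativity:} && a + (b + c) = (a + b) + c, \quad a(bc) = (ab)c \\
&\text{Commutativity:} && a + b = b + a, \quad ab = ba \\
&\text{Identities:} && \text{there exist } 0 \neq 1 \text{ in } \mathcal{F} \text{ with } a + 0 = a \text{ and } a \cdot 1 = a \\
&\text{Additive inverse:} && \text{there exists } -a \text{ in } \mathcal{F} \text{ with } a + (-a) = 0 \\
&\text{Multiplicative inverse:} && \text{if } a \neq 0, \text{ there exists } a^{-1} \text{ in } \mathcal{F} \text{ with } a\,a^{-1} = 1 \\
&\text{Distributivity:} && a(b + c) = ab + ac
\end{aligned}
$$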
A vector space over a field $\mathcal{F}$ is a non-empty set $V$ together with the operations of vector addition and scalar multiplication, such that the following axioms are satisfied. For every $\mathbf{u}$, $\mathbf{v}$ and $\mathbf{w}$ in $V$ and every $a$ and $b$ in $\mathcal{F}$:
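With $\mathbf{u}$, $\mathbf{v}$, $\mathbf{w}$ for vectors and $a$, $b$ for scalars as assumed symbol names, the standard vector space axioms are:

$$
\begin{aligned}
&\text{Associativity of addition:} && \mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w} \\
&\text{Commutativity of addition:} && \mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u} \\
&\text{Zero vector:} && \text{there exists } \mathbf{0} \text{ in } V \text{ with } \mathbf{v} + \mathbf{0} = \mathbf{v} \\
&\text{Additive inverse:} && \text{there exists } -\mathbf{v} \text{ in } V \text{ with } \mathbf{v} + (-\mathbf{v}) = \mathbf{0} \\
&\text{Compatibility:} && a(b\mathbf{v}) = (ab)\mathbf{v} \\
&\text{Identity scalar:} && 1\mathbf{v} = \mathbf{v} \\
&\text{Distributivity over vector addition:} && a(\mathbf{u} + \mathbf{v}) = a\mathbf{u} + a\mathbf{v} \\
&\text{Distributivity over scalar addition:} && (a + b)\mathbf{v} = a\mathbf{v} + b\mathbf{v}
\end{aligned}
$$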
The elements of a vector space are called vectors. Vectors can also be subject to other operations, such as scalar and vector products, which we will cover at a later point.
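To make vector addition and scalar multiplication concrete, here is a minimal sketch using Python lists as vectors. The function names `add` and `scale` are hypothetical, and the scalars are exact rationals (`fractions.Fraction`) rather than real numbers, an assumption made so that the equality checks are not disturbed by floating-point rounding. The assertions spot-check a few of the standard vector space axioms on particular values; they are not a proof.

```python
from fractions import Fraction


def add(u, v):
    """Vector addition: add corresponding elements."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    return [x + y for x, y in zip(u, v)]


def scale(a, v):
    """Scalar multiplication: multiply every element by the scalar a."""
    return [a * x for x in v]


# Two 2-dimensional vectors and two scalars, chosen arbitrarily.
u = [Fraction(1, 2), Fraction(3)]
v = [Fraction(-2), Fraction(1, 4)]
a, b = Fraction(2, 3), Fraction(5)

# Spot checks on these particular values only.
assert add(u, v) == add(v, u)                                # commutativity
assert scale(a, add(u, v)) == add(scale(a, u), scale(a, v))  # distributes over vector addition
assert scale(a + b, v) == add(scale(a, v), scale(b, v))      # distributes over scalar addition
assert scale(a, scale(b, v)) == scale(a * b, v)              # compatibility
assert scale(Fraction(1), v) == v                            # identity scalar
```

Both operations return a list of the same length as their inputs, which mirrors the requirement that adding or scaling vectors yields another vector in the same space.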
To summarise, these are the key takeaways from this section:
In the next section, we will look at what adding and scaling vectors mean in practice, using an image as our example vector.