In the previous sections, we learned how simple operations on vectors can help us do interesting things with data. For instance, we learned how to stretch or shrink a vector, how to combine multiple vectors into one, and how to assess whether two vectors are similar.
However, if you think about it geometrically, there are many more things we could do with vectors. For instance, we could rotate them around a given point. But how can we achieve that?
Let's say we have a vector $\vec{v}$ and we want to rotate it by 90 degrees anti-clockwise to obtain a new vector $\vec{v}'$. If we plot it out, it would look like this:
As we can see above, a rotation by 90 degrees transforms the vector $\vec{v} = (x, y)$ into the vector $\vec{v}' = (-y, x)$. In other words, we are swapping the coordinates around and flipping the sign of the $x$ coordinate.
It is clear that we cannot achieve that with simple scalar multiplication, but in this section we will learn how the basic vector operations can help us arrive at the answer once we formulate the problem in a slightly different way.
When we define a vector $\vec{v} = (x, y)$, we tend to think of its elements as a pair of $x$ and $y$ coordinates, respectively. However, there is a more general way to think about the coordinates of a vector, which is as a combination of a set of reference vectors. For instance, let's assume that we have two reference vectors $\vec{e}_1 = (1, 0)$ and $\vec{e}_2 = (0, 1)$.
We could then express the vector $\vec{v}$ in terms of vector additions and scalar multiplications of the vectors $\vec{e}_1$ and $\vec{e}_2$:

$$\vec{v} = x\,\vec{e}_1 + y\,\vec{e}_2 = x\begin{pmatrix} 1 \\ 0 \end{pmatrix} + y\begin{pmatrix} 0 \\ 1 \end{pmatrix}$$
Visually, it would look like this:
The basis vectors we chose are called the standard basis vectors, but we could have chosen any other pair of vectors provided they are linearly independent. The reason for this requirement is that the basis vectors of a given vector space must be able to uniquely represent any vector in that vector space. If two vectors are linearly dependent, e.g. $(1, 0)$ and $(2, 0)$, then they cannot be a basis of the associated 2-dimensional vector space because they cannot represent all vectors in that space, e.g. $(0, 1)$.
A set of vectors $\{\vec{v}_1, \dots, \vec{v}_n\}$ is said to be a basis of a vector space $V$ over a field $F$ if it satisfies the following two properties:

1. Linear independence: if $a_1\vec{v}_1 + \dots + a_n\vec{v}_n = \vec{0}$ for some scalars $a_1, \dots, a_n \in F$, then $a_1 = \dots = a_n = 0$.
2. Spanning: every vector $\vec{v} \in V$ can be written as $\vec{v} = a_1\vec{v}_1 + \dots + a_n\vec{v}_n$ for some scalars $a_1, \dots, a_n \in F$.
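To make this concrete, here is a minimal sketch in plain Python (the helper names `scale` and `add` are our own choice) showing that a vector can be rebuilt from its coordinates and the standard basis vectors:

```python
# Standard basis vectors of the 2-dimensional plane.
e1 = [1, 0]
e2 = [0, 1]

def scale(s, v):
    """Multiply a vector by a scalar."""
    return [s * x for x in v]

def add(u, v):
    """Add two vectors element-wise."""
    return [a + b for a, b in zip(u, v)]

# The vector (3, 2) is the combination 3 * e1 + 2 * e2.
v = add(scale(3, e1), scale(2, e2))
print(v)  # [3, 2]
```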
The advantage of thinking about a vector as a weighted combination of its basis vectors is that we can now formulate the transformation of a vector as transformations of its basis vectors. For instance, stretching a vector by a factor $s$ (i.e. scalar multiplication) can be seen as stretching all its basis vectors by that same factor $s$. However, we can also choose to apply different transformations to the individual basis vectors, giving us much more flexibility in the way we can transform vectors. But how do we do that? The answer is matrices.
Matrices are grids of elements organized into rows and columns, typically represented as follows:

$$\mathbf{A} = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{pmatrix}$$
where $m$ represents the number of rows and $n$ represents the number of columns in the matrix. Depending on its dimensions, a matrix is sometimes given a special name:

- a square matrix, when $m = n$;
- a row vector, when $m = 1$;
- a column vector, when $n = 1$.
Matrices can be subject to the following basic operations:

- Addition: two matrices of the same dimensions are added element-wise.
- Scalar multiplication: every element of the matrix is multiplied by the same scalar.
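The snippet below is a minimal sketch of these two operations in plain Python, with matrices stored as lists of rows (the helper names are our own choice):

```python
A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]

def matrix_add(A, B):
    """Add two matrices of the same dimensions element-wise."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

def matrix_scale(s, A):
    """Multiply every element of a matrix by the scalar s."""
    return [[s * a for a in row] for row in A]

print(matrix_add(A, B))    # [[6, 8], [10, 12]]
print(matrix_scale(2, A))  # [[2, 4], [6, 8]]
```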
You can also multiply two matrices $\mathbf{A}$ and $\mathbf{B}$ (in this order) provided that the number of columns of $\mathbf{A}$ equals the number of rows of $\mathbf{B}$. If $\mathbf{A}$ is an $m$-by-$n$ matrix and $\mathbf{B}$ is an $n$-by-$p$ matrix, then $\mathbf{AB}$ will be an $m$-by-$p$ matrix whose elements consist of the dot products between the rows of $\mathbf{A}$ and the columns of $\mathbf{B}$, i.e.

$$(\mathbf{AB})_{i,j} = \mathbf{A}_{i,:} \cdot \mathbf{B}_{:,j} = \sum_{k=1}^{n} a_{i,k}\, b_{k,j}$$

where $\mathbf{A}_{i,:}$ denotes the $i$-th row (a vector) of $\mathbf{A}$ and $\mathbf{B}_{:,j}$ denotes the $j$-th column (also a vector) of $\mathbf{B}$.
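As a quick worked example (with numbers chosen purely for illustration), multiplying two 2-by-2 matrices gives:

$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} = \begin{pmatrix} 1\cdot 5 + 2\cdot 7 & 1\cdot 6 + 2\cdot 8 \\ 3\cdot 5 + 4\cdot 7 & 3\cdot 6 + 4\cdot 8 \end{pmatrix} = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}$$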
Matrices are useful in many different contexts, but we will primarily think of them as objects that define vector transformations. In particular, we will say that matrices define how the basis vectors of a vector space are affected by the transformation, in that the $j$-th column of a matrix tells us what happens to the $j$-th basis vector under that transformation. Once we know what happens to the basis vectors, we know what happens to all the vectors in that vector space, because every vector can be expressed as a linear combination of the basis vectors.
Let's look at an example matrix $\mathbf{A}$:

$$\mathbf{A} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
This matrix encodes a transformation that moves the first basis vector to the coordinates $(1, 0)$ (the first column of the matrix $\mathbf{A}$) and the second basis vector to $(0, 1)$ (the second column of $\mathbf{A}$). But those vectors are the same as the standard basis vectors, so this transformation does nothing at all. Because the transformation does nothing at all, the matrix $\mathbf{A}$ is often referred to as the identity matrix (we will cover this in more detail later). Let's look at another example:

$$\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$$

This example moves the first basis vector to the coordinates $(2, 0)$ and the second basis vector to $(0, 2)$. This means we stretched both basis vectors by a factor of two. Thus, this transformation scales a given vector by a factor of 2 in both dimensions. Now let's say we would like to stretch the first basis vector by a factor of 3, but not the second. In this case, the corresponding transformation would be:

$$\begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}$$
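To see the effect on an arbitrary vector, we can multiply this matrix by a generic vector $(x, y)$:

$$\begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3x + 0y \\ 0x + 1y \end{pmatrix} = \begin{pmatrix} 3x \\ y \end{pmatrix}$$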
In the rotation example at the start of this section, our goal was to identify a transformation that would rotate the vector $\vec{v}$ by 90 degrees. To do that, we established that we needed to swap the coordinates and further flip the sign of the $x$ coordinate. Thus, we need:

$$\vec{e}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \to \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \qquad \vec{e}_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \to \begin{pmatrix} -1 \\ 0 \end{pmatrix}$$
With this information, we build the following transformation matrix:

$$\mathbf{R} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$$
Once we have a transformation matrix $\mathbf{R}$, we can transform any vector $\vec{v}$ by multiplying that matrix by the vector $\vec{v}$:

$$\vec{v}' = \mathbf{R}\,\vec{v} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -y \\ x \end{pmatrix}$$
Let's now verify that this works by implementing it in code. First, we will implement a function to compute the dot product between two vectors. Next, we implement a function to compute matrix multiplications. In it, we will make use of the dot product function, since the element $(i, j)$ of the resulting matrix is the dot product between the $i$-th row of $\mathbf{A}$ and the $j$-th column of $\mathbf{B}$.
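Here is a minimal sketch of these two functions in plain Python (the function names `dot_product` and `matrix_multiply` are our own choice):

```python
def dot_product(u, v):
    """Compute the dot product between two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def matrix_multiply(A, B):
    """Multiply an m-by-n matrix A by an n-by-p matrix B.

    Matrices are stored as lists of rows. The element (i, j) of the result
    is the dot product between the i-th row of A and the j-th column of B.
    """
    # Columns of B, obtained by transposing it.
    B_columns = list(zip(*B))
    return [[dot_product(row, col) for col in B_columns] for row in A]
```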
Finally, we test that multiplying $\mathbf{R}$ by $\vec{v}$ indeed produces the expected result, i.e. the rotated vector $\vec{v}' = (-y, x)$.
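For instance, using the `matrix_multiply` function sketched above and a vector chosen purely for illustration:

```python
# Rotation by 90 degrees anti-clockwise, as derived above.
R = [[0, -1],
     [1,  0]]

# An example vector (1, 2), stored as a column matrix.
v = [[1],
     [2]]

v_rotated = matrix_multiply(R, v)
print(v_rotated)  # [[-2], [1]], i.e. the vector (-2, 1)
```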
Above we looked at a rotation as an example of a transformation represented as a matrix. There are other common types of transformations that can be represented as matrices, such as shearing, scaling and reflection.
However, all of these transformations are considered to be linear. By definition, a transformation $T$ is linear if it respects the following two rules:

1. Additivity: $T(\vec{u} + \vec{v}) = T(\vec{u}) + T(\vec{v})$ for all vectors $\vec{u}$ and $\vec{v}$.
2. Homogeneity: $T(c\,\vec{v}) = c\,T(\vec{v})$ for all vectors $\vec{v}$ and scalars $c$.
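As a quick sanity check, the sketch below (reusing the `matrix_multiply` function from before, with vectors and a scalar chosen purely for illustration) verifies that the rotation matrix respects both rules:

```python
# Two arbitrary vectors (as column matrices) and a scalar.
u = [[3], [1]]
v = [[1], [2]]
c = 4

R = [[0, -1],
     [1,  0]]

def vector_add(u, v):
    """Add two column matrices element-wise."""
    return [[a[0] + b[0]] for a, b in zip(u, v)]

def vector_scale(c, v):
    """Multiply a column matrix by a scalar."""
    return [[c * a[0]] for a in v]

# Rule 1: T(u + v) == T(u) + T(v)
assert matrix_multiply(R, vector_add(u, v)) == vector_add(matrix_multiply(R, u), matrix_multiply(R, v))

# Rule 2: T(c * v) == c * T(v)
assert matrix_multiply(R, vector_scale(c, v)) == vector_scale(c, matrix_multiply(R, v))
```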
In this section, we learned how matrices can be used to represent different linear transformations. As we will see in future modules, these transformations are key components of modern machine learning models. They can be combined with non-linear operations to solve highly complex tasks like object recognition or image generation.