
Matrices and linear transformations


Linear transformations

In the previous sections, we learned how simple operations on vectors can help us do interesting things with data. For instance, we learned how to stretch or shrink a vector, how to combine multiple vectors into one, and how to assess whether two vectors are similar.

However, if you think about it geometrically, there are many more things we could do with vectors. For instance, we could rotate them around a given point. But how can we achieve that?

Let's say we have a vector $\mathbf{a} = [2, 1]$ and we want to rotate it $90^\circ$ anti-clockwise to obtain a new vector $\mathbf{a}'$. If we plot it out, it would look like this:

As we can see above, a rotation by $90^\circ$ transforms the vector $\mathbf{a} = \left[2, 1\right]$ into the vector $\mathbf{a}' = \left[-1, 2\right]$. In other words, we swap the $x$ and $y$ coordinates and flip the sign of the original $y$ coordinate.

It is clear that we cannot achieve that with simple scalar multiplication, but in this section we will learn how the basic vector operations can help us arrive at the answer once we formulate the problem in a slightly different way.

Basis vectors

When we define a vector $\mathbf{a} = [2, 1]$, we tend to think of its elements as a pair of $x$ and $y$ coordinates, respectively. However, there is a more general way to think about the coordinates of a vector, which is as a combination of a set of reference vectors. For instance, let's assume that we have two reference vectors $\mathbf{i} = \left[1, 0\right]$ and $\mathbf{j} = \left[0, 1\right]$.

We could then express the vector $\mathbf{a}$ in terms of vector additions and scalar multiplications of the vectors $\mathbf{i}$ and $\mathbf{j}$:

$$\mathbf{a} = 2\mathbf{i} + 1\mathbf{j}$$

Visually, it would look like this:
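In code, we can perform the same decomposition using nothing but scalar multiplication and vector addition. The sketch below uses plain Python lists; the helper names scalar_multiply and vector_add are illustrative rather than part of any particular library:

```python
def scalar_multiply(c, v):
    """Multiply every element of the vector v by the scalar c."""
    return [c * x for x in v]

def vector_add(u, v):
    """Add two vectors element-wise."""
    return [a + b for a, b in zip(u, v)]

i = [1, 0]  # first standard basis vector
j = [0, 1]  # second standard basis vector

# a = 2i + 1j
a = vector_add(scalar_multiply(2, i), scalar_multiply(1, j))
print(a)  # [2, 1]
```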

The basis vectors we chose are called the standard basis vectors, but we could have chosen any other pair of vectors provided they are linearly independent. The reason for this requirement is that the basis vectors of a given vector space must be able to uniquely represent any vector in that vector space. If two vectors $\mathbf{u}, \mathbf{v}$ are linearly dependent, e.g. $\mathbf{u} = \left[1, 0\right]$ and $\mathbf{v} = \left[3, 0\right]$, then they cannot be a basis of the associated 2-dimensional vector space because they cannot represent all vectors in that space, e.g. $\mathbf{w} = \left[1, 1\right]$.
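To see why, note that any linear combination of $\mathbf{u}$ and $\mathbf{v}$ has a zero second component:

$$a_1\mathbf{u} + a_2\mathbf{v} = a_1\left[1, 0\right] + a_2\left[3, 0\right] = \left[a_1 + 3a_2, 0\right]$$

so no choice of scalars $a_1, a_2$ can produce $\mathbf{w} = \left[1, 1\right]$.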

Formal definition

A set of vectors $\mathcal{B} = \{ \mathbf{v_1}, \mathbf{v_2}, \ldots, \mathbf{v_n} \}$ is said to be a basis of a vector space $\mathcal{V}$ over a field $\mathcal{F}$ if it satisfies the following two properties:

  1. Linear Independence: The set of vectors is linearly independent, meaning that no vector in the set can be expressed as a linear combination of the others
  2. Spanning: The set of vectors spans the entire vector space $\mathcal{V}$. This means that any vector in $\mathcal{V}$ can be expressed as a linear combination of the basis vectors:

$$\forall \mathbf{v} \in \mathcal{V}, \quad \mathbf{v} = a_1\mathbf{v_1} + a_2\mathbf{v_2} + \ldots + a_n\mathbf{v_n}$$

for some scalars $a_1, a_2, \ldots, a_n \in \mathcal{F}$.

Linear transformations and changes of basis vectors

The advantage of thinking about a vector as a weighted combination of its basis vectors is that we can now formulate the transformation of a vector as transformations of its basis vectors. For instance, stretching a vector by a factor (i.e. scalar multiplication) can be seen as stretching all its basis vectors by that same factor. However, we can also choose to apply different transformations to the individual basis vectors, giving us much more flexibility in the way we can transform vectors. But how do we do that? The answer is matrices.

Matrices

Matrices are grids of elements organized into rows and columns, typically represented as follows:

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{bmatrix}$$

where $m$ represents the number of rows, and $n$ represents the number of columns in the matrix. Depending on its dimensions, a matrix is sometimes called:

  • A square matrix, when $m = n$;
  • A row vector, when $m = 1$;
  • A column vector, when $n = 1$.
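For example, here are a $1$-by-$3$ row vector and a $3$-by-$1$ column vector:

$$\begin{bmatrix} 1 & 2 & 3 \end{bmatrix}, \qquad \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$$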

Matrices support the following basic operations (sketched in code after the list):

  • Addition: The addition of two matrices is computed element-wise. That is, if $\mathbf{C} = \mathbf{A} + \mathbf{B}$, then $c_{ij} = a_{ij} + b_{ij}$.
  • Scalar multiplication: Multiplying a matrix by a scalar is the same as multiplying every entry of the matrix by that scalar. That is, if $\mathbf{C} = \alpha\mathbf{A}$, then $c_{ij} = \alpha a_{ij}$.
  • Transposition: If $\mathbf{A}$ is an $m$-by-$n$ matrix, then the transpose $\mathbf{A}^T$ is an $n$-by-$m$ matrix whose entries are given by $(\mathbf{A}^T)_{ij} = a_{ji}$.
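To make these definitions concrete, here is a minimal sketch of the three operations with matrices represented as nested Python lists (lists of rows); the function names are illustrative:

```python
def matrix_add(A, B):
    """Element-wise sum of two matrices with the same dimensions."""
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]

def matrix_scale(alpha, A):
    """Multiply every entry of the matrix A by the scalar alpha."""
    return [[alpha * a for a in row] for row in A]

def matrix_transpose(A):
    """Swap rows and columns: entry (i, j) becomes entry (j, i)."""
    return [list(col) for col in zip(*A)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matrix_add(A, B))       # [[6, 8], [10, 12]]
print(matrix_scale(2, A))     # [[2, 4], [6, 8]]
print(matrix_transpose(A))    # [[1, 3], [2, 4]]
```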

You can also multiply two matrices $\mathbf{A}$ and $\mathbf{B}$ (in this order) provided that the number of columns of $\mathbf{A}$ equals the number of rows of $\mathbf{B}$. If $\mathbf{A}$ is an $m$-by-$n$ matrix and $\mathbf{B}$ is an $n$-by-$p$ matrix, then $\mathbf{C} = \mathbf{A}\mathbf{B}$ will be an $m$-by-$p$ matrix whose elements are dot products between rows of $\mathbf{A}$ and columns of $\mathbf{B}$, i.e.

$$\mathbf{C}_{ij} = \mathbf{a}_{i,\cdot} \cdot \mathbf{b}_{\cdot,j}$$

where $\mathbf{a}_{i,\cdot}$ denotes the $i^{th}$ row (a vector) of $\mathbf{A}$ and $\mathbf{b}_{\cdot,j}$ denotes the $j^{th}$ column (also a vector) of $\mathbf{B}$.
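For example, multiplying two $2$-by-$2$ matrices entry by entry:

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 1 \cdot 5 + 2 \cdot 7 & 1 \cdot 6 + 2 \cdot 8 \\ 3 \cdot 5 + 4 \cdot 7 & 3 \cdot 6 + 4 \cdot 8 \end{bmatrix} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}$$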

Back to transformations

Matrices are useful in many different contexts, but we will primarily think of them as objects that define vector transformations. In particular, we will say that matrices define how the basis vectors of a vector space are affected by the transformation, in that the $i^{th}$ column of a matrix tells us what happens to the $i^{th}$ basis vector under that transformation. Once we know what happens to the basis vectors, we know what happens to all the vectors in that vector space, because all vectors are expressed as a linear combination of those basis vectors.

Let's look at an example matrix $\mathbf{A}$:

$$\mathbf{A} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

This matrix encodes a transformation that moves the first basis vector to the coordinates $[1, 0]$ (the first column of the matrix $\mathbf{A}$) and the second basis vector to $[0, 1]$ (the second column of $\mathbf{A}$). But those are exactly the standard basis vectors, so this transformation does nothing at all. Because it leaves every vector unchanged, this matrix is often referred to as the identity matrix (we will cover this in more detail later). Let's look at another example.

$$\mathbf{A} = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$$

This example moves the first basis vector to the coordinates $[2, 0]$ and the second basis vector to $[0, 2]$. This means we stretched both basis vectors by a factor of two, so this transformation scales a given vector by a factor of 2 in both dimensions. Now let's say we would like to stretch the first basis vector by a factor of 3, but not the second. In this case, the corresponding transformation would be:

$$\mathbf{A} = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}$$

Returning to the rotation example from the start of this section, our goal was to identify a transformation that would rotate the vector $\mathbf{a}$ by $90^\circ$ anti-clockwise. To do that, we established that we needed to swap the $(x, y)$ coordinates and flip the sign of the original $y$ coordinate. Thus, we need:

  • the first basis vector to land where the second basis vector was, i.e. $[0, 1]$;
  • the second basis vector to land where the first basis vector was, but flipped around the origin, i.e. $[-1, 0]$.

With this information, we build the following transformation matrix:

$$\mathbf{A} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$$

Once we have a transformation matrix $\mathbf{A}$, we can transform any vector $\mathbf{v} \in \mathcal{V}$ by multiplying that matrix by the vector $\mathbf{v}$:

$$\mathbf{v}' = \mathbf{A}\mathbf{v}$$
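For our rotation example, working this product out by hand recovers exactly the rotated vector we expected:

$$\mathbf{a}' = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \cdot 2 + (-1) \cdot 1 \\ 1 \cdot 2 + 0 \cdot 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \end{bmatrix}$$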

Let's now verify that this works by implementing it in code. First, we will implement a function to compute the dot product between two vectors. Next, we implement a function to compute matrix multiplications; in it we will make use of the dot product function, since the element in row $i$ and column $j$ of the resulting matrix is the dot product between the $i^{th}$ row of $\mathbf{A}$ and the $j^{th}$ column of $\mathbf{v}$ (treating $\mathbf{v}$ as an $n$-by-$1$ matrix).
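The exact scaffolding of the coding environment is not shown here, so the following is a minimal sketch in plain Python; the function names dot_product and matrix_multiply are illustrative:

```python
def dot_product(u, v):
    """Sum of the element-wise products of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def matrix_multiply(A, B):
    """Multiply an m-by-n matrix A by an n-by-p matrix B.

    Matrices are lists of rows; entry (i, j) of the result is the dot
    product of row i of A with column j of B.
    """
    columns_of_B = list(zip(*B))  # each element is one column of B
    return [[dot_product(row, col) for col in columns_of_B] for row in A]
```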

Finally, we test that multiplying $\mathbf{A}$ by $\mathbf{v}$ does indeed produce the expected result, i.e. the vector $\left[-1, 2\right]$:
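Using the illustrative helpers defined above:

```python
A = [[0, -1],
     [1,  0]]   # rotation by 90 degrees anti-clockwise

v = [[2],
     [1]]       # the vector a, written as a 2-by-1 matrix

print(matrix_multiply(A, v))  # [[-1], [2]], i.e. the vector [-1, 2]
```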

Common linear transformations

Above we looked at a rotation as an example of a transformation represented as a matrix. There are other common types of transformations that can be represented as matrices, such as shearing, scaling and reflection.
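In two dimensions, for instance, these transformations are commonly written as the following matrices (a horizontal shear by a factor $k$, a scaling by factors $s_x$ and $s_y$, and a reflection across the $x$ axis):

$$\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}, \qquad \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix}, \qquad \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$$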

Despite their variety, all of these transformations are linear. By definition, a transformation $f$ is linear if it respects the following two rules (which we check numerically after the list):

  • Additivity: $\forall \ \mathbf{u},\mathbf{v} \in \mathcal{V},\ f(\mathbf{u} + \mathbf{v}) = f(\mathbf{u}) + f(\mathbf{v})$
  • Homogeneity: $\forall \ c \in \mathcal{F},\ \mathbf{u} \in \mathcal{V},\ f(c\mathbf{u}) = cf(\mathbf{u})$
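As a quick sanity check, here is a minimal sketch that reuses the illustrative matrix_multiply helper from above to confirm both rules for our rotation matrix on a couple of example vectors:

```python
A = [[0, -1],
     [1,  0]]   # rotation by 90 degrees anti-clockwise

u = [[2], [1]]
v = [[3], [-4]]
c = 5

def add_vectors(x, y):
    """Element-wise sum of two column vectors (2-by-1 matrices)."""
    return [[a[0] + b[0]] for a, b in zip(x, y)]

def scale_vector(alpha, x):
    """Multiply every entry of a column vector by the scalar alpha."""
    return [[alpha * a[0]] for a in x]

# Additivity: f(u + v) == f(u) + f(v)
print(matrix_multiply(A, add_vectors(u, v)) ==
      add_vectors(matrix_multiply(A, u), matrix_multiply(A, v)))  # True

# Homogeneity: f(c * u) == c * f(u)
print(matrix_multiply(A, scale_vector(c, u)) ==
      scale_vector(c, matrix_multiply(A, u)))  # True
```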

Summary

In this section, we learned how matrices can be used to represent different linear transformations. As we will see in future modules, these transformations are key components of modern machine learning models. They can be combined with non-linear operations to solve highly complex tasks like object recognition or image generation.