
Eigenvalues and eigenvectors


We previously saw how the absolute value of the determinant of a matrix can be very helpful because it tells us to what extent space is stretched or compressed by a linear transformation. However, the determinant does not give us any information about the extent to which space gets stretched in different directions. Let's take a look at the following three matrices:

\begin{equation} \mathbf{A} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} , \mathbf{B} = \begin{bmatrix} 2 & 0 \\ 0 & 0.5 \end{bmatrix} , \mathbf{C} = \begin{bmatrix} 0.5 & 0 \\ 0 & 2 \end{bmatrix} \end{equation}

The determinants of $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{C}$ are all equal to 1, so one might naively think that these linear transformations are not stretching or shrinking space. However, these transformations are affecting space in very different ways:

  1. The matrix $\mathbf{A}$ is doing nothing. It is the identity matrix, so it leaves its inputs exactly where they were.
  2. The matrix $\mathbf{B}$ stretches the first basis vector by a factor of 2 and compresses the second by the same amount.
  3. The matrix $\mathbf{C}$ compresses the first basis vector by half and stretches the second by a factor of 2.

These linear transformations clearly produce very different results. Let's visualize what would happen to a regular grid of points under the transformations $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{C}$. First, let's plot a grid of evenly spaced points:
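Here is a minimal sketch of this step, assuming a Python environment with NumPy and Matplotlib available:

```python
import numpy as np
import matplotlib.pyplot as plt

# Build a regular grid of points in [-2, 2] x [-2, 2] and stack them
# as the columns of a 2 x N matrix so they can all be transformed at once.
ticks = np.linspace(-2, 2, 9)
xx, yy = np.meshgrid(ticks, ticks)
X = np.vstack([xx.ravel(), yy.ravel()])  # shape (2, 81)

plt.scatter(X[0], X[1], s=10)
plt.gca().set_aspect("equal")
plt.title("Original grid")
plt.show()
```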

Now let's plot the transformed points $\mathbf{y_a} = \mathbf{Ax}$, $\mathbf{y_b} = \mathbf{Bx}$ and $\mathbf{y_c} = \mathbf{Cx}$:
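Continuing the sketch above (same assumptions), the transformed grids can be computed by multiplying each matrix with the stacked grid points:

```python
# The three transformations from equation (1).
A = np.array([[1.0, 0.0], [0.0, 1.0]])
B = np.array([[2.0, 0.0], [0.0, 0.5]])
C = np.array([[0.5, 0.0], [0.0, 2.0]])

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, M, name in zip(axes, [A, B, C], ["A", "B", "C"]):
    Y = M @ X  # transform every grid point at once
    ax.scatter(Y[0], Y[1], s=10)
    ax.set_aspect("equal")
    ax.set_title(f"y = {name}x")
plt.show()
```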

As expected, these transformations stretch and compress space in very different ways, even though their determinants are all equal.

In this section, we will introduce the concepts of eigenvectors and eigenvalues: together, they tell us exactly in which directions and to what extent space is being stretched or compressed. Thus, they give us a much more detailed picture of what a linear transformation does to its inputs.

Eigenvalues and eigenvectors

In the context of linear transformations, eigenvectors tell us the directions in which space is stretched or compressed. An eigenvector always has an eigenvalue associated with it, which tells us the extent to which space is stretched or compressed in the direction of its eigenvector.

The above implies that eigenvectors of a linear transformation are simply scaled by that transformation: since the transformation stretches or compresses space along the direction of the eigenvectors, the eigenvectors themselves would simply be made longer or shorter, but their directions would remain unaffected by the transformation. Thus, for a linear transformation $\mathbf{A}$ we can write:

\begin{equation} \mathbf{Av} = \lambda\mathbf{v} \end{equation}

where $\mathbf{v}$ is an eigenvector of $\mathbf{A}$ and $\lambda$ is its corresponding eigenvalue. Finding eigenvalues and eigenvectors amounts to finding the values of $\mathbf{v}$ and $\lambda$ for which this equality holds. Rearranging this equation, we can write

\begin{equation} \mathbf{Av} - \lambda\mathbf{v} = 0 \end{equation}

To be able to factor out $\mathbf{v}$, we need to express the scalar multiplication $\lambda\mathbf{v}$ as a matrix multiplication using the identity matrix:

\begin{equation} (\mathbf{A} - \lambda\mathbf{I})\mathbf{v} = 0 \end{equation}

What the equation above says is that we now have a new linear transformation, $\mathbf{A} - \lambda\mathbf{I}$, that maps the vector $\mathbf{v}$ onto the zero vector. Since we are looking for non-zero eigenvectors (the zero vector trivially satisfies the equation), this transformation must be squishing some dimension down to zero, which in turn means that its determinant must be zero:

\begin{equation} \det(\mathbf{A} - \lambda\mathbf{I}) = 0 \end{equation}

The matrix $\mathbf{A} - \lambda\mathbf{I}$ is just the matrix $\mathbf{A}$ with the value $\lambda$ subtracted along the diagonal. For a 2-by-2 matrix, this would be:

\begin{equation} \mathbf{A} - \lambda\mathbf{I} = \begin{bmatrix} a_{11} - \lambda & a_{12} \\ a_{21} & a_{22} - \lambda \end{bmatrix} \end{equation}

For small enough matrices, we can solve this analytically by plugging in the formula for the determinant. Sticking to a 2-by-2 matrix for demonstration purposes, we would get the following equality

\begin{equation} (a_{11} - \lambda)(a_{22} - \lambda) - a_{21}a_{12} = 0 \end{equation}
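For concreteness, the same expansion can be carried out symbolically. This is just a sketch, assuming SymPy is available in the environment:

```python
import sympy as sp

a11, a12, a21, a22, lam = sp.symbols("a11 a12 a21 a22 lambda")
A = sp.Matrix([[a11, a12], [a21, a22]])

# Expand det(A - lambda*I) to get the characteristic polynomial.
char_poly = sp.expand((A - lam * sp.eye(2)).det())
print(char_poly)  # lambda**2 - a11*lambda - a22*lambda + a11*a22 - a12*a21 (up to term order)
```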

In general, equation (7) can have no solution, one solution or multiple solutions. To make this more concrete, let's look at a few example transformations.

Example 1

Consider the matrix

\begin{equation} \mathbf{A} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \end{equation}

This matrix rotates the input by 90 degrees counterclockwise. Substituting the values in (7), we get

\begin{equation} \det(\mathbf{A} - \lambda\mathbf{I}) = (0 - \lambda)(0 - \lambda) - (1)(-1) = 0 \implies \lambda^2 = -1 \end{equation}

This equality has no solution (for real numbers, that is). Thus, the rotation matrix above has no real-valued eigenvectors or eigenvalues. This makes sense because a 90-degree rotation around the origin changes the direction of every vector in 2D space, so there are no vectors that preserve their direction under this transformation.
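A quick numerical check, sketched here with NumPy (an assumption of this example), shows the same thing: the eigenvalues of this rotation come out purely imaginary rather than real:

```python
import numpy as np

R = np.array([[0.0, -1.0], [1.0, 0.0]])  # 90-degree counterclockwise rotation

# np.linalg.eig falls back to complex numbers when no real eigenvalues exist.
eigenvalues, eigenvectors = np.linalg.eig(R)
print(eigenvalues)  # [0.+1.j 0.-1.j]
```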

Example 2

Now let's consider the matrix:

\begin{equation} \mathbf{A} = \begin{bmatrix} 2 & 4 \\ 1 & 2 \end{bmatrix} \end{equation}

Plugging the values into (7), we get

\begin{equation} \det(\mathbf{A} - \lambda\mathbf{I}) = (2 - \lambda)(2 - \lambda) - (1)(4) = 0 \implies \lambda^2 - 4\lambda = \lambda(\lambda - 4) = 0 \end{equation}

The above equation has two solutions: $\lambda=0$ and $\lambda=4$. Interestingly, one of the eigenvalues is thus zero; we will come back to this in a moment. For the non-zero eigenvalue, let's plug it into equation (4) so that we can calculate the corresponding eigenvector:

\begin{equation} \begin{bmatrix} 2 - 4 & 4 \\ 1 & 2 - 4 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} -2 & 4 \\ 1 & -2 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \end{equation}

By the rules of matrix multiplication, we get

\begin{align} -2v_1 + 4v_2 & = 0 \implies v_1=2v_2 \\ v_1 - 2v_2 & = 0 \implies v_1=2v_2 \end{align}

Thus, any vector for which $v_1=2v_2$ satisfies the equality and is therefore an eigenvector.
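As a quick sanity check, sketched with NumPy, the vector $(2, 1)$ is indeed simply scaled by 4:

```python
import numpy as np

A = np.array([[2.0, 4.0], [1.0, 2.0]])
v = np.array([2.0, 1.0])  # any vector with v1 = 2*v2

print(A @ v)  # [8. 4.]
print(4 * v)  # [8. 4.]  -> A simply scales v by its eigenvalue 4
```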

Now let's look at the zero eigenvalue. Applying the same logic:

\begin{equation} \begin{bmatrix} 2 - 0 & 4 \\ 1 & 2 - 0 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \end{equation}

and thus we get

\begin{align} 2v_1 + 4v_2 & = 0 \implies v_1=-2v_2 \\ v_1 + 2v_2 & = 0 \implies v_1=-2v_2 \end{align}

This eigenvector is particularly interesting because its eigenvalue is zero, which means that space gets squished to nothing along this direction. This is of course not a coincidence: the transformation matrix is rank-deficient, so it maps the 2-dimensional input space onto a 1-dimensional space. As a result, one of its eigenvalues must be zero, representing the squishing of a dimension along the corresponding eigenvector. This is why, for diagonalizable matrices like this one, the number of non-zero eigenvalues corresponds to the rank.
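The following NumPy sketch confirms the eigenvalues, the rank, and the squishing of the $\lambda = 0$ direction:

```python
import numpy as np

A = np.array([[2.0, 4.0], [1.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)               # approximately [4. 0.]
print(np.linalg.matrix_rank(A))  # 1: one non-zero eigenvalue, rank 1

v0 = np.array([-2.0, 1.0])       # any vector with v1 = -2*v2
print(A @ v0)                    # [0. 0.]: this direction is squished to nothing
```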

Example 3

Let's look at one final matrix:

\begin{equation} \mathbf{A} = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \end{equation}

Let's start by computing its eigenvalues according to (7):

\begin{equation} \det(\mathbf{A} - \lambda\mathbf{I}) = (2 - \lambda)(2 - \lambda) = 0 \implies \lambda = 2 \end{equation}

The above equation has a single solution: $\lambda=2$. Plugging it into (4) we get

\begin{equation} \begin{bmatrix} 2 - 2 & 0 \\ 0 & 2 - 2 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \end{equation}

The above is true regardless of the values of $v_1$ and $v_2$. Thus, every vector is a solution to this equation, which means that every vector is an eigenvector of this transformation. This makes sense when we think about this transformation geometrically: the transformation is scaling both dimensions equally, so all vectors preserve their direction under this transformation.
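A small NumPy sketch illustrates this: whichever vector we pick, it is simply doubled.

```python
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 2.0]])

# Every direction is simply doubled, so every vector is an eigenvector
# with eigenvalue 2.
for v in [np.array([1.0, 0.0]), np.array([1.0, 3.0]), np.array([-2.0, 5.0])]:
    print(A @ v, "is 2 *", v)
```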

Decomposing a transformation

We have seen that eigenvalues represent the extent to which a transformation stretches or compresses space in the direction of its eigenvectors. Intuitively, this suggests that a linear transformation can be thought of as a sequence of three steps:

  1. Transform the input into the space where the basis vectors are the eigenvectors of the transformation.
  2. In the transformed space, scale each basis vector according to the corresponding eigenvalue.
  3. Apply the inverse of the transformation applied in 1, so that we move back to the original basis.

Formally, we can write this down as:

\begin{equation} \mathbf{A} = \mathbf{P} \mathbf{D} \mathbf{P^{-1}} \end{equation}

where $\mathbf{P}$ is a matrix whose column vectors are the eigenvectors of $\mathbf{A}$ and $\mathbf{D}$ is a diagonal matrix populated with the corresponding eigenvalues along the diagonal. This makes sense, but note that we are making an implicit assumption that the matrix $\mathbf{P}$ is invertible, i.e. that the eigenvectors are linearly independent or, in other words, that $\mathbf{P}$ has full rank. If a matrix can be decomposed in this manner, it is called diagonalizable.
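As a sketch of this decomposition, again assuming NumPy, we can rebuild the matrix from Example 2 out of its eigenvectors and eigenvalues:

```python
import numpy as np

A = np.array([[2.0, 4.0], [1.0, 2.0]])  # the matrix from Example 2

eigenvalues, P = np.linalg.eig(A)  # columns of P are the eigenvectors
D = np.diag(eigenvalues)

# P D P^-1 recovers A (up to floating-point error).
print(P @ D @ np.linalg.inv(P))
print(np.allclose(P @ D @ np.linalg.inv(P), A))  # True
```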

Diagonalising a matrix is quite convenient for a variety of reasons. For instance, because the transformation $\mathbf{P}$ is reversed by its inverse, the extent to which $\mathbf{A}$ scales space depends only on $\mathbf{D}$. In turn, $\mathbf{D}$ is a diagonal matrix with the eigenvalues of $\mathbf{A}$ along the diagonal, so it is easy to see that it scales space by the product of all the eigenvalues. Thus, for diagonalizable matrices, the determinant is given by the product of the diagonal entries of $\mathbf{D}$.
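A quick numerical check of this fact, sketched with NumPy using the matrix $\mathbf{B}$ from the start of this section:

```python
import numpy as np

B = np.array([[2.0, 0.0], [0.0, 0.5]])  # the matrix B from the start of this section

eigenvalues, _ = np.linalg.eig(B)
print(np.prod(eigenvalues))  # 1.0
print(np.linalg.det(B))      # 1.0: the determinant equals the product of the eigenvalues
```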