
Similarity between vectors


To reliably identify patterns, machine learning algorithms must be able to somehow measure similarities in the data. In this module, we will look at how representing data with vectors gives us access to very interesting tools to assess similarity. We won't cover all of the different ways in which we can compute how similar two vectors are, but we will cover some of the most commonly used methods.

As before, let's start with the 2-dimensional case where we have two vectors $\mathbf{u} = [2, 0]$ and $\mathbf{v} = [1, 3]$.

Now we ask the question: how similar are these vectors? The answer is that it depends on what exactly one means by similar. Do we mean that they are roughly the same length? Or do we mean that they point in the same direction? Or maybe both?

That is why there are different approaches to measuring similarity (sometimes also referred to as distance). Essentially, they differ in which attributes of the vectors they take into account, so the right choice of similarity metric depends on the problem at hand.

Euclidean distance

Let's start with the simplest metric: the difference between the vectors $\mathbf{u}$ and $\mathbf{v}$.

$$\mathbf{d} = \mathbf{u} - \mathbf{v} = \mathbf{u} + (-1 \cdot \mathbf{v})$$

Notice how we can express difference (i.e. subtraction) by combining addition between vectors and scalar multiplication. As we have seen in the previous module, adding two vectors and scaling a vector by a scalar both return a vector, so $\mathbf{d}$ is a vector. The larger the vector $\mathbf{d}$, the larger the distance between the tips of the vectors $\mathbf{u}$ and $\mathbf{v}$.

Essentially, this metric tells us how far apart the tips of the vectors are. If the vectors have the same direction and magnitude, their tips will be in exactly the same place, so the distance between them is zero. If the vectors have very different magnitudes or directions, their tips will be further apart and the distance between them will be larger.
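As a quick sketch in NumPy (the variable names here are our own), we can compute $\mathbf{d}$ for our two vectors:

```python
import numpy as np

u = np.array([2.0, 0.0])
v = np.array([1.0, 3.0])

# Subtraction expressed as vector addition plus scalar multiplication by -1.
d = u + (-1 * v)
print(d)             # [ 1. -3.]
print(d == (u - v))  # [ True  True] -- same result as direct subtraction
```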

Let's visualise the difference vector $\mathbf{d}$ together with the vectors $\mathbf{u}$ and $\mathbf{v}$:

As before, the vectors $\mathbf{u}$ and $\mathbf{v}$ are represented in red and blue, respectively. The vector $\mathbf{d}$ is represented in solid black. We have also plotted the same vector $\mathbf{d}$ shifted so that its base sits at the tip of vector $\mathbf{v}$. This shows that the length of $\mathbf{d}$ is indeed the distance between the tips of vectors $\mathbf{v}$ and $\mathbf{u}$.

We have seen that the length of $\mathbf{d}$ tells us how different the vectors are, but how do we calculate it? In this case, we can apply the Pythagorean theorem, which gives us the length of the vector $\mathbf{d}$ based on its coordinates $[d_x, d_y]$.

$$||\mathbf{d}|| = \sqrt{d_x^2 + d_y^2}$$

The quantity $||\mathbf{d}||$ is commonly referred to as the length or magnitude of the vector $\mathbf{d}$. In fact, this is a particular case of a more general concept called a $p$-norm or $\ell^p$-norm of a vector, which for an $n$-dimensional vector $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ is defined as:

$$||\mathbf{x}||_p = \left(\sum_{i=1}^{n} |x_i|^p \right)^{1/p}$$

We will cover norms in more detail elsewhere, in particular when discussing regularisation of machine learning models. For now, remember that the $\ell_2$-norm effectively represents the length or magnitude of the vector.
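As a brief illustration (a sketch reusing the vectors defined above), NumPy's `np.linalg.norm` exposes the value of $p$ through its `ord` argument:

```python
# p-norms of the difference vector d = u - v = [1, -3].
print(np.linalg.norm(d, ord=1))  # |1| + |-3| = 4.0
print(np.linalg.norm(d, ord=2))  # sqrt(1^2 + 3^2) = sqrt(10) ≈ 3.162
```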

Now that we understand how to compute the distance between two vectors, let's implement a function to do that.
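One way to implement it with NumPy might look like this (the name `euclidean_distance` is our own):

```python
def euclidean_distance(a, b):
    """Euclidean (l2) distance between two vectors: the l2-norm of a - b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.linalg.norm(a - b)

print(euclidean_distance(u, v))  # sqrt(1^2 + 3^2) = sqrt(10) ≈ 3.162
```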

Now let's introduce a new vector $\mathbf{t} = [0.8, 3.2]$:

As we can see, the tip of vector $\mathbf{t}$ (green) is quite close to the tip of vector $\mathbf{v}$ (blue), so we would expect the distance between $\mathbf{t}$ and $\mathbf{v}$ to be much smaller than the distance between $\mathbf{u}$ and $\mathbf{v}$. Let's verify that:
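A sketch of that check, reusing `euclidean_distance` from above:

```python
t = np.array([0.8, 3.2])

print(euclidean_distance(t, v))  # sqrt(0.2^2 + 0.2^2) ≈ 0.283
print(euclidean_distance(u, v))  # sqrt(10) ≈ 3.162
```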

Feel free to play around with different values for the coordinates of $\mathbf{t}$ to see what happens to the distance metric.

Now you may ask: how is this useful? The answer is that the distance function we have implemented above also works for $n$-dimensional vectors, and its interpretation remains valid. For instance, we can use it to measure similarity between pairs of MNIST images. Let's measure the distance between two images of the number 5:
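Here is a sketch of how this could look, assuming we load MNIST via scikit-learn's `fetch_openml` (any other source of flattened 784-dimensional MNIST vectors would work just as well):

```python
from sklearn.datasets import fetch_openml

# Each MNIST image becomes a flat 784-dimensional vector of pixel intensities.
mnist = fetch_openml("mnist_784", version=1, as_frame=False)
X, y = mnist.data, mnist.target  # labels are strings: "0" ... "9"

# Two images of the digit 5.
five_a, five_b = X[y == "5"][:2]

print(euclidean_distance(five_a, five_b))
```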

And now let's do the same for one of the images above and an image of the number 4:
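Continuing the sketch:

```python
# An image of the digit 4, compared against one of the fives above.
four = X[y == "4"][0]

print(euclidean_distance(five_a, four))
```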

We can see that the distance between the latter pair of images (a five and a four) is somewhat greater than the distance between the former pair (two fives). This might sound promising, because we could perhaps use this distance to discriminate between different numbers. But remember that two factors affect the Euclidean distance: the direction of the vector and its magnitude. Perhaps the greater Euclidean distance between the last pair of images is driven by the fact that the image of the number four has very bright pixels (i.e. larger values and, as a result, a larger magnitude)?

We can quickly verify whether that is the case by computing the Euclidean norm of each vector.
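Continuing the sketch, reusing the image vectors from above:

```python
# l2-norms (magnitudes) of the two image vectors.
print(np.linalg.norm(five_a))
print(np.linalg.norm(four))
```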

As we suspected, the $\ell_2$-norm of the number 4 image is much greater than the norm of the number 5 image, so the distance between them may be significantly driven by differences in magnitude rather than by differences in the direction (i.e. the shape) of the image vectors.

One potential solution would be to normalise the images, ensuring that their norms are equal, before computing the distances. One way to achieve this is to divide each vector by its norm. However, as mentioned earlier, there are other distance metrics we can explore, and some of them ignore differences in magnitude altogether, focusing instead on the direction of the vectors. One such metric is cosine similarity.
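As a quick sketch of the normalisation idea (again reusing the image vectors from above, with a helper function of our own):

```python
def normalise(a):
    """Rescale a vector so that its l2-norm is 1 (assumes a non-zero vector)."""
    a = np.asarray(a, dtype=float)
    return a / np.linalg.norm(a)

# After normalisation both vectors have unit norm, so the Euclidean distance
# between them reflects differences in direction (shape) only.
print(euclidean_distance(normalise(five_a), normalise(four)))
```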

Cosine similarity

The cosine similarity $\cos(\theta)$ between two vectors $\mathbf{a}$ and $\mathbf{b}$ is defined as the ratio between the dot product of $\mathbf{a}$ and $\mathbf{b}$ and the product of their norms:

$$\cos(\theta) = \frac{\mathbf{a} \cdot \mathbf{b}}{||\mathbf{a}|| \, ||\mathbf{b}||}$$

Here we are introducing a new operation between two vectors called the dot product. The dot product $\mathbf{a} \cdot \mathbf{b}$ is the sum of the element-wise products of the elements of $\mathbf{a}$ and $\mathbf{b}$, i.e.

$$\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i$$
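For example, for the vectors $\mathbf{u} = [2, 0]$ and $\mathbf{v} = [1, 3]$ from earlier, $\mathbf{u} \cdot \mathbf{v} = 2 \cdot 1 + 0 \cdot 3 = 2$.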

Due to the normalising denominator $||\mathbf{a}|| \, ||\mathbf{b}||$, the cosine similarity is not affected by differences in magnitude. Instead, it captures differences in the direction of the vectors. Let's go back to the 2-dimensional plane to see this in action. We will look at two vectors $\mathbf{v} = [1, 3]$ (blue) and $\mathbf{s} = [0.333, 1]$ (green).

The vectors $\mathbf{v}$ and $\mathbf{s}$ have very different magnitudes, but they have virtually the same direction. While the Euclidean distance would capture the difference in magnitude between the vectors, the cosine similarity would not. Let's demonstrate this below:
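A sketch of that comparison, with a small `cosine_similarity` helper of our own (reusing `v` and `euclidean_distance` from earlier):

```python
def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

s = np.array([0.333, 1.0])

print(euclidean_distance(v, s))  # clearly non-zero: the magnitudes differ
print(cosine_similarity(v, s))   # ≈ 1: the directions are almost identical
```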

As we have seen in the definition of cosine similarity, the metric expresses the cosine of an angle $\theta$: this is the angle formed between the vectors $\mathbf{a}$ and $\mathbf{b}$. If the angle is close to zero, it means that the vectors are pointing in virtually the same direction, and as a result the cosine similarity will be very close to 1. If the angle between the vectors increases, the value of the cosine similarity will decrease accordingly.