# Similarity Metrics in NLP

This post covers the use of euclidean distance, dot product, and cosine similarity as NLP similarity metrics.

**By James Briggs, Data Scientist**


When we convert language into a machine-readable format, the *standard* approach is to use dense vectors.

A neural network typically generates dense vectors. They allow us to convert words and sentences into high-dimensional vectors, organized so that each vector's geometric position carries meaning.

The well-known language arithmetic example:

**Queen = King − Man + Woman**

There is a particularly well-known example of this, where we take the vector of *King*, subtract the vector *Man*, and add the vector *Woman*. The closest matching vector to the resultant vector is *Queen*.

We can apply the same logic to longer sequences, too, like sentences or paragraphs, and we will find that similar meaning corresponds with proximity and orientation between those vectors.

So, similarity is important, and what we will cover here are the three most popular metrics for calculating it.

### Euclidean Distance

Euclidean distance (often called the L2 distance, as it is the L2 norm of the difference between two vectors) is the most intuitive of the metrics. Let's define three vectors:

Three example vectors, **a**, **b**, and **c**

Just by looking at these vectors, we can confidently say that **a** and **b** are nearer to each other, and we see this even more clearly when visualizing each on a chart:

Vectors **a** and **b** are close to the origin; vector **c** is much more distant

Clearly, **a** and **b** are closer together, and we quantify that using Euclidean distance:

Euclidean distance formula: d(**u**, **v**) = √( Σᵢ (uᵢ − vᵢ)² )

To apply this formula to our two vectors, **a** and **b**, we do:

Calculation of Euclidean distance between vectors **a** and **b**

And we get a distance of **0.014**. Performing the same calculation for **d(a, c)** returns **1.145**, and **d(b, c)** returns **1.136**. Clearly, **a** and **b** are nearer in Euclidean space.
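The calculation is easy to sketch in a few lines of pure Python. Note that the exact vector values appeared only in the original figures, so the ones below are an assumption, chosen to reproduce the three distances quoted above:

```python
from math import sqrt

# Assumed vector values, reconstructed to match the distances quoted above.
a = [0.01, 0.07, 0.10]
b = [0.01, 0.08, 0.11]
c = [0.91, 0.57, 0.60]

def euclidean(u, v):
    """L2 distance: square root of the summed squared element-wise differences."""
    return sqrt(sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

print(round(euclidean(a, b), 3))  # 0.014
print(round(euclidean(a, c), 3))  # 1.145
print(round(euclidean(b, c), 3))  # 1.136
```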

### Dot Product

One drawback of Euclidean distance is that it ignores orientation; it considers only the straight-line distance between the vectors' endpoints. And this is where we can use our other two metrics. The first of those is the dot product.

The dot product considers direction (orientation) and also scales with vector magnitude.

We care about orientation because similar meaning (as we will often find) can be represented by the direction of the vector, not necessarily its magnitude.

For example, we may find that our vectors' magnitudes correlate with the frequency of the words they represent in our dataset. Now, the word **hi** means the same as **hello**, but that similarity may not be captured if our training data contained the word **hi** 1,000 times and **hello** just twice.

So, vector orientation is often seen as just as important as distance, if not more so.

The dot product is calculated using:

Dot product formula: **u · v** = Σᵢ uᵢvᵢ = ‖**u**‖ ‖**v**‖ cos θ

The dot product considers the angle between the vectors: where the angle is near 0°, the **cos θ** component of the formula equals ~1; at 90° (orthogonal/perpendicular vectors), **cos θ** equals 0; and near 180° (opposing directions), **cos θ** approaches −1.

Therefore, the **cos θ** component increases the result where there is less of an angle between the two vectors. So, a higher dot product correlates with closer alignment.

Again, let's apply this formula to our two vectors, **a** and **b**:

Calculation of the dot product for vectors **a** and **b**

Clearly, the dot product calculation is straightforward (the simplest of the three), and this gives us benefits in terms of computation time.

However, there is one drawback. It is not normalized, meaning larger vectors will tend to score higher dot products despite being less similar.

For example, if we calculate **a·a**, we would expect a higher score than for **a·c** (nothing matches **a** better than **a** itself). But that's not how it works, unfortunately.

The dot product isn't so great when our vectors have differing magnitudes.
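We can see this drawback in a short sketch (the vector values are an assumption, reconstructed to be consistent with the article's figures): **a·c** scores far higher than **a·a**, even though **a** is a perfect match with itself.

```python
# Assumed vector values, reconstructed to be consistent with the figures.
a = [0.01, 0.07, 0.10]
c = [0.91, 0.57, 0.60]

def dot(u, v):
    """Sum of the element-wise products of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

print(round(dot(a, a), 3))  # 0.015 -- a against itself, a perfect match
print(round(dot(a, c), 3))  # 0.109 -- higher, despite c being very dissimilar
```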

So, in reality, the dot product is used to identify the general orientation of two vectors, because:

- Two vectors that point in a similar direction return a **positive** dot product.
- Two perpendicular vectors return a dot product of **zero**.
- Vectors that point in opposing directions return a **negative** dot product.
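The three cases are easy to verify with a couple of hypothetical 2-D vectors:

```python
def dot(u, v):
    """Sum of the element-wise products of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

print(dot([1, 0], [2, 1]))   # 2  -- similar direction: positive
print(dot([1, 0], [0, 1]))   # 0  -- perpendicular: zero
print(dot([1, 0], [-2, 0]))  # -2 -- opposing direction: negative
```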

### Cosine Similarity

Cosine similarity considers vector orientation, independent of vector magnitude.

Cosine similarity formula: cos θ = (**u · v**) / (‖**u**‖ ‖**v**‖)

The first thing we should be aware of in this formula is that the numerator is, in fact, the dot product, which considers both *magnitude* and *direction*.

In the denominator, we have the strange double vertical bars; these mean *'the length of'*. So, we have the length of **u** multiplied by the length of **v**. The length, of course, considers *magnitude*.

When we take a function that considers both *magnitude* and *direction* and divide it by a function that considers just *magnitude*, the two *magnitudes* cancel out, leaving us with a function that considers *direction* **independent of magnitude**.

We can think of cosine similarity as a *normalized* dot product! And it clearly works. The cosine similarity of **a** and **b** is near **1** (a perfect score):

Calculation of cosine similarity for vectors **a** and **b**

And using the `sklearn` implementation of cosine similarity to compare **a** and **c** gives us a much more sensible result than the dot product did:
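`sklearn` exposes this as `sklearn.metrics.pairwise.cosine_similarity`, but the metric is simple enough to sketch in pure Python. The vector values are, again, an assumption reconstructed to be consistent with the article's figures:

```python
from math import sqrt

# Assumed vector values, reconstructed to be consistent with the figures.
a = [0.01, 0.07, 0.10]
b = [0.01, 0.08, 0.11]
c = [0.91, 0.57, 0.60]

def cosine_similarity(u, v):
    """Dot product divided by the product of the two vector lengths (L2 norms)."""
    dot_uv = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = sqrt(sum(ui ** 2 for ui in u))
    norm_v = sqrt(sum(vi ** 2 for vi in v))
    return dot_uv / (norm_u * norm_v)

print(cosine_similarity(a, b))  # ≈ 0.9998 -- near-perfect similarity
print(cosine_similarity(a, c))  # ≈ 0.72   -- clearly less similar
```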

Cosine similarity can often provide much better results than the dot product.

That's all for this article covering the three distance/similarity metrics: Euclidean distance, dot product, and cosine similarity.

It's worth being aware of how each works and their pros and cons, as they're all used heavily in machine learning, and particularly in NLP.

You can find Python implementations of each metric in this notebook.

I hope you've enjoyed the article. Let me know if you have any questions or suggestions via Twitter or in the comments below. If you're interested in more content like this, I post on YouTube too.

Thanks for reading!

*All images are by the author except where stated otherwise.*

**Bio: James Briggs** is a data scientist specializing in natural language processing and working in the finance sector, based in London, UK. He is also a freelance mentor, writer, and content creator. You can reach the author via email (jamescalam94@gmail.com).

Original. Reposted with permission.

**Related:**

- How to Apply Transformers to Any Length of Text
- Simple Question Answering (QA) Systems That Use Text Similarity Detection in Python
- 4 Machine Learning Concepts I Wish I Knew When I Built My First Model