Distance Metrics: Euclidean, Manhattan, Minkowski, Oh My!

Looking to understand the most commonly used distance metrics in machine learning? This guide will help you learn all about Euclidean, Manhattan, and Minkowski distances, and how to compute them in Python.



[Header image by Author]

 

If you’re familiar with machine learning, you know that the data points in a dataset, and the features subsequently engineered from them, are all points (or vectors) in an n-dimensional space.

The distance between any two points also captures the similarity between them: the smaller the distance, the more similar the points. Supervised learning algorithms such as K Nearest Neighbors (KNN) and clustering algorithms like K-Means use distance metrics to measure this similarity.

In clustering, the distance metric is used to group similar data points together, whereas in KNN it is used to find the K points closest to a given query point.
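As a quick illustration of how KNN relies on a distance metric, here is a minimal sketch of a nearest-neighbor lookup (the sample points and the find_k_nearest helper are my own, not part of any library):

```python
from scipy.spatial import distance

def find_k_nearest(query, points, k):
    """Return the k points closest to `query` by Euclidean distance."""
    # Rank all points by their distance to the query point, then keep the first k.
    ranked = sorted(points, key=lambda p: distance.euclidean(query, p))
    return ranked[:k]

points = [[1, 1], [2, 3], [8, 9], [0, 0], [5, 4]]
print(find_k_nearest([1, 2], points, k=2))  # → [[1, 1], [2, 3]]
```

Swapping in a different distance function (Manhattan, Minkowski) changes which neighbors are considered "closest" — which is exactly why the choice of metric matters.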

In this article, we’ll review the properties of distance metrics and then look at the most commonly used distance metrics: Euclidean, Manhattan and Minkowski. We’ll then cover how to compute them in Python using built-in functions from the scipy module.

Let’s begin!

 

Properties of Distance Metrics

 

Before we learn about the various distance metrics, let's review the properties that any distance metric in a metric space should satisfy [1]:

 

1. Symmetry 

 

If x and y are two points in a metric space, then the distance between x and y should be equal to the distance between y and x.

 

d(x, y) = d(y, x)

 

2. Non-negativity 

 

Distances should always be non-negative, meaning the distance between any two points should be greater than or equal to zero.

 

d(x, y) ≥ 0

 

The above inequality holds with equality (d(x, y) = 0) if and only if x and y denote the same point, i.e., x = y.

 

3. Triangle Inequality 

 

Given three points x, y, and z, the distance metric should satisfy the triangle inequality: 

 

d(x, z) ≤ d(x, y) + d(y, z)
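To make these three properties concrete, here is a quick numerical sanity check using SciPy's Euclidean distance (the sample points are arbitrary choices of my own):

```python
from scipy.spatial import distance

x, y, z = [3, 6, 9], [1, 0, 1], [2, 2, 2]

# Symmetry: d(x, y) == d(y, x)
assert distance.euclidean(x, y) == distance.euclidean(y, x)

# Non-negativity: d(x, y) >= 0, with equality only when x == y
assert distance.euclidean(x, y) >= 0
assert distance.euclidean(x, x) == 0

# Triangle inequality: d(x, z) <= d(x, y) + d(y, z)
assert distance.euclidean(x, z) <= distance.euclidean(x, y) + distance.euclidean(y, z)

print("All three properties hold for these points.")
```

A check on a few points is of course not a proof — but it is a handy way to convince yourself that a candidate distance function behaves like a metric.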

 

Euclidean Distance

 

Euclidean distance is the shortest distance between any two points in a metric space. Consider two points x and y in a two-dimensional plane with coordinates (x1, x2) and (y1, y2), respectively.

The Euclidean distance between x and y is shown:

 

[Figure: the straight-line (Euclidean) distance between two points in a two-dimensional plane. Image by Author]

 

This distance is given by the square root of the sum of the squared differences between the corresponding coordinates of the two points. Mathematically, the Euclidean distance between the points x and y in a two-dimensional plane is given by:

 

d(x, y) = √((x1 − y1)² + (x2 − y2)²)

 

Extending this to n dimensions, where the points x and y are of the form x = (x1, x2, …, xn) and y = (y1, y2, …, yn), we have the following equation for the Euclidean distance:

 

d(x, y) = √((x1 − y1)² + (x2 − y2)² + … + (xn − yn)²)

 

Computing Euclidean Distance in Python

 

The Euclidean distance and the other distance metrics in this article can be computed using convenience functions from the spatial module in SciPy [4].

As a first step, let’s import distance from SciPy’s spatial module:

from scipy.spatial import distance

 

We then initialize two points x and y like so:

x = [3,6,9]
y = [1,0,1]

 

We can use the euclidean convenience function to find the Euclidean distance between the points x and y:

print(distance.euclidean(x,y))
Output >> 10.198039027185569
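To connect the SciPy result back to the formula above, here is a from-scratch version (a sketch; euclidean_scratch is my own helper name, not a library function):

```python
import math

def euclidean_scratch(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

x = [3, 6, 9]
y = [1, 0, 1]
print(euclidean_scratch(x, y))  # 10.198039027185569, matching distance.euclidean
```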

 

Manhattan Distance

 

The Manhattan distance, also called taxicab distance or cityblock distance, is another popular distance metric. Suppose you’re on a two-dimensional grid and can move only along the axes, as shown:

 

[Figure: axis-aligned (Manhattan) path between two points on a two-dimensional grid. Image by Author]

 

The Manhattan distance between the points x and y is given by:

 

d(x, y) = |x1 − y1| + |x2 − y2|

 

In n-dimensional space, where each point has n coordinates, the Manhattan distance is given by:

 

d(x, y) = |x1 − y1| + |x2 − y2| + … + |xn − yn|

 

Though the Manhattan distance does not give the shortest distance between any two given points, it is often preferred in applications where the feature points are located in a high-dimensional space [3].
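The cited result is about distance "contrast": how well a metric separates the nearest point from the farthest one. A small simulation can illustrate the idea (the setup below is my own illustration, not the experiment from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)

def contrast(points, query, p):
    """Relative contrast (farthest - nearest) / nearest under the Lp distance."""
    d = np.sum(np.abs(points - query) ** p, axis=1) ** (1 / p)
    return (d.max() - d.min()) / d.min()

# 1000 random points in a 100-dimensional unit cube
points = rng.random((1000, 100))
query = rng.random(100)

for p in (1, 2):
    print(f"p={p}: relative contrast = {contrast(points, query, p):.3f}")
```

In high dimensions the contrast shrinks for every p, and [3] shows it tends to shrink faster for larger p, which is one reason Manhattan distance is often preferred there.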

 

Computing Manhattan Distance in Python

 

We retain the import and the points x and y from the previous example:

from scipy.spatial import distance

x = [3,6,9]
y = [1,0,1]

 

To compute the Manhattan (or cityblock) distance, we can use the cityblock function:

print(distance.cityblock(x,y))
Output >> 16
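As with the Euclidean case, the cityblock result can be checked against the formula directly (manhattan_scratch is my own helper name):

```python
def manhattan_scratch(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

x = [3, 6, 9]
y = [1, 0, 1]
print(manhattan_scratch(x, y))  # 16, matching distance.cityblock
```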

 

Minkowski Distance

 

Named after the German mathematician Hermann Minkowski [2], the Minkowski distance in a normed vector space generalizes both of the metrics above. It is given by:

 

d(x, y) = (|x1 − y1|^p + |x2 − y2|^p + … + |xn − yn|^p)^(1/p)

 

It is straightforward to see that for p = 1, the Minkowski distance takes the same form as the Manhattan distance:

 

d(x, y) = |x1 − y1| + |x2 − y2| + … + |xn − yn|

 

Similarly, for p = 2, the Minkowski distance is equivalent to the Euclidean distance:

 

d(x, y) = √((x1 − y1)² + (x2 − y2)² + … + (xn − yn)²)

 

Computing Minkowski Distance in Python

 

Let’s compute the Minkowski distance between the points x and y:

from scipy.spatial import distance

x = [3,6,9]
y = [1,0,1]

 

In addition to the two points (arrays) between which the distance is to be calculated, the minkowski function also takes in the parameter p:

print(distance.minkowski(x,y,p=3))
Output >> 9.028714870948003

 

To verify that the Minkowski distance reduces to the Manhattan distance for p = 1, let’s call the minkowski function with p set to 1:

print(distance.minkowski(x,y,p=1))
Output >> 16.0

 

Let’s also verify that the Minkowski distance for p = 2 evaluates to the Euclidean distance we computed earlier:

print(distance.minkowski(x,y,p=2))
Output >> 10.198039027185569
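All three results can be reproduced from the general Minkowski formula with a single from-scratch function (minkowski_scratch is my own helper name):

```python
def minkowski_scratch(a, b, p):
    """(sum of |ai - bi|^p) raised to the power 1/p."""
    return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1 / p)

x = [3, 6, 9]
y = [1, 0, 1]

print(minkowski_scratch(x, y, 1))  # 16.0 (Manhattan)
print(minkowski_scratch(x, y, 2))  # ≈ 10.19803903 (Euclidean)
print(minkowski_scratch(x, y, 3))  # ≈ 9.02871487
```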

 

And that’s a wrap! If you’re familiar with normed vector spaces, you’ll recognize the connection between the distance metrics discussed here and Lp norms: the Euclidean, Manhattan, and Minkowski distances are the L2, L1, and Lp norms, respectively, of the difference vector between the two points.
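This correspondence is easy to check with NumPy (assuming NumPy is available; the points are the ones used throughout this article):

```python
import numpy as np
from scipy.spatial import distance

x = np.array([3, 6, 9])
y = np.array([1, 0, 1])

# L2 norm of (x - y) equals the Euclidean distance
assert np.isclose(np.linalg.norm(x - y, ord=2), distance.euclidean(x, y))

# L1 norm of (x - y) equals the Manhattan (cityblock) distance
assert np.isclose(np.linalg.norm(x - y, ord=1), distance.cityblock(x, y))

# L3 norm of (x - y) equals the Minkowski distance with p = 3
assert np.isclose(np.linalg.norm(x - y, ord=3), distance.minkowski(x, y, p=3))

print("Lp norms of the difference vector match the distance functions.")
```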

 

Summing Up

 

That's all for this tutorial. I hope you’ve now gotten the hang of the common distance metrics. As a next step, you can try playing around with the different metrics you’ve learned the next time you train machine learning algorithms. 

If you’re looking to get started with data science, check out this list of GitHub repositories to learn data science. Happy learning!

 

References and Further Reading

 

[1] Metric Spaces, Wolfram Mathworld

[2] Minkowski Distance, Wikipedia 

[3] On the Surprising Behavior of Distance Metrics in High Dimensional Space, C. C. Aggarwal et al.

[4] SciPy Distance Functions, SciPy Docs
 
 
Bala Priya C is a technical writer who enjoys creating long-form content. Her areas of interest include math, programming, and data science. She shares her learning with the developer community by authoring tutorials, how-to guides, and more.