# Distance Metrics: Euclidean, Manhattan, Minkowski, Oh My!

Looking to understand the most commonly used distance metrics in machine learning? This guide will help you learn all about Euclidean, Manhattan, and Minkowski distances, and how to compute them in Python.

If you’re familiar with machine learning, you know that the data points in a dataset, and the features subsequently engineered from them, can all be represented as points (or vectors) in an n-dimensional space.

The distance between any two points also captures the similarity between them. Supervised learning algorithms such as K Nearest Neighbors (KNN) and clustering algorithms like K-Means Clustering use the notion of distance metrics to capture the similarity between data points.

In clustering, the evaluated distance metric is used to group data points together, whereas in KNN it is used to find the K points closest to a given data point.

In this article, we’ll review the properties of distance metrics and then look at the most commonly used distance metrics: Euclidean, Manhattan and Minkowski. We’ll then cover how to compute them in Python using built-in functions from the scipy module.

Let’s begin!

# Properties of Distance Metrics

Before we learn about the various distance metrics, let's review the properties that any distance metric in a metric space should satisfy:

## 1. Symmetry

If x and y are two points in a metric space, then the distance between x and y should be equal to the distance between y and x:

d(x, y) = d(y, x)

## 2. Non-negativity

Distances should always be non-negative, i.e., greater than or equal to zero:

d(x, y) ≥ 0

The inequality holds with equality (d(x, y) = 0) if and only if x and y denote the same point, i.e., x = y.

## 3. Triangle Inequality

Given three points x, y, and z, the distance metric should satisfy the triangle inequality:

d(x, z) ≤ d(x, y) + d(y, z)

# Euclidean Distance

Euclidean distance is the shortest distance between any two points in a metric space. Consider two points x and y in a two-dimensional plane with coordinates (x1, x2) and (y1, y2), respectively.

The Euclidean distance is given by the square root of the sum of the squared differences between the corresponding coordinates of the two points. Mathematically, the Euclidean distance between the points x and y in a two-dimensional plane is:

d(x, y) = sqrt((x1 - y1)^2 + (x2 - y2)^2)

Extending to n dimensions, where the points x and y are of the form x = (x1, x2, …, xn) and y = (y1, y2, …, yn), the Euclidean distance is:

d(x, y) = sqrt((x1 - y1)^2 + (x2 - y2)^2 + … + (xn - yn)^2)

## Computing Euclidean Distance in Python

The Euclidean distance and the other distance metrics in this article can be computed using convenience functions from the `spatial` module in SciPy.

As a first step, let’s import `distance` from Scipy’s `spatial` module:

```python
from scipy.spatial import distance
```

We then initialize two points x and y like so:

```python
x = [3, 6, 9]
y = [1, 0, 1]
```

We can use the `euclidean` convenience function to find the Euclidean distance between the points x and y:

```python
print(distance.euclidean(x, y))
# Output >> 10.198039027185569
```
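To see what the convenience function does under the hood, here's a minimal from-scratch sketch of the Euclidean distance (the helper name `euclidean_distance` is my own) that also spot-checks the metric properties listed earlier:

```python
import math

from scipy.spatial import distance

def euclidean_distance(x, y):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

x = [3, 6, 9]
y = [1, 0, 1]
z = [2, 2, 2]  # a third point, just for the triangle inequality check

# Matches scipy's convenience function:
print(euclidean_distance(x, y))   # 10.198039027185569
print(distance.euclidean(x, y))   # 10.198039027185569

# Spot-check the metric properties on these points:
assert euclidean_distance(x, y) == euclidean_distance(y, x)  # symmetry
assert euclidean_distance(x, y) >= 0                         # non-negativity
assert euclidean_distance(x, x) == 0                         # d(x, x) = 0
# triangle inequality: d(x, z) <= d(x, y) + d(y, z)
assert euclidean_distance(x, z) <= euclidean_distance(x, y) + euclidean_distance(y, z)
```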

# Manhattan Distance

The Manhattan distance, also called the taxicab distance or cityblock distance, is another popular distance metric. Suppose you’re in a two-dimensional plane and can move only along the axes, never diagonally.

The Manhattan distance between the points x and y is the sum of the absolute differences of their coordinates:

d(x, y) = |x1 - y1| + |x2 - y2|

In n-dimensional space, where each point has n coordinates, the Manhattan distance is given by:

d(x, y) = |x1 - y1| + |x2 - y2| + … + |xn - yn|

Though the Manhattan distance does not give the shortest distance between any two given points, it is often preferred in applications where the feature points lie in a high-dimensional space.

## Computing Manhattan Distance in Python

We retain the import and x and y from the previous example:

```python
from scipy.spatial import distance

x = [3, 6, 9]
y = [1, 0, 1]
```

To compute the Manhattan (or cityblock) distance, we can use the `cityblock` function:

```python
print(distance.cityblock(x, y))
# Output >> 16
```
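As a sanity check, the cityblock result can be reproduced with a few lines of plain Python (the helper name `manhattan_distance` is my own):

```python
from scipy.spatial import distance

def manhattan_distance(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

x = [3, 6, 9]
y = [1, 0, 1]

print(manhattan_distance(x, y))   # 16
print(distance.cityblock(x, y))   # 16
```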

# Minkowski Distance

Named after the German mathematician Hermann Minkowski, the Minkowski distance in a normed vector space is given by:

d(x, y) = (|x1 - y1|^p + |x2 - y2|^p + … + |xn - yn|^p)^(1/p)

For p = 1, the Minkowski distance takes the same form as the Manhattan distance:

d(x, y) = |x1 - y1| + |x2 - y2| + … + |xn - yn|

Similarly, for p = 2, the Minkowski distance is equivalent to the Euclidean distance:

d(x, y) = sqrt((x1 - y1)^2 + (x2 - y2)^2 + … + (xn - yn)^2)

## Computing Minkowski Distance in Python

Let’s compute the Minkowski distance between the points x and y:

```python
from scipy.spatial import distance

x = [3, 6, 9]
y = [1, 0, 1]
```

In addition to the points (arrays) between which the distance is to be calculated, the `minkowski` function also takes in the parameter `p`:

```python
print(distance.minkowski(x, y, p=3))
# Output >> 9.028714870948003
```
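A minimal from-scratch sketch of the same computation (the helper name `minkowski_distance` is my own) makes the role of `p` explicit:

```python
from scipy.spatial import distance

def minkowski_distance(x, y, p):
    """p-th root of the sum of p-th powers of absolute differences."""
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1 / p)

x = [3, 6, 9]
y = [1, 0, 1]

# Both print the same value, approximately 9.0287:
print(minkowski_distance(x, y, p=3))
print(distance.minkowski(x, y, p=3))
```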

To verify that the Minkowski distance evaluates to the Manhattan distance for p = 1, let’s call the `minkowski` function with `p` set to 1:

```python
print(distance.minkowski(x, y, p=1))
# Output >> 16.0
```

Let’s also verify that Minkowski distance for p = 2 evaluates to the Euclidean distance we computed earlier:

```python
print(distance.minkowski(x, y, p=2))
# Output >> 10.198039027185569
```

And that’s a wrap! If you’re familiar with normed vector spaces, you should be able to see the similarity between the distance metrics discussed here and Lp norms: the Euclidean, Manhattan, and Minkowski distances are, respectively, the L2, L1, and Lp norms of the difference vector in a normed vector space.
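This correspondence can be checked directly by applying `numpy.linalg.norm` to the difference vector:

```python
import numpy as np
from scipy.spatial import distance

x = np.array([3, 6, 9])
y = np.array([1, 0, 1])
diff = x - y

# L1 norm of the difference vector == Manhattan distance
print(np.linalg.norm(diff, ord=1))   # 16.0
# L2 norm == Euclidean distance
print(np.linalg.norm(diff, ord=2))   # 10.198039027185569
# Lp norm (here p = 3) == Minkowski distance with p = 3; both print ≈ 9.0287
print(np.linalg.norm(diff, ord=3))
print(distance.minkowski(x, y, p=3))
```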

# Summing Up

That's all for this tutorial. I hope you’ve now gotten the hang of the common distance metrics. As a next step, you can try playing around with the different metrics you’ve learned the next time you train machine learning algorithms.

If you’re looking to get started with data science, check out this list of GitHub repositories to learn data science. Happy learning!


Bala Priya C is a technical writer who enjoys creating long-form content. Her areas of interest include math, programming, and data science. She shares her learning with the developer community by authoring tutorials, how-to guides, and more.