Manhattan distance, also known as L1 distance or city block distance, is a metric used to measure the distance between two points in a multi-dimensional space. It calculates the sum of the absolute differences of the coordinates of two points. Pairwise distance calculations involve computing the distances between all pairs of points in a dataset.
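For two points a and b with n coordinates, the definition reads d(a, b) = |a_1 - b_1| + |a_2 - b_2| + ... + |a_n - b_n|. A minimal pure-Python sketch of that formula and of a pairwise matrix built from it might look like the following. The helper names `manhattan` and `pairwise_manhattan` are hypothetical and are not part of scikit-learn.

```python
def manhattan(a, b):
    """Return the sum of absolute coordinate differences between points a and b."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(x - y) for x, y in zip(a, b))


def pairwise_manhattan(points):
    """Return a nested list where entry [i][j] is manhattan(points[i], points[j])."""
    return [[manhattan(p, q) for q in points] for p in points]


# |1 - 4| + |2 - 6| = 3 + 4 = 7
print(manhattan([1, 2], [4, 6]))
# Every pair of the three points, including each point with itself (distance 0)
print(pairwise_manhattan([[0, 0], [1, 1], [3, 0]]))
```

The nested loop visits every pair, which is exactly what "pairwise" means; library functions do the same work with vectorized array operations.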
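The `manhattan_distances()` function from scikit-learn's `sklearn.metrics.pairwise` module computes the pairwise Manhattan distances between data points efficiently. Because Manhattan distance sums absolute differences instead of squaring them, a large difference in a single coordinate is not amplified the way it is in Euclidean distance, so it is less affected by outliers. It is also often considered a better choice than Euclidean distance for high-dimensional data.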
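To make the outlier argument concrete, the sketch below compares two made-up points that are equally far from the origin in Manhattan terms (a total absolute deviation of 4): one with the whole deviation in a single coordinate, and one with it spread evenly over four coordinates. It uses scikit-learn's `euclidean_distances()` alongside `manhattan_distances()`.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances, manhattan_distances

origin = np.zeros((1, 4))
# Two illustrative points with the same total absolute deviation (4) from the origin
concentrated = np.array([[4.0, 0.0, 0.0, 0.0]])  # one large coordinate difference
spread = np.array([[1.0, 1.0, 1.0, 1.0]])        # four small coordinate differences

for name, point in [("concentrated", concentrated), ("spread", spread)]:
    l1 = manhattan_distances(origin, point)[0, 0]
    l2 = euclidean_distances(origin, point)[0, 0]
    print(f"{name}: manhattan={l1:.1f}, euclidean={l2:.1f}")
```

Manhattan distance rates both points as 4.0 away. Euclidean distance reports 4.0 for the concentrated point but only 2.0 for the spread one, because squaring lets the single large difference dominate.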
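Manhattan distance is commonly used in applications such as clustering, nearest neighbor search, and anomaly detection. However, it is not suitable for every type of data or problem, for example when the relative magnitudes of the differences matter more than their absolute values. Because it adds up raw coordinate differences, it is also sensitive to feature scale: a feature measured in large units can dominate the distance unless the data are scaled first.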
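As one illustration of the nearest neighbor use case, scikit-learn's `NearestNeighbors` estimator accepts `metric="manhattan"`. The minimal sketch below uses a small toy dataset; the query point `[4, 4]` and `n_neighbors=2` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])

# metric="manhattan" makes the neighbor search rank points by L1 distance
nn = NearestNeighbors(n_neighbors=2, metric="manhattan").fit(X)

query = np.array([[4, 4]])
distances, indices = nn.kneighbors(query)
print(indices)    # positions of the two nearest points in X
print(distances)  # their Manhattan distances to the query
```

For this query the nearest points are `[3, 4]` (distance 1) and `[5, 6]` (distance 3).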
```python
from sklearn.metrics.pairwise import manhattan_distances

# Create a synthetic dataset
X = [[1, 2], [3, 4], [5, 6], [7, 8]]

# Calculate pairwise Manhattan distances
distances = manhattan_distances(X)

print("Pairwise Manhattan Distances:")
print(distances)
```
Running the example gives an output like:
```
Pairwise Manhattan Distances:
[[ 0.  4.  8. 12.]
 [ 4.  0.  4.  8.]
 [ 8.  4.  0.  4.]
 [12.  8.  4.  0.]]
```
The steps are as follows:
- Create a synthetic dataset `X` with a few data points.
- Use the `manhattan_distances()` function to calculate the pairwise Manhattan distances between the data points in `X`.
- Print the resulting distance matrix.
In this example, we create a small synthetic dataset `X` consisting of four data points, each with two dimensions. You can modify the dataset according to your specific needs.
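For instance, the list of lists can be swapped for a NumPy array. The sketch below generates random data (the seed and the 5 × 3 size are arbitrary choices) and checks two properties every Manhattan distance matrix should have: it is symmetric, and its diagonal is zero.

```python
import numpy as np
from sklearn.metrics.pairwise import manhattan_distances

rng = np.random.default_rng(seed=0)
X = rng.normal(size=(5, 3))  # 5 points with 3 dimensions each

distances = manhattan_distances(X)
print(distances.shape)                      # one row and one column per point
print(np.allclose(distances, distances.T))  # symmetric, since d(a, b) == d(b, a)
print(np.allclose(np.diag(distances), 0))   # each point is at distance 0 from itself
```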
Next, we use the `manhattan_distances()` function to calculate the pairwise Manhattan distances between the data points. When called with a single argument, this function takes the dataset `X` as input and returns a square NumPy array `distances`, where each element `distances[i, j]` represents the Manhattan distance between the i-th and j-th data points.
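`manhattan_distances()` also accepts an optional second array `Y`, in which case entry `[i, j]` is the distance from `X[i]` to `Y[j]` and the result is rectangular. The sketch below uses a hypothetical `Y` of two points and spot-checks one entry against the definition.

```python
import numpy as np
from sklearn.metrics.pairwise import manhattan_distances

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
Y = np.array([[0, 0], [10, 10]])  # hypothetical second set of points

# Shape is (len(X), len(Y)) rather than square
D = manhattan_distances(X, Y)
print(D.shape)

# Spot-check one entry: sum of absolute coordinate differences between X[2] and Y[1]
i, j = 2, 1
manual = np.abs(X[i] - Y[j]).sum()
print(D[i, j], manual)  # |5 - 10| + |6 - 10| = 9 for both
```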
The resulting distance matrix is then printed, displaying the pairwise Manhattan distances between all pairs of data points in the dataset.
By examining the distance matrix, you can see how far apart the data points are relative to one another. A smaller value means two points are closer, and therefore more similar, under the Manhattan metric. In the output above, each point is 4 units from its immediate neighbors, while the first and last points are 12 units apart.
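Reading the matrix can also be automated. A minimal sketch for finding the closest pair of distinct points masks the zero self-distances on the diagonal and then takes the smallest remaining entry.

```python
import numpy as np
from sklearn.metrics.pairwise import manhattan_distances

X = [[1, 2], [3, 4], [5, 6], [7, 8]]
distances = manhattan_distances(X)

# Replace the diagonal (each point's distance to itself) with infinity
masked = distances.copy()
np.fill_diagonal(masked, np.inf)

# Convert the flat index of the minimum back into (row, column) form
i, j = np.unravel_index(np.argmin(masked), masked.shape)
print(f"Closest pair: points {i} and {j}, distance {masked[i, j]}")
```

Several pairs here tie at distance 4; `np.argmin` returns the first one in row-major order, so this reports points 0 and 1.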
This example demonstrates how to use the `manhattan_distances()` function from scikit-learn to calculate pairwise Manhattan distances between data points, providing a simple and efficient way to measure the similarity or dissimilarity between observations in a dataset.