Scikit-Learn haversine_distances() Metric

The haversine_distances() function in scikit-learn is used to calculate pairwise distances between points on a sphere, such as Earth. It is particularly useful for geospatial analysis or clustering geographic data points.

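The function takes an array of shape (n_samples, 2) in which each row is a [latitude, longitude] pair expressed in radians, and returns a distance matrix containing the pairwise distances between the points. Internally, it uses the haversine formula to calculate great-circle distances on a unit sphere, so the results are angular distances in radians; multiply them by the Earth's radius (about 6,371 km) to obtain kilometers.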
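As a rough illustration of the haversine formula itself, the sketch below computes the angular great-circle distance between two points with plain NumPy. The helper name haversine_angle is hypothetical and not part of scikit-learn; it only mirrors the textbook formula and is not meant to reproduce the library's implementation.

```python
import numpy as np

def haversine_angle(lat1, lon1, lat2, lon2):
    """Angular great-circle distance in radians (hypothetical helper).

    All four arguments are expected in radians.
    """
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula: a = sin^2(dlat/2) + cos(lat1) * cos(lat2) * sin^2(dlon/2)
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    # Central angle between the two points on a unit sphere
    return 2 * np.arcsin(np.sqrt(a))

# London and Paris, converted from degrees to radians
london = np.radians([51.5074, -0.1278])
paris = np.radians([48.8566, 2.3522])

angle = haversine_angle(london[0], london[1], paris[0], paris[1])
print(angle)
```

For these two points the result is roughly 0.054 radians, which is the kind of value haversine_distances() returns before any scaling by the Earth's radius.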

Typical use cases for haversine_distances() include finding nearby locations, clustering geographic data, or calculating distances between cities or landmarks. However, it’s important to note that the function assumes a perfectly spherical Earth, which can introduce errors of a fraction of a percent compared with ellipsoidal geodetic models.

from sklearn.metrics.pairwise import haversine_distances
import numpy as np

# Latitude and longitude in degrees for New York, London, Tokyo, and Paris
latitudes = np.array([40.7128, 51.5074, 35.6895, 48.8566])
longitudes = np.array([-74.0060, -0.1278, 139.6917, 2.3522])

# Stack into [latitude, longitude] rows, convert degrees to radians,
# then calculate pairwise distances using haversine_distances()
distances = haversine_distances(np.radians(np.column_stack([latitudes, longitudes])))

print("Pairwise distances:")
print(distances)

Running the example gives:

Pairwise distances:
[[0.         0.87430893 1.70284225 0.91622052]
 [0.87430893 0.         1.50034746 0.05392498]
 [1.70284225 1.50034746 0.         1.52441864]
 [0.91622052 0.05392498 1.52441864 0.        ]]
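The matrix above is in radians because haversine_distances() works on a unit sphere. A minimal sketch of converting it to kilometers follows, assuming a mean Earth radius of 6371 km (a commonly used approximation; the constant name EARTH_RADIUS_KM is an illustrative choice, not something scikit-learn provides).

```python
import numpy as np
from sklearn.metrics.pairwise import haversine_distances

# Assumed mean Earth radius in kilometers (approximation, not defined by scikit-learn)
EARTH_RADIUS_KM = 6371.0

# Same four points as above: [latitude, longitude] in degrees
coords_deg = np.array([
    [40.7128, -74.0060],
    [51.5074, -0.1278],
    [35.6895, 139.6917],
    [48.8566, 2.3522],
])

# Angular distances in radians, scaled to kilometers
distances_km = haversine_distances(np.radians(coords_deg)) * EARTH_RADIUS_KM

print(np.round(distances_km, 1))
```

With this radius, the entry for the second and fourth points (0.05392498 radians) comes out at roughly 344 km.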

The steps involved in calculating pairwise distances using haversine_distances() are as follows:

  1. Define latitude and longitude arrays (in degrees) for a few points using np.array().
  2. Stack the latitude and longitude arrays into a single 2D array using np.column_stack(), giving one [latitude, longitude] row per point, which is the input format expected by haversine_distances().
  3. Convert the stacked array from degrees to radians using np.radians(), since haversine_distances() expects its input in radians.
  4. Calculate the pairwise distances between the points by calling haversine_distances() with the prepared input array.
  5. Print the resulting distance matrix, which contains the pairwise angular distances (in radians) between all points in the input data.

This example demonstrates how to use the haversine_distances() function from scikit-learn to calculate pairwise distances between geographic points defined by their latitudes and longitudes. Using a small set of coordinates, it shows the steps required to prepare the input data and obtain the distance matrix. The values in that matrix are angular distances in radians; multiplying them by the Earth's radius converts them to a length.
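One of the use cases mentioned earlier, finding nearby locations, can be sketched by passing two arrays to haversine_distances(), which then returns the distances from each row of the first array to each row of the second. The query point and the EARTH_RADIUS_KM constant below are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import haversine_distances

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (approximation)

# Candidate locations as [latitude, longitude] in degrees (same points as the example)
candidates_deg = np.array([
    [40.7128, -74.0060],
    [51.5074, -0.1278],
    [35.6895, 139.6917],
    [48.8566, 2.3522],
])

# Hypothetical query point, chosen only for illustration
query_deg = np.array([[50.0, 4.0]])

# With two inputs the result has shape (n_queries, n_candidates)
dist_km = haversine_distances(
    np.radians(query_deg), np.radians(candidates_deg)
) * EARTH_RADIUS_KM

# Index of the closest candidate for the single query row
nearest_index = int(np.argmin(dist_km, axis=1)[0])
print(nearest_index, round(float(dist_km[0, nearest_index]), 1))
```

For larger datasets, computing the full matrix may become expensive, so this approach is best suited to modest numbers of points.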


