The haversine_distances() function in scikit-learn calculates pairwise distances between points on a sphere, such as Earth. It is particularly useful for geospatial analysis or clustering geographic data points.
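The function takes an array of [latitude, longitude] pairs, expressed in radians, and returns a matrix of the pairwise distances between the points. Internally, it uses the haversine formula to calculate great-circle distances on a unit sphere, so the returned values are also in radians; multiplying them by the sphere's radius (about 6,371 km for Earth) converts them to physical distances.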
Typical use cases for haversine_distances() include finding nearby locations, clustering geographic data, or calculating distances between cities or landmarks. However, the function assumes a spherical Earth, which may introduce slight inaccuracies compared to more advanced geodetic models.
from sklearn.metrics.pairwise import haversine_distances
import numpy as np
# Create synthetic latitude and longitude data
latitudes = np.array([40.7128, 51.5074, 35.6895, 48.8566])
longitudes = np.array([-74.0060, -0.1278, 139.6917, 2.3522])
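# Stack into rows of [latitude, longitude] and convert degrees to radians
coords = np.radians(np.column_stack([latitudes, longitudes]))
# Calculate pairwise distances; the result is in radians on a unit sphere
distances = haversine_distances(coords)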
print("Pairwise distances:")
print(distances)
Pairwise distances:
[[0. 0.87430893 1.70284225 0.91622052]
[0.87430893 0. 1.50034746 0.05392498]
[1.70284225 1.50034746 0. 1.52441864]
[0.91622052 0.05392498 1.52441864 0. ]]
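The matrix above is expressed in radians on a unit sphere, so it usually needs to be scaled before it can be read as a physical distance. The following is a minimal sketch of that conversion, assuming a mean Earth radius of 6,371 km; the constant EARTH_RADIUS_KM and the array name coords_deg are illustrative choices, not part of scikit-learn.

```python
import numpy as np
from sklearn.metrics.pairwise import haversine_distances

# Assumption: mean Earth radius in kilometers (a common approximation)
EARTH_RADIUS_KM = 6371.0

# The same four points as in the example, as [latitude, longitude] in degrees
coords_deg = np.array([
    [40.7128, -74.0060],
    [51.5074, -0.1278],
    [35.6895, 139.6917],
    [48.8566, 2.3522],
])

# haversine_distances() expects radians and returns radians
distances_rad = haversine_distances(np.radians(coords_deg))

# Scale by the sphere's radius to obtain kilometers
distances_km = distances_rad * EARTH_RADIUS_KM

print("Pairwise distances (km):")
print(np.round(distances_km))
```

With this radius, the distance between the first two points comes out at roughly 5,570 km. A different radius (or a radius in miles) can be substituted if another unit is needed.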
The steps involved in calculating pairwise distances using haversine_distances() are as follows:
- Create synthetic latitude and longitude data for a few points using np.array().
- Stack the latitude and longitude arrays into a single 2D array using np.column_stack(), with one [latitude, longitude] row per point, which is the input format expected by haversine_distances().
- Convert the stacked array from degrees to radians using np.radians(), as required by haversine_distances().
- Calculate the pairwise distances between the points by calling haversine_distances() with the prepared input array.
- Print the resulting distance matrix, which contains the pairwise distances, in radians, between all points in the input data.
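To make the formula behind these steps concrete, the sketch below computes the haversine distance for a single pair of points by hand and compares it with the library result. The helper haversine_pair() is a hypothetical function written for this illustration, not part of scikit-learn, and it is meant as a cross-check rather than a description of the library's internals.

```python
import numpy as np
from sklearn.metrics.pairwise import haversine_distances


def haversine_pair(p, q):
    """Great-circle distance in radians between two [lat, lon] points in radians.

    Hypothetical helper for illustration only.
    """
    dlat = q[0] - p[0]
    dlon = q[1] - p[1]
    # Haversine formula on a unit sphere
    a = np.sin(dlat / 2) ** 2 + np.cos(p[0]) * np.cos(q[0]) * np.sin(dlon / 2) ** 2
    return 2 * np.arcsin(np.sqrt(a))


# First two points from the example, converted from degrees to radians
points = np.radians([[40.7128, -74.0060], [51.5074, -0.1278]])

manual = haversine_pair(points[0], points[1])
library = haversine_distances(points)[0, 1]

print("Manual: ", manual)
print("Library:", library)
```

The two values should agree to within floating-point precision.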
This example demonstrates how to use the haversine_distances() function from scikit-learn to calculate pairwise distances between geographic points defined by their latitudes and longitudes. By providing a small synthetic dataset of coordinates, it shows the steps required to prepare the input data and obtain the distance matrix using the haversine formula.
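As an illustration of the "finding nearby locations" use case mentioned earlier, the following sketch picks the closest other point for each point in the dataset. It is one simple approach that assumes the full distance matrix fits in memory; the variable names masked and nearest are illustrative.

```python
import numpy as np
from sklearn.metrics.pairwise import haversine_distances

# The same four points as in the example, as [latitude, longitude] in degrees
coords = np.radians([
    [40.7128, -74.0060],
    [51.5074, -0.1278],
    [35.6895, 139.6917],
    [48.8566, 2.3522],
])

distances = haversine_distances(coords)

# Ignore each point's zero distance to itself before searching for the minimum
masked = distances.copy()
np.fill_diagonal(masked, np.inf)

# Index of the nearest other point for every row
nearest = masked.argmin(axis=1)

for i, j in enumerate(nearest):
    print(f"Point {i} is closest to point {j} ({distances[i, j]:.4f} rad)")
```

For large datasets, building the full matrix becomes expensive, and a spatial index may be a better fit than this brute-force search.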