
Scikit-Learn cosine_similarity() Metric

Cosine similarity is a metric used to measure the similarity between two non-zero vectors. It calculates the cosine of the angle between the vectors, with values ranging from -1 (opposite direction) to 1 (same direction). A cosine similarity of 0 indicates that the vectors are orthogonal (perpendicular).
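To make the definition concrete, here is a minimal sketch that computes the cosine of the angle directly as the dot product divided by the product of the vector norms. The helper name cosine() and the sample vectors are illustrative assumptions, not part of scikit-learn.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two non-zero 1D vectors (illustrative helper)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

same = cosine([1, 2], [2, 4])        # vectors pointing the same way
orthogonal = cosine([1, 0], [0, 1])  # perpendicular vectors
opposite = cosine([1, 2], [-1, -2])  # vectors pointing opposite ways
print(same, orthogonal, opposite)
```

The three results land at (approximately) 1, 0, and -1, matching the range described above.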

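The cosine_similarity() function from scikit-learn’s sklearn.metrics.pairwise module computes the cosine similarity between every pair of rows in its input. It takes a 2D array-like object (or sparse matrix) X, where each row represents a vector. Called with X alone, it returns a square matrix of shape (n_samples, n_samples) containing the similarities between all pairs of rows. An optional second argument Y makes it compare each row of X against each row of Y instead, giving a matrix of shape (n_samples_X, n_samples_Y).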
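As a quick sketch of the input and output shapes, the snippet below calls the function once with a single array and once with the optional second argument Y. The arrays are made-up values chosen only to show the shapes.

```python
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative data: three vectors in X, two vectors in Y
X = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
Y = [[1, 0, 0], [0, 0, 1]]

# One argument: every row of X against every row of X
square = cosine_similarity(X)

# Two arguments: every row of X against every row of Y
cross = cosine_similarity(X, Y)

print(square.shape)
print(cross.shape)
```

With one argument the result is square (3 by 3 here); with two arguments it has one row per vector in X and one column per vector in Y (3 by 2 here).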

Cosine similarity is commonly used in text analysis and recommendation systems to compare document or item vectors. It measures the orientation of the vectors rather than their magnitude, so multiplying a vector by a positive constant does not change the result. This property is particularly useful for text data, where the relative frequency of words matters more than their absolute counts, which grow with document length.
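The following sketch illustrates this insensitivity to magnitude, assuming two hypothetical word-count vectors in which the second "document" is simply ten times longer than the first with the same word proportions.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical word counts for a short document
doc_short = np.array([[2, 1, 0, 1]])

# Same word proportions, ten times as many words
doc_long = doc_short * 10

# Cosine similarity depends only on direction
sim = cosine_similarity(doc_short, doc_long)[0, 0]

# Euclidean distance, by contrast, grows with the difference in length
dist = np.linalg.norm(doc_short - doc_long)

print(sim, dist)
```

The cosine similarity stays at 1 (up to floating-point rounding) even though the Euclidean distance between the two vectors is large.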

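However, cosine similarity may not always capture semantic similarity effectively, especially with sparse, high-dimensional data such as bag-of-words vectors. Two documents on the same topic that share no terms have a similarity of 0, however closely related their meanings are. In such cases, other similarity measures or dimensionality reduction techniques might be more appropriate.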
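A small sketch of this limitation, assuming a toy two-document corpus and plain word counts from CountVectorizer: the documents describe the same idea with different words, so their vectors have no overlapping non-zero entries.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus (an assumption for illustration): related meaning, no shared words
docs = ["car engine repair", "automobile motor maintenance"]

# Each document becomes a sparse row of word counts
vectors = CountVectorizer().fit_transform(docs)

# cosine_similarity accepts sparse input directly
sim = cosine_similarity(vectors)[0, 1]
print(sim)
```

Because the two rows share no terms, their dot product is zero and so is the similarity, even though a human reader would consider the documents closely related.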

from sklearn.metrics.pairwise import cosine_similarity

# Create synthetic data
X = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Calculate cosine similarity between each pair of vectors
similarity_matrix = cosine_similarity(X)
print(similarity_matrix)

Running the example gives an output like:

[[1.         0.97463185 0.95941195]
 [0.97463185 1.         0.99819089]
 [0.95941195 0.99819089 1.        ]]

The steps in this example are:

  1. Create a synthetic dataset X consisting of three vectors, each represented as a list of numerical values.
  2. Use the cosine_similarity() function to calculate the pairwise cosine similarities between the vectors in X. The function returns a square matrix where element (i, j) represents the cosine similarity between vector i and vector j.
  3. Print the resulting similarity matrix to examine the pairwise cosine similarities between the vectors.

This example shows how to use the cosine_similarity() function from scikit-learn to measure the similarity between every pair of vectors in a dataset. The resulting matrix can feed applications such as document comparison, item recommendation, or clustering.
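One way to put the matrix to work, sketched here on the assumption that you want each vector's closest neighbour (as a simple recommender might), is to mask the diagonal and take the row-wise maximum. The variable names masked and nearest are illustrative.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

X = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
similarity_matrix = cosine_similarity(X)

# Exclude each vector's similarity with itself (always 1 on the diagonal)
masked = similarity_matrix.copy()
np.fill_diagonal(masked, -np.inf)

# For each row, the index of the most similar other vector
nearest = masked.argmax(axis=1)
print(nearest)  # [1 2 1]
```

Here vector 0 is closest to vector 1, while vectors 1 and 2 are each other's nearest neighbours, consistent with the similarity matrix printed earlier.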


