
Scikit-Learn jaccard_score() Metric

The Jaccard similarity coefficient, or Jaccard score, is a useful metric for evaluating classification models.

It measures the similarity between the predicted and true labels by calculating the intersection over union. In other words, it divides the number of labels that appear in both the predicted and true sets by the total number of labels that appear in either set.
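
To make the definition concrete, here is a minimal sketch that computes the intersection over union by hand. The labels are made up purely to illustrate the arithmetic:

# Hypothetical toy labels, chosen only to illustrate the arithmetic
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

# Indices belonging to the positive class in each set
true_pos = {i for i, label in enumerate(y_true) if label == 1}
pred_pos = {i for i, label in enumerate(y_pred) if label == 1}

intersection = true_pos & pred_pos  # positives predicted correctly
union = true_pos | pred_pos         # positives in either set

print(len(intersection) / len(union))  # 3 / 5 = 0.6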

The jaccard_score() function in scikit-learn computes this score, providing a float value between 0 and 1, with 1 indicating perfect similarity.
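
As a quick sanity check, passing the same toy labels to jaccard_score() gives the same value as the manual calculation above (by default the function scores the positive class, label 1):

from sklearn.metrics import jaccard_score

# Same hypothetical toy labels as above
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

# Matches the manual intersection-over-union computed above
print(jaccard_score(y_true, y_pred))  # 0.6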

The Jaccard score is used for both binary and multiclass classification problems. However, it has limitations, especially with imbalanced datasets where the number of samples in each class varies significantly. In such cases, the score may not fully represent the model’s performance.
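
For multiclass problems, jaccard_score() computes one score per class and combines them according to the average parameter ('macro', 'micro', 'weighted', or None for the raw per-class scores). A short sketch on made-up three-class labels, purely for illustration:

from sklearn.metrics import jaccard_score

# Hypothetical three-class labels, purely illustrative
y_true = [0, 1, 2, 0, 1, 2, 0, 2]
y_pred = [0, 2, 2, 0, 1, 1, 0, 2]

# Per-class scores (no averaging), roughly [1.0, 0.33, 0.5] here
print(jaccard_score(y_true, y_pred, average=None))

# 'macro' is the unweighted mean over classes; 'weighted' weights each
# class by its support, which matters on imbalanced data
print(jaccard_score(y_true, y_pred, average='macro'))
print(jaccard_score(y_true, y_pred, average='weighted'))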

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import jaccard_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an SVM classifier
clf = SVC(kernel='linear', C=1, random_state=42)
clf.fit(X_train, y_train)

# Predict on test set
y_pred = clf.predict(X_test)

# Calculate Jaccard score
jaccard = jaccard_score(y_test, y_pred)
print(f"Jaccard Score: {jaccard:.2f}")

Running the example gives an output like:

Jaccard Score: 0.77

The steps are as follows:

  1. Generate a synthetic binary classification dataset using make_classification().
  2. Split the dataset into training and test sets using train_test_split().
  3. Train an SVM classifier on the training set using the SVC class with a linear kernel.
  4. Predict the labels on the test set using the trained classifier’s predict() method.
  5. Calculate the Jaccard score using the jaccard_score() function, which measures the intersection over union of the predicted and true labels (a manual check against the confusion matrix is sketched after this list).
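
As a sanity check on step 5, the binary Jaccard score equals TP / (TP + FP + FN). The sketch below, re-using y_test and y_pred from the example above, verifies this with confusion_matrix():

from sklearn.metrics import confusion_matrix, jaccard_score

# Re-uses y_test and y_pred from the example above
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

manual = tp / (tp + fp + fn)  # intersection over union for the positive class
print(f"Manual:        {manual:.2f}")
print(f"jaccard_score: {jaccard_score(y_test, y_pred):.2f}")  # should match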


See Also