
Scikit-Learn hamming_loss() Metric

Hamming Loss is a metric used to evaluate the performance of classification models, particularly in multilabel classification. It measures the fraction of labels that are predicted incorrectly, indicating how often the model's predicted labels differ from the true labels.

The hamming_loss() function in scikit-learn computes this by dividing the number of incorrectly predicted labels by the total number of labels. It takes the true labels and predicted labels as input and returns a float between 0 and 1, where 0 means every label was predicted correctly.
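To make the calculation concrete, here is a minimal sketch (the tiny y_true and y_pred arrays below are made up purely for illustration) showing that hamming_loss() equals the fraction of mismatched label entries computed by hand:

from sklearn.metrics import hamming_loss
import numpy as np

# Hand-written multilabel example: 3 samples, 3 labels each (illustrative values)
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 1, 0]])

# 2 of the 9 label entries disagree, so the Hamming Loss is 2/9
print(hamming_loss(y_true, y_pred))  # 0.2222...
print(np.mean(y_true != y_pred))     # same value computed by hand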

Hamming Loss is used primarily in multilabel classification problems. For single-label (binary or multiclass) classification it simply reduces to the fraction of misclassified samples, so it adds nothing over standard accuracy there, and it is not suitable when the focus is on other types of errors.
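As a quick illustration of that reduction (toy labels, purely for demonstration), the single-label case gives the same number as one minus the accuracy:

from sklearn.metrics import hamming_loss, accuracy_score

# Single-label (multiclass) case: Hamming Loss is just the
# fraction of misclassified samples, i.e. 1 - accuracy
y_true = [0, 1, 2, 2, 1]
y_pred = [0, 2, 2, 2, 0]

print(hamming_loss(y_true, y_pred))        # 0.4
print(1 - accuracy_score(y_true, y_pred))  # 0.4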

from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import hamming_loss

# Generate synthetic multilabel dataset
X, y = make_multilabel_classification(n_samples=1000, n_classes=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a RandomForest classifier with multi-label support
clf = MultiOutputClassifier(RandomForestClassifier(n_estimators=100, random_state=42))
clf.fit(X_train, y_train)

# Predict on test set
y_pred = clf.predict(X_test)

# Calculate Hamming Loss
h_loss = hamming_loss(y_test, y_pred)
print(f"Hamming Loss: {h_loss:.2f}")

Running the example gives an output like:

Hamming Loss: 0.18
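With 5 labels per sample, a Hamming Loss of 0.18 means that roughly 18% of all individual label assignments (on average about 0.9 of the 5 labels per sample) disagree with the true labels.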


See Also