Scikit-Learn contingency_matrix() Metric

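Evaluating a classification model starts with knowing how its predicted labels line up against the true labels. The contingency_matrix() function from scikit-learn tabulates the counts of true vs. predicted labels. It lives in sklearn.metrics.cluster and was designed for comparing cluster assignments against ground-truth classes, but it accepts any two label arrays of equal length, so it can also be applied to classifier output.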

The contingency_matrix() function returns a matrix in which rows represent true labels and columns represent predicted labels. Each cell holds the number of samples with the corresponding true-predicted label pair: for every sample, the function looks up its true label and its predicted label and increments that cell. Rows follow the sorted unique values found in the true labels, and columns follow the sorted unique values found in the predicted labels.
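As a minimal sketch of the counting described above, the following uses six hand-picked labels (hypothetical values, chosen only for illustration) so that each cell can be checked by eye.

```python
from sklearn.metrics.cluster import contingency_matrix

# Hypothetical toy labels, not taken from the example dataset
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

# Rows are true labels, columns are predicted labels
cm = contingency_matrix(y_true, y_pred)
print(cm)
# [[2 0 0]
#  [0 1 1]
#  [0 0 2]]
```

The single off-diagonal count in row 1, column 2 is the one sample whose true label is 1 but which was predicted as 2.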

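When the same set of labels appears in both arrays, a matrix whose only non-zero counts lie on the diagonal indicates perfect classification, and off-diagonal values are misclassifications. The table can be built for both binary and multiclass problems. It has limitations: raw counts are hard to compare across classes when the dataset is highly imbalanced, and every error counts equally regardless of how severe it is. Because rows and columns are derived independently from the labels that actually occur, a class that is never predicted produces no column, so the matrix can be non-square and the diagonal no longer lines up with correct predictions.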
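The shape behaviour can be checked with a short sketch. It assumes hypothetical toy labels in which class 2 is never predicted, and it uses confusion_matrix() from sklearn.metrics with an explicit labels argument as a point of comparison.

```python
from sklearn.metrics import confusion_matrix
from sklearn.metrics.cluster import contingency_matrix

# Hypothetical toy labels: class 2 exists in y_true but is never predicted
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1]

# Columns come only from labels that occur in y_pred
cont = contingency_matrix(y_true, y_pred)
print(cont.shape)  # (3, 2)

# Passing labels= forces one row and one column per class
conf = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
print(conf.shape)  # (3, 3)
```

If a fixed, square layout matters for reading the diagonal, confusion_matrix() with an explicit label list may be the safer choice for classifier evaluation.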

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics.cluster import contingency_matrix

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_clusters_per_class=1, n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an SVM classifier
clf = SVC(kernel='linear', C=1, random_state=42)
clf.fit(X_train, y_train)

# Predict on test set
y_pred = clf.predict(X_test)

# Calculate contingency matrix
cm = contingency_matrix(y_test, y_pred)
print(f"Contingency Matrix:\n{cm}")

Running the example gives an output like:

Contingency Matrix:
[[60  2  4]
 [ 9 51  1]
 [ 5  0 68]]
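Raw counts are easier to compare across classes once each row is divided by its total. The sketch below hard-codes the matrix printed above, so it only holds if a run reproduces those numbers, and it assumes all three classes appear in both the true and predicted labels so that the diagonal holds the correct predictions.

```python
import numpy as np

# Counts copied from the sample output above; a different run may differ
cm = np.array([[60, 2, 4],
               [9, 51, 1],
               [5, 0, 68]])

# Number of test samples per true class
row_totals = cm.sum(axis=1)

# Fraction of each true class that was predicted correctly (per-class recall)
recall = np.diag(cm) / row_totals

# Overall fraction of correct predictions
accuracy = np.trace(cm) / cm.sum()

print(row_totals)          # [66 61 73]
print(np.round(recall, 3))
print(accuracy)            # 0.895
```

Here the second class has the lowest recall: 9 of its 61 samples were assigned to the first class.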

The steps are as follows:

1. Generate a synthetic multiclass classification dataset using make_classification().
2. Split the dataset into training and test sets using train_test_split().
3. Train an SVC classifier with a linear kernel.
4. Predict labels on the test set using the trained classifier.
5. Calculate the contingency matrix using contingency_matrix() by comparing true and predicted labels.

This example demonstrates how to use contingency_matrix() from scikit-learn to evaluate a classification model by tabulating the counts of true versus predicted labels.
