
Scikit-Learn cohen_kappa_score() Metric

Cohen’s Kappa is a statistic that measures the agreement between two raters classifying items into mutually exclusive categories. It compares the observed agreement between the raters to the agreement expected by chance. The resulting score ranges from -1 to 1: 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate agreement worse than chance.
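To make the definition concrete, here is a minimal sketch (using made-up ratings from two hypothetical raters) that computes Kappa by hand from a confusion matrix and checks the result against cohen_kappa_score():

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix

# Two raters labelling the same 10 items (illustrative data)
rater_a = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]
rater_b = [0, 0, 1, 1, 0, 0, 1, 0, 1, 0]

cm = confusion_matrix(rater_a, rater_b)
n = cm.sum()

# Observed agreement: fraction of items where the raters match
p_o = np.trace(cm) / n

# Chance agreement: sum over classes of (row marginal * column marginal) / n^2
p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2

kappa_manual = (p_o - p_e) / (1 - p_e)
print(f"Manual Kappa:  {kappa_manual:.3f}")
print(f"sklearn Kappa: {cohen_kappa_score(rater_a, rater_b):.3f}")
```

Both values agree, confirming that cohen_kappa_score() implements the formula (p_o - p_e) / (1 - p_e).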

The cohen_kappa_score() function in scikit-learn computes Cohen’s Kappa by taking the true labels and predicted labels as input. It returns a float value representing the Kappa statistic.

Cohen’s Kappa is commonly used to assess inter-rater reliability, such as comparing the agreement between human raters or evaluating a classifier’s predictions against ground-truth labels. However, it has limitations: it is sensitive to imbalanced class distributions and can produce misleading results when the observed agreement is high but the agreement expected by chance is also high, for example because one class dominates.
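The class-imbalance caveat is easy to demonstrate with a sketch (made-up labels): a degenerate "rater" that always predicts the majority class reaches 95% raw agreement, yet its Kappa is zero, because chance alone produces the same agreement.

```python
from sklearn.metrics import cohen_kappa_score

# 95 of 100 items belong to class 0 (illustrative, heavily imbalanced)
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a degenerate rater that always says 0

# Raw agreement looks excellent...
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# ...but Kappa reveals no agreement beyond chance
kappa = cohen_kappa_score(y_true, y_pred)

print(f"Raw agreement: {accuracy:.2f}")  # 0.95
print(f"Cohen's Kappa: {kappa:.2f}")     # 0.00
```

Here the observed agreement (0.95) exactly equals the chance agreement (0.95), so the numerator of the Kappa formula is zero.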

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import cohen_kappa_score

# Generate synthetic dataset with 4 classes
X, y = make_classification(n_samples=1000, n_clusters_per_class=1, n_classes=4, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an SVM classifier
clf = SVC(kernel='rbf', C=1, random_state=42)
clf.fit(X_train, y_train)

# Predict on test set
y_pred = clf.predict(X_test)

# Calculate Cohen's Kappa
kappa = cohen_kappa_score(y_test, y_pred)
print(f"Cohen's Kappa: {kappa:.2f}")

Running the example gives an output like:

Cohen's Kappa: 0.68

The steps in this example are as follows:

  1. We generate a synthetic multiclass classification dataset using make_classification() with 1000 samples and 4 classes.
  2. The dataset is split into training and test sets using train_test_split(), with 80% of the data used for training and 20% for testing.
  3. An SVM classifier (SVC) is instantiated with an RBF kernel and trained on the training data using the fit() method.
  4. The trained classifier is used to make predictions on the test set by calling predict() on the test features.
  5. Cohen’s Kappa is calculated between the true labels (y_test) and the predicted labels (y_pred) using the cohen_kappa_score() function.

This example demonstrates how to use the cohen_kappa_score() function from scikit-learn to evaluate the agreement between true labels and predicted labels in a multiclass classification problem.
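For ordinal labels (e.g. severity grades), cohen_kappa_score() also supports weighted Kappa via its weights parameter, which accepts 'linear' or 'quadratic' and gives partial credit for near-miss disagreements. A short sketch with made-up grades, where every error is off by exactly one category:

```python
from sklearn.metrics import cohen_kappa_score

# Ordinal ratings, e.g. severity grades 0-3 (illustrative data)
y_true = [0, 1, 2, 3, 2, 1, 0, 3, 2, 1]
y_pred = [0, 1, 1, 3, 3, 1, 0, 2, 2, 0]

unweighted = cohen_kappa_score(y_true, y_pred)
linear = cohen_kappa_score(y_true, y_pred, weights='linear')
quadratic = cohen_kappa_score(y_true, y_pred, weights='quadratic')

print(f"Unweighted: {unweighted:.3f}")
print(f"Linear:     {linear:.3f}")
print(f"Quadratic:  {quadratic:.3f}")
```

Because all disagreements are adjacent categories, the weighted variants score higher than the unweighted one, reflecting that the raters are "close" even when they disagree.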



See Also