Label ranking loss measures ranking quality in multilabel classification problems.
For each instance, it computes the fraction of (relevant, irrelevant) label pairs that are incorrectly ordered, i.e. where an irrelevant label is scored at least as high as a relevant one.
The metric is then calculated by averaging this fraction across all instances.
Lower values indicate better performance, with 0 representing perfect ranking. This metric is commonly used in multilabel classification tasks where the order of labels matters.
However, it has limitations, such as sensitivity to label imbalance and not considering the absolute difference in ranks.
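To make the pairwise definition concrete, here is a minimal hand-worked example with made-up scores for a single instance with four labels:

```python
import numpy as np
from sklearn.metrics import label_ranking_loss

# One sample with 4 labels: labels 0 and 2 are relevant.
y_true = np.array([[1, 0, 1, 0]])

# The (relevant, irrelevant) pairs are (0,1), (0,3), (2,1), (2,3).
# With these scores, only pair (2,1) is mis-ordered
# (relevant label 2 scores 0.4, below irrelevant label 1 at 0.6),
# so the loss is 1 mis-ordered pair out of 4 = 0.25.
y_score = np.array([[0.9, 0.6, 0.4, 0.1]])

print(label_ranking_loss(y_true, y_score))  # 0.25
```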
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.svm import SVC
from sklearn.metrics import label_ranking_loss
import numpy as np
# Generate synthetic multilabel classification dataset
# (y is already a binary indicator matrix, as label_ranking_loss expects)
X, y = make_multilabel_classification(n_samples=1000, n_classes=5, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train an SVM classifier with a linear kernel
clf = MultiOutputClassifier(SVC(kernel='linear', probability=True, random_state=42))
clf.fit(X_train, y_train)
# Predict the probability that each label is relevant;
# predict_proba returns one (n_samples, 2) array per label,
# so keep the positive-class column of each and stack them
y_score = np.array([p[:, 1] for p in clf.predict_proba(X_test)]).T
# Calculate label ranking loss
lrl = label_ranking_loss(y_test, y_score)
print(f"Label Ranking Loss: {lrl:.2f}")
Running the example gives an output like:
Label Ranking Loss: 0.12
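Note that label_ranking_loss() expects graded scores, not hard 0/1 predictions. Hard predictions create ties between relevant and irrelevant labels, and tied pairs count as mis-ordered, which inflates the loss. A small illustration with made-up values:

```python
import numpy as np
from sklearn.metrics import label_ranking_loss

y_true = np.array([[1, 0, 1, 0]])

# Graded scores rank both relevant labels first -> perfect ranking.
scores = np.array([[0.8, 0.3, 0.7, 0.1]])
print(label_ranking_loss(y_true, scores))  # 0.0

# Hard predictions that get one label wrong produce ties, and each
# tied (relevant, irrelevant) pair is counted as mis-ordered.
hard = np.array([[1, 1, 0, 0]])
print(label_ranking_loss(y_true, hard))    # 0.75
```

This is why the example above passes the per-label probabilities from predict_proba() to the metric rather than the output of predict().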
The steps are as follows:
- Generate a synthetic multilabel classification dataset using make_multilabel_classification(); the labels are returned as a binary indicator matrix, which is the format label_ranking_loss() expects.
- Split the dataset into training and test sets using train_test_split().
- Train an SVC classifier with a linear kernel wrapped in a MultiOutputClassifier on the training set.
- Predict the probability of each label on the test set using predict_proba(), keeping the positive-class probability for each label as its ranking score.
- Calculate the label ranking loss using label_ranking_loss() by comparing the true labels with the predicted scores.
- Lower values of label ranking loss indicate better ranking performance, with 0 being perfect.
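The metric can also be reproduced from scratch, which makes the pairwise definition explicit. The sketch below is a naive reference implementation (the function name and loop structure are our own, not scikit-learn's) that counts, per sample, the (relevant, irrelevant) pairs where the irrelevant label scores at least as high:

```python
import numpy as np
from sklearn.metrics import label_ranking_loss

def ranking_loss_manual(y_true, y_score):
    """Average fraction of (relevant, irrelevant) label pairs where the
    irrelevant label receives a score >= the relevant label's score."""
    losses = []
    for t, s in zip(y_true, y_score):
        rel = s[t == 1]   # scores of relevant labels
        irr = s[t == 0]   # scores of irrelevant labels
        if rel.size == 0 or irr.size == 0:
            losses.append(0.0)  # scikit-learn's convention for such samples
            continue
        # Ties count as mis-ordered, matching scikit-learn's behavior.
        wrong = np.sum(rel[:, None] <= irr[None, :])
        losses.append(wrong / (rel.size * irr.size))
    return float(np.mean(losses))

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(50, 5))
y_score = rng.random((50, 5))
print(np.isclose(ranking_loss_manual(y_true, y_score),
                 label_ranking_loss(y_true, y_score)))  # True
```

The nested comparison is O(n_labels^2) per sample, so it is only a teaching aid; scikit-learn's implementation uses a rank-based formulation that is faster for many labels.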