Discounted Cumulative Gain (DCG) is a metric used to evaluate the quality of a ranking. It sums the relevance of the items in a ranked list and discounts each item's contribution by its position, so relevant items placed near the top add more to the score than those placed further down. This makes it particularly useful for search engines and recommendation systems.
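For reference, DCG is commonly written as below, where rel_i is the true relevance of the item placed at position i by the predicted ranking and k is an optional cutoff (k equals the list length when no cutoff is used). This linear-gain, base-2 form matches the defaults of scikit-learn's dcg_score(); some texts use an exponential gain of 2^{rel_i} - 1 instead.

$$
\mathrm{DCG@}k = \sum_{i=1}^{k} \frac{\mathrm{rel}_i}{\log_2(i + 1)}
$$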
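The dcg_score() function in scikit-learn calculates DCG by taking the true relevance of the results in the order given by the predicted scores and discounting each one logarithmically by its position in that ranking. It takes the true relevance scores (y_true) and the predicted scores (y_score) as 2D arrays of shape (n_samples, n_labels), where each row is one query, and returns a float: the DCG averaged over all rows. An optional k parameter restricts the sum to the top k positions.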
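To make the summation concrete, here is a minimal sketch that computes DCG by hand and compares it with dcg_score(), both per query and averaged over two made-up queries. The helper dcg_manual is hypothetical (not part of scikit-learn) and assumes the predicted scores contain no ties, because dcg_score() averages the gains of tied items by default.

```python
import numpy as np
from sklearn.metrics import dcg_score


def dcg_manual(relevance, scores, k=None):
    """Hypothetical helper: DCG for one query, assuming no tied scores."""
    relevance = np.asarray(relevance, dtype=float)
    order = np.argsort(scores)[::-1]        # item indices from highest to lowest score
    gains = relevance[order][:k]            # true relevance in predicted order, cut at k
    ranks = np.arange(1, len(gains) + 1)    # 1-based positions in the ranking
    return float(np.sum(gains / np.log2(ranks + 1)))


# Two illustrative queries (rows), six candidate items each (columns)
true_rel = np.array([[3, 2, 3, 0, 1, 2],
                     [0, 1, 0, 2, 0, 1]])
scores = np.array([[0.9, 0.8, 0.7, 0.6, 0.5, 0.4],
                   [0.3, 0.9, 0.1, 0.8, 0.2, 0.7]])

per_query = [dcg_manual(t, s) for t, s in zip(true_rel, scores)]
per_query_k3 = [dcg_manual(t, s, k=3) for t, s in zip(true_rel, scores)]

# dcg_score returns the mean of the per-row DCG values
print(per_query, np.mean(per_query), dcg_score(true_rel, scores))
print(per_query_k3, np.mean(per_query_k3), dcg_score(true_rel, scores, k=3))
```

Sorting by the predicted score and then reading off the true relevance is the key step: the model's scores decide the order, while only the true relevance contributes gain.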
DCG is commonly used in ranking tasks, which makes it well suited to evaluating algorithms that order items, such as recommenders. It is less suitable for directly evaluating binary or multiclass classification models, because it scores the order of the predictions rather than whether each prediction is correct.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import dcg_score
import numpy as np

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an SVM classifier (probability=True enables predict_proba)
clf = SVC(kernel='linear', C=1, probability=True, random_state=42)
clf.fit(X_train, y_train)

# Predict positive-class probabilities on the test set
y_prob = clf.predict_proba(X_test)[:, 1]

# Use the binary test labels as true relevance scores (for demonstration purposes).
# dcg_score expects 2D input of shape (n_queries, n_items), so wrap the vector in a
# list: the whole test set becomes a single query of 200 items.
true_relevance = np.asarray([y_test])

# Use the predicted probabilities as ranking scores, with the same 2D shape
predicted_relevance = np.asarray([y_prob])

# Calculate DCG score
dcg = dcg_score(true_relevance, predicted_relevance)
print(f"DCG Score: {dcg:.2f}")
```
Running the example gives an output like:
DCG Score: 21.75
The steps are as follows:
- Generate a synthetic binary classification dataset using make_classification().
- Split the dataset into training and test sets using train_test_split().
- Train an SVC classifier on the training set, enabling probability predictions with probability=True.
- Use predict_proba() to predict probabilities on the test set and select the probabilities of the positive class.
- Use the true labels y_test as relevance scores for demonstration purposes.
- Compute the DCG score with dcg_score() by comparing the true relevance scores with the predicted probabilities.
- Print the DCG score, which summarizes the quality of the predicted ranking.
First, we generate a synthetic binary classification dataset using the make_classification() function from scikit-learn. With the arguments used here, it creates 1000 samples with 2 classes (and the default 20 features), giving us a classification problem without needing real-world data.
Next, we split the dataset into training and test sets using the train_test_split() function. This step is crucial for evaluating the performance of our classifier on unseen data. We use 80% of the data for training and reserve 20% (200 samples) for testing.
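A quick sanity check of the data preparation, assuming the same arguments as in the example: the shapes reflect make_classification()'s default of 20 features and the 80/20 split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(X.shape)                      # (1000, 20)
print(X_train.shape, X_test.shape)  # (800, 20) (200, 20)
print(np.bincount(y_test))          # number of class-0 and class-1 samples in the test set
```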
With our data prepared, we train an SVM classifier using the SVC class from scikit-learn. We specify a linear kernel and enable probability predictions by setting the probability parameter to True; scikit-learn then calibrates the SVM's outputs with Platt scaling fitted through internal 5-fold cross-validation, which makes training slower. The fit() method is called on the classifier object, passing in the training features (X_train) and labels (y_train) so that it learns the underlying patterns in the data.
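If calibrated probabilities are not needed, a ranking can also be built from the classifier's decision_function(), which avoids the extra cross-validation that probability=True triggers. DCG depends only on the order of the scores, so unbounded margins work as ranking scores just as probabilities do. The sketch below reuses the example's setup; plain_clf and margin_scores are illustrative names, and the resulting DCG need not match the probability-based value exactly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import dcg_score

X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Without probability=True, SVC does not expose predict_proba at all
plain_clf = SVC(kernel='linear', C=1).fit(X_train, y_train)
print(hasattr(plain_clf, "predict_proba"))  # False

# Signed distance to the separating hyperplane; larger values lean toward class 1
margin_scores = plain_clf.decision_function(X_test)

dcg_margin = dcg_score(np.asarray([y_test]), np.asarray([margin_scores]))
print(f"DCG Score (decision_function): {dcg_margin:.2f}")
```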
After training, we predict probabilities on the test set by calling the predict_proba() method with X_test. It returns one column per class, ordered as in clf.classes_; we keep column 1, the probability of the positive class, and use it as the score by which the test items are ranked.
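Because predict_proba() orders its columns according to clf.classes_, a slightly more defensive sketch (assuming the same setup as the example) looks up the positive class's column instead of hard-coding index 1; pos_col is an illustrative name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = SVC(kernel='linear', C=1, probability=True, random_state=42).fit(X_train, y_train)
proba = clf.predict_proba(X_test)  # shape (n_test, n_classes); columns follow clf.classes_

print(clf.classes_)        # [0 1]
print(proba[:3].round(3))  # each row sums to 1

# Find the column of the positive class (label 1) rather than assuming its index
pos_col = int(np.flatnonzero(clf.classes_ == 1)[0])
y_prob = proba[:, pos_col]
```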
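For demonstration purposes, we use the true labels y_test as relevance scores: each test sample is treated as an item, with positives (1) counting as relevant and negatives (0) as irrelevant. This mimics a real-world scenario in which we have a relevance judgment for every item.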
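With 0/1 labels as relevance, every relevant item has a gain of 1 and every irrelevant item a gain of 0, so DCG reduces to the sum of 1/log2(rank + 1) over the positions of the relevant items. The toy sketch below uses made-up values (not output from the model above) and relies on the scores already being sorted in descending order, so that array index + 1 equals rank.

```python
import numpy as np
from sklearn.metrics import dcg_score

# Binary relevance for five items: 1 = relevant (positive), 0 = irrelevant (negative)
relevance = np.array([1, 0, 1, 1, 0])
scores = np.array([0.95, 0.90, 0.60, 0.40, 0.10])  # already in descending order

# Ranks (1-based) of the relevant items: 1, 3 and 4
relevant_ranks = np.flatnonzero(relevance) + 1
by_hand = float(np.sum(1.0 / np.log2(relevant_ranks + 1)))

print(by_hand, dcg_score([relevance], [scores]))
```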
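Finally, we evaluate the quality of our predicted ranking using the dcg_score() function. It takes the true relevance scores (true_relevance) and the predicted relevance scores (predicted_relevance), each wrapped into a 2D array with a single row so that the whole test set is treated as one query, and calculates the DCG score, which is then printed. Because DCG is not normalized, its value grows with the number of relevant items and the length of the list, so it is most meaningful when comparing different rankings of the same items; ndcg_score() rescales it to the range [0, 1].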
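To see what normalization does, the sketch below divides DCG by the DCG of the ideal ordering (most relevant items first) and compares the ratio with ndcg_score(), with and without a cutoff k. The helper ideal_dcg and the toy data are hypothetical, and the predicted scores are kept distinct to avoid tie handling.

```python
import numpy as np
from sklearn.metrics import dcg_score, ndcg_score


def ideal_dcg(relevance, k=None):
    """Hypothetical helper: DCG of the best possible ordering of one query."""
    gains = np.sort(np.asarray(relevance, dtype=float))[::-1][:k]  # most relevant first
    return float(np.sum(gains / np.log2(np.arange(2, len(gains) + 2))))


# Toy binary relevance and distinct predicted scores for a single query
true_rel = np.array([[1, 0, 1, 1, 0, 0, 1]])
pred = np.array([[0.9, 0.8, 0.3, 0.7, 0.6, 0.2, 0.1]])

for k in (None, 3):
    dcg = dcg_score(true_rel, pred, k=k)
    ndcg = ndcg_score(true_rel, pred, k=k)
    # The last two numbers should agree: nDCG = DCG / ideal DCG
    print(k, dcg, ndcg, dcg / ideal_dcg(true_rel[0], k=k))
```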
This example demonstrates how to use the dcg_score() function from scikit-learn to evaluate the quality of rankings produced by a classification model.