ndcg_score() is a metric used to evaluate the ranking quality of classification models. NDCG stands for Normalized Discounted Cumulative Gain.
It measures how well the predicted rankings of samples match the actual rankings. This metric is particularly useful in problems where the order of predictions matters, such as information retrieval or recommendation systems.
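To make the definition concrete, here is a minimal sketch that computes NDCG by hand for one sample and compares it with scikit-learn. The helper name manual_ndcg and the example numbers are assumptions made for illustration. The sketch uses linear gains and a log2 rank discount, which is how scikit-learn documents its default behaviour, and it assumes there are no tied scores.

```python
import numpy as np
from sklearn.metrics import ndcg_score


def manual_ndcg(relevance, scores):
    """NDCG for one sample, with linear gains and a log2 rank discount."""
    relevance = np.asarray(relevance, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Discount for rank positions 1..n is 1 / log2(position + 1).
    discounts = 1.0 / np.log2(np.arange(2, relevance.size + 2))
    # DCG: relevance values taken in order of decreasing predicted score.
    order = np.argsort(scores)[::-1]
    dcg = np.sum(relevance[order] * discounts)
    # Ideal DCG: the same sum with the relevance values sorted best-first.
    ideal_dcg = np.sum(np.sort(relevance)[::-1] * discounts)
    return dcg / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical relevance grades and model scores for five items.
true_relevance = [3, 2, 3, 0, 1]
predicted_scores = [0.9, 0.8, 0.1, 0.4, 0.3]

manual_value = manual_ndcg(true_relevance, predicted_scores)
# ndcg_score expects 2D input, so the single sample is wrapped in a list.
sklearn_value = ndcg_score([true_relevance], [predicted_scores])
print(f"manual: {manual_value:.4f}, sklearn: {sklearn_value:.4f}")
```

The two values should agree. The score is below 1 here because one of the most relevant items received the lowest predicted score.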
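The ndcg_score() function in scikit-learn calculates the NDCG by comparing the ranking induced by the predicted scores with the true relevance values. It takes two arrays of shape (n_samples, n_labels), the true relevance of each label (y_true) and the predicted scores such as probabilities (y_score), and returns the NDCG averaged over samples. The result is a float between 0 and 1 when the true relevance values are non-negative, with 1 indicating a perfect ranking.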
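As a quick illustration of the input shapes and the range of the result, the sketch below scores one sample with four candidate items. The relevance grades and scores are made-up values. The optional k parameter restricts the evaluation to the top k ranked items.

```python
import numpy as np
from sklearn.metrics import ndcg_score

# One sample (row) with four candidate items (columns); values are illustrative.
y_true = np.array([[3, 2, 1, 0]])
perfect_scores = np.array([[0.9, 0.7, 0.4, 0.1]])   # same order as y_true
reversed_scores = np.array([[0.1, 0.4, 0.7, 0.9]])  # exactly the wrong order

perfect = ndcg_score(y_true, perfect_scores)
worst = ndcg_score(y_true, reversed_scores)
# Only the two highest-scored items are evaluated when k=2.
top2 = ndcg_score(y_true, perfect_scores, k=2)

print(f"perfect: {perfect:.2f}, reversed: {worst:.2f}, top-2: {top2:.2f}")
```

A ranking that matches the true order scores 1, and the reversed ranking scores lower but still above 0, because every relevant item still contributes some discounted gain.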
While ndcg_score() is powerful for ranking problems, it may not be suitable for standard classification tasks where the ranking of predictions is not relevant.
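from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import ndcg_score
import numpy as np
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
# Transform y into a two-column relevance matrix for the example.
# Column order is (class 0, class 1) so it lines up with predict_proba.
y = np.vstack((1 - y, y)).T
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train an SVM classifier on the class labels (column 1 holds the original y)
clf = SVC(kernel='linear', C=1, probability=True, random_state=42)
clf.fit(X_train, y_train[:, 1])
# Predict probabilities on test set; columns follow clf.classes_, i.e. (0, 1)
y_pred_prob = clf.predict_proba(X_test)
# Calculate NDCG score
ndcg = ndcg_score(y_test, y_pred_prob)
print(f"NDCG Score: {ndcg:.2f}")
Running the example prints a single line: NDCG Score: followed by the score rounded to two decimals.
Each test sample has exactly one relevant column out of two, so its NDCG is 1.0 when the true class receives the higher probability and 1/log2(3), about 0.63, when it does not. The reported score is the average over the test set.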
The steps are as follows:
- Generate a synthetic binary classification dataset using make_classification().
- Transform the target variable y into a probability-like ranking format required for ndcg_score().
- Split the dataset into training and test sets using train_test_split().
- Train an SVC classifier with a linear kernel and probability estimation enabled.
- Predict probabilities for the test set using the trained classifier.
- Calculate the ndcg_score by comparing the true rankings with the predicted probabilities.
- Print the ndcg_score to evaluate the ranking performance of the model.
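The same steps carry over to more than two classes. The sketch below is one possible variant, not part of the original example: it swaps in LogisticRegression for the SVC to keep the run short, and uses label_binarize to build a one-hot relevance matrix whose columns follow clf.classes_, the same order that predict_proba uses. The dataset parameters are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import ndcg_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Synthetic three-class dataset (parameters chosen only for illustration).
X, y = make_classification(
    n_samples=600, n_classes=3, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit on the plain class labels; LogisticRegression is an assumption here.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Predicted scores: one column per class, ordered as in clf.classes_.
y_score = clf.predict_proba(X_test)
# True relevance: one-hot rows with the same column order as y_score.
y_true = label_binarize(y_test, classes=clf.classes_)

score = ndcg_score(y_true, y_score)
print(f"NDCG Score: {score:.2f}")
```

Building the relevance matrix from clf.classes_ keeps the true and predicted columns aligned, which matters because ndcg_score compares the two arrays column by column.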