
Scikit-Learn average_precision_score() Metric

The Average Precision (AP) score is a popular metric for evaluating the performance of binary classification models, particularly when dealing with imbalanced datasets.

It summarizes the precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight. The result is a single value that encapsulates the model’s ability to rank positive instances higher than negative instances.
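Concretely, the score is computed as AP = Σ (R_n − R_{n-1}) * P_n, summing over the thresholds of the precision-recall curve. The following minimal sketch reproduces this calculation with precision_recall_curve() on a small set of hypothetical labels and scores:

import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

# Hypothetical labels and scores, purely for illustration
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

precision, recall, _ = precision_recall_curve(y_true, y_score)

# AP = sum of (R_n - R_{n-1}) * P_n; recall is returned in decreasing
# order, so the differences are negated
ap_manual = -np.sum(np.diff(recall) * precision[:-1])

print(ap_manual)
print(average_precision_score(y_true, y_score))  # matches the manual value

Both lines print the same value, confirming that average_precision_score() is exactly this recall-weighted sum of precisions.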

AP is especially useful when the costs of false positives and false negatives differ, as it focuses on the positive class and does not consider true negatives. However, it’s important to note that AP is not symmetric: swapping the positive and negative labels will generally produce a different score.
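The sketch below, again using hypothetical labels and scores, demonstrates this asymmetry by treating class 0 as the positive class (flipping both the labels and the scores) and comparing the two results:

import numpy as np
from sklearn.metrics import average_precision_score

# Hypothetical labels and scores, purely for illustration
y_true = np.array([0, 0, 0, 0, 1, 1])
y_score = np.array([0.1, 0.2, 0.3, 0.6, 0.5, 0.9])

# Class 1 as the positive class
ap_pos = average_precision_score(y_true, y_score)

# Class 0 as the positive class: flip the labels and the scores
ap_neg = average_precision_score(1 - y_true, 1 - y_score)

print(f"AP with class 1 as positive: {ap_pos:.3f}")
print(f"AP with class 0 as positive: {ap_neg:.3f}")

The two scores generally differ, so make sure the class of interest (usually the rare one) is encoded as the positive class before computing AP.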

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score

# Generate an imbalanced binary classification dataset
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1], random_state=42)

# Split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a logistic regression classifier
clf = LogisticRegression(random_state=42)
clf.fit(X_train, y_train)

# Predict probabilities on the test set
y_pred_prob = clf.predict_proba(X_test)[:, 1]

# Calculate the average precision score
ap = average_precision_score(y_test, y_pred_prob)
print(f"Average Precision: {ap:.2f}")

Running the example gives an output like:

Average Precision: 0.48

The steps involved in this example are:

  1. Generate an imbalanced binary classification dataset using make_classification(), with 90% of samples belonging to the majority class and 10% to the minority class.
  2. Split the dataset into training and test sets using train_test_split(), with 80% for training and 20% for testing.
  3. Train a logistic regression classifier on the training set using LogisticRegression.
  4. Predict the probabilities of the positive class for the test set using predict_proba().
  5. Calculate the average precision score using average_precision_score() by passing the true labels and predicted probabilities.

In this example, we generate an imbalanced binary classification dataset using make_classification(), with a class distribution of 90% for the majority class and 10% for the minority class. This allows us to simulate a scenario where the positive instances are rare, which is common in many real-world applications.
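If you want to verify the imbalance, a quick check of the label counts (reusing the y array from the example above) looks like this:

import numpy as np

# Counts per class; expect roughly 900 negatives and 100 positives
print(np.bincount(y))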

We then split the dataset into training and test sets using train_test_split() and train a logistic regression classifier on the training set. The classifier learns the patterns in the data and can be used to make predictions on unseen instances.

To calculate the average precision score, we first predict the probabilities of the positive class for the test set using predict_proba(). It’s crucial to use the predicted probabilities rather than the predicted labels, as the average precision score takes into account the ranking of the instances based on their probabilities.
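To see the difference in practice, we can extend the example above by also scoring the hard 0/1 predictions from predict(). Collapsing the ranking into just two groups usually changes the score, typically for the worse:

# Hard labels discard the ranking information that AP relies on
y_pred_label = clf.predict(X_test)

ap_from_probs = average_precision_score(y_test, y_pred_prob)
ap_from_labels = average_precision_score(y_test, y_pred_label)

print(f"AP from probabilities: {ap_from_probs:.2f}")
print(f"AP from hard labels:   {ap_from_labels:.2f}")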

Finally, we compute the average precision score using the average_precision_score() function from scikit-learn, passing the true labels and predicted probabilities as arguments. The resulting score provides an overall measure of the classifier’s performance, with higher values indicating better ranking quality.
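For context, a model that ranks instances at random has an expected AP approximately equal to the prevalence of the positive class (about 0.1 for this dataset), so scores should be judged against that baseline rather than against 0.5. A quick sketch of this chance-level baseline, reusing y_test from the example above:

import numpy as np

# Random scores: expected AP is roughly the positive-class prevalence
rng = np.random.default_rng(42)
random_scores = rng.random(len(y_test))

print(f"Chance-level AP:     {average_precision_score(y_test, random_scores):.2f}")
print(f"Positive prevalence: {y_test.mean():.2f}")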

This example demonstrates how to use the average_precision_score() function to evaluate a binary classification model’s performance on an imbalanced dataset. By generating synthetic data, training a logistic regression classifier, predicting probabilities, and calculating the average precision score, we can assess the model’s ability to rank positive instances higher than negative instances across different probability thresholds.


