
Scikit-Learn CalibratedClassifierCV Model

CalibratedClassifierCV is a meta-estimator in scikit-learn that calibrates the predicted probabilities of a base classifier. It is particularly useful when the base estimator does not produce well-calibrated probabilities, such as in the case of Support Vector Machines (SVM).

The key hyperparameters include estimator (the classifier to be calibrated), method (the calibration method, either ‘sigmoid’ or ‘isotonic’), and cv (the cross-validation strategy used to fit the calibrators).
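
As a quick, illustrative sketch (the values here are arbitrary, not tuned), both calibration methods and an explicit cross-validation setting can be configured like this; note that scikit-learn 1.2+ names the parameter estimator, while older releases used base_estimator:

from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import SVC

# sigmoid (Platt scaling) calibration with 5-fold cross-validation
sigmoid_cal = CalibratedClassifierCV(estimator=SVC(), method='sigmoid', cv=5)

# isotonic calibration; non-parametric, generally needs more data
isotonic_cal = CalibratedClassifierCV(estimator=SVC(), method='isotonic', cv=5)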

This estimator is appropriate for binary and multi-class classification problems where reliable probability estimates are required.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.calibration import CalibratedClassifierCV
from sklearn.metrics import brier_score_loss

# generate binary classification dataset
X, y = make_classification(n_samples=1000, n_features=5, n_classes=2, weights=[0.5, 0.5], random_state=42)

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# create an uncalibrated base classifier (probability=True is not needed,
# since CalibratedClassifierCV calibrates the decision function)
base_clf = SVC(gamma=2, C=1)

# calibrate the base classifier
clf = CalibratedClassifierCV(estimator=base_clf, method='isotonic')

# fit the calibrated classifier
clf.fit(X_train, y_train)

# evaluate the calibrated probabilities
y_prob = clf.predict_proba(X_test)
brier_score = brier_score_loss(y_test, y_prob[:, 1])
print("Brier score: %.4f" % brier_score)

# make a probabilistic prediction on a new sample
new_sample = [[0.5, 1.6, -0.1, 1.4, 0.2]]
prob = clf.predict_proba(new_sample)
print("Probability of class 1: %.4f" % prob[0][1])

The output of this code will be similar to:

Brier score: 0.0560
Probability of class 1: 0.8758

The steps in this example are:

  1. A synthetic binary classification dataset is generated using make_classification(), specifying the number of samples, classes, and class weights. The data is then split into training and test sets.

  2. An SVC base classifier is created with specific hyperparameters. Note that probability=True is not needed here: CalibratedClassifierCV calibrates the SVC's decision function directly, which also avoids the extra cost of SVC's internal Platt scaling.

  3. The base classifier is then calibrated using CalibratedClassifierCV, specifying the base estimator and the calibration method (‘isotonic’ in this case).

  4. The calibrated classifier is fit on the training data.

  5. The calibrated probabilities are evaluated on the test set using the Brier score, which measures the mean squared difference between the predicted probabilities and the actual outcomes. A lower Brier score indicates better calibration; a comparison against the uncalibrated SVC is sketched after this list.

  6. Finally, a probabilistic prediction is made on a new data point using the calibrated classifier’s predict_proba() method.
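
As a rough comparison, the sketch below (assuming the variables from the example above are still in scope) contrasts the calibrated Brier score with that of a raw SVC whose decision function is min-max scaled into [0, 1] as a naive probability:

from sklearn.svm import SVC
from sklearn.metrics import brier_score_loss

# fit a raw, uncalibrated SVC and turn its decision function into a
# naive "probability" by min-max scaling it into [0, 1]
raw_clf = SVC(gamma=2, C=1).fit(X_train, y_train)
scores = raw_clf.decision_function(X_test)
naive_prob = (scores - scores.min()) / (scores.max() - scores.min())

print("Uncalibrated Brier score: %.4f" % brier_score_loss(y_test, naive_prob))
print("Calibrated Brier score:   %.4f" % brier_score_loss(y_test, clf.predict_proba(X_test)[:, 1]))

Min-max scaling is only a crude stand-in for real probabilities, which is exactly the gap that calibration is meant to close.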

This example demonstrates how to use CalibratedClassifierCV to improve the probability estimates of a base classifier, which can be crucial in applications where well-calibrated probabilities are required for decision making.
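
Beyond a single Brier score, calibration is often checked visually with a reliability diagram. The sketch below assumes the fitted clf, X_test, and y_test from the example above and uses CalibrationDisplay (available in scikit-learn 1.0+):

import matplotlib.pyplot as plt
from sklearn.calibration import CalibrationDisplay

# plot mean predicted probability against observed frequency per bin;
# a well-calibrated model tracks the diagonal
CalibrationDisplay.from_estimator(clf, X_test, y_test, n_bins=10)
plt.show()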



See Also