Scikit-Learn TunedThresholdClassifierCV Model

TunedThresholdClassifierCV is a meta-estimator in sklearn.model_selection (added in scikit-learn 1.5) that wraps a binary classifier and tunes its decision threshold using cross-validation. This example demonstrates how to use it to choose a cutoff for a logistic regression model in place of the default one.

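By default, a classifier such as LogisticRegression labels a sample positive when its predicted probability exceeds 0.5. TunedThresholdClassifierCV instead evaluates a grid of candidate thresholds on cross-validation splits, keeps the one that maximizes the scoring metric, and (with the default refit=True) refits the base estimator on the full training set.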
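To make the idea concrete, here is a simplified sketch of a threshold search done by hand. It pools out-of-fold probabilities from cross_val_predict() and sweeps a small, hand-picked grid of cutoffs (the 19-value grid is an illustrative assumption). TunedThresholdClassifierCV itself works somewhat differently: it scores each split separately, averages the resulting curves, and uses 100 thresholds by default, so treat this as an approximation of the technique rather than the library's exact procedure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import cross_val_predict

# same synthetic data as the main example
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)

# out-of-fold probabilities for the positive class (5-fold CV)
proba = cross_val_predict(LogisticRegression(), X, y, cv=5, method="predict_proba")[:, 1]

# illustrative grid of candidate cutoffs; score each on the out-of-fold predictions
thresholds = np.linspace(0.05, 0.95, 19)
scores = [balanced_accuracy_score(y, (proba >= t).astype(int)) for t in thresholds]

# keep the cutoff with the highest score
best = thresholds[int(np.argmax(scores))]
print(f"best threshold: {best:.2f}, balanced accuracy: {max(scores):.3f}")
```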

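Key hyperparameters include estimator, the base classifier; cv, the cross-validation strategy (5-fold by default); and scoring, the metric to maximize, which should depend on the threshold (the default is 'balanced_accuracy'). In addition, thresholds sets the number of candidate thresholds (100 by default), and response_method selects whether the output of predict_proba or decision_function is thresholded.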

This method suits binary classification problems where the default cutoff does not match the goal, for example with imbalanced classes or when false positives and false negatives carry different costs. It supports binary targets only.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TunedThresholdClassifierCV
from sklearn.metrics import accuracy_score, roc_auc_score

# generate binary classification dataset
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# create base model
base_model = LogisticRegression()

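# create model that tunes the decision threshold with 5-fold CV
# note: ROC AUC computed from probabilities does not depend on the threshold;
# 'balanced_accuracy' (the default) or 'f1' are more typical objectives here
model = TunedThresholdClassifierCV(base_model, cv=5, scoring='roc_auc')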

# fit model
model.fit(X_train, y_train)

# evaluate model
# note: yhat holds hard class labels, so this ROC AUC equals balanced accuracy
yhat = model.predict(X_test)
roc_auc = roc_auc_score(y_test, yhat)
print('ROC AUC: %.3f' % roc_auc)

# make a prediction
row = [[-1.10325445, -0.49821356, -0.05962247, -0.89224592, -0.70158632]]
yhat = model.predict(row)
print('Predicted: %d' % yhat[0])

Running the example gives an output like:

ROC AUC: 0.955
Predicted: 0

The steps are as follows:

Generate a synthetic binary classification dataset with make_classification() using 100 samples, 5 features, and a fixed random seed (random_state=1) for reproducibility. Split it into training and test sets (80/20) with train_test_split().
Instantiate a base LogisticRegression model with default hyperparameters.
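Create TunedThresholdClassifierCV with the base model, using 5-fold cross-validation and roc_auc as the scoring metric. Because ROC AUC computed from probabilities is the same at every threshold, a threshold-dependent metric such as 'balanced_accuracy' (the default) or 'f1' is the more natural objective.
Fit the TunedThresholdClassifierCV model on the training data with fit(), which selects the threshold and then refits the base model on all of the training data.
Evaluate the model by comparing the predicted labels (yhat) to the actual values (y_test) with roc_auc_score(). Because yhat contains hard class labels rather than probabilities, the score reflects a single operating point and equals the balanced accuracy at the tuned threshold.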
Make a single prediction by passing a new data sample to the predict() method.

This example shows how to use TunedThresholdClassifierCV to tune the decision threshold of a binary classifier for a chosen metric. Whether the tuned cutoff beats the default one should be confirmed by scoring the untuned base model on the same test set. The fitted model can then be used like any other classifier to make predictions on new data.

See Also