TunedThresholdClassifierCV is a model selection utility that optimizes classification thresholds using cross-validation. This example demonstrates how to use it to improve the performance of binary classification models.
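It wraps a binary classifier and searches for the cutoff, applied to the classifier's predicted probabilities or decision scores, that maximizes a chosen metric, instead of relying on the default cutoff (0.5 for predicted probabilities).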
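To make the idea of threshold tuning concrete, here is a minimal pure-Python sketch of a threshold sweep. It is not the library's implementation: it skips cross-validation entirely, the labels and probabilities are made up for illustration, and the helper names balanced_accuracy and best_threshold are hypothetical.

```python
# Conceptual sketch only: try a grid of candidate thresholds on predicted
# probabilities and keep the one with the best balanced accuracy.
# The labels and probabilities below are invented for illustration.

def balanced_accuracy(y_true, y_pred):
    """Mean of the true positive rate and the true negative rate."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)

def best_threshold(y_true, proba, candidates):
    """Return the (threshold, score) pair with the highest balanced accuracy."""
    best = None
    for thr in candidates:
        y_pred = [1 if p >= thr else 0 for p in proba]
        score = balanced_accuracy(y_true, y_pred)
        # strict comparison keeps the first threshold in case of ties
        if best is None or score > best[1]:
            best = (thr, score)
    return best

y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
proba = [0.05, 0.10, 0.15, 0.20, 0.30, 0.45, 0.35, 0.40, 0.60, 0.80]
candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

# score obtained with the usual 0.5 cutoff
default_pred = [1 if p >= 0.5 else 0 for p in proba]
default_score = balanced_accuracy(y_true, default_pred)

# score obtained with the best cutoff from the grid
thr, tuned_score = best_threshold(y_true, proba, candidates)

print('Default 0.5 threshold: %.3f' % default_score)
print('Best threshold: %.2f (balanced accuracy %.3f)' % (thr, tuned_score))
```

On this toy data the 0.5 cutoff misses two of the four positives, while a cutoff of 0.35 recovers all of them at the cost of one false positive, so balanced accuracy rises from 0.750 to 0.917.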
Key hyperparameters include estimator, the base classifier; cv, the cross-validation strategy; and scoring, the metric used for evaluation.
This method is suitable for binary classification problems where the default cutoff is a poor fit, such as imbalanced datasets or tasks where false positives and false negatives carry different costs.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TunedThresholdClassifierCV
from sklearn.metrics import roc_auc_score
# generate binary classification dataset
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create base model
base_model = LogisticRegression()
# wrap the base model so its decision threshold is tuned with 5-fold cross-validation
model = TunedThresholdClassifierCV(base_model, cv=5, scoring='roc_auc')
# fit model
model.fit(X_train, y_train)
# evaluate model (predict() returns class labels, so the score is computed on hard predictions)
yhat = model.predict(X_test)
roc_auc = roc_auc_score(y_test, yhat)
print('ROC AUC: %.3f' % roc_auc)
# make a prediction
row = [[-1.10325445, -0.49821356, -0.05962247, -0.89224592, -0.70158632]]
yhat = model.predict(row)
print('Predicted: %d' % yhat[0])
Running the example gives an output like:
ROC AUC: 0.955
Predicted: 0
The steps are as follows:
Generate a synthetic binary classification dataset using make_classification() with specified parameters. This function creates a dataset with a specified number of samples and features, and a fixed random seed for reproducibility.
Split the dataset into training and test sets using train_test_split().
Instantiate a base LogisticRegression model with default hyperparameters.
Create TunedThresholdClassifierCV with the base model, using 5-fold cross-validation and roc_auc as the scoring metric.
Fit the TunedThresholdClassifierCV model on the training data using the fit() method.
Evaluate the model performance by comparing the predictions (yhat) to the actual values (y_test) using the ROC AUC score metric.
Make a single prediction by passing a new data sample to the predict() method.
This example shows how to use TunedThresholdClassifierCV to tune the decision threshold of a binary classification model, which can improve the chosen metric compared with the default cutoff. The fitted model can then be used to make predictions on new data in the same way as any other scikit-learn classifier.