TunedThresholdClassifierCV is a model selection utility that tunes the decision threshold of a binary classifier using cross-validation. This example demonstrates how to use it to improve the performance of binary classification models.
Rather than relying on the default cutoff (typically 0.5 on the predicted probability), it searches for the threshold that maximizes a chosen metric.
Key hyperparameters include estimator, the base classifier; cv, the cross-validation strategy; and scoring, the metric used to select the threshold.
This method is suitable for binary classification problems where the choice of decision threshold can significantly affect performance, for example with imbalanced classes or unequal error costs.
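To make the idea of a decision threshold concrete, the following minimal sketch applies two cutoffs by hand to the predicted probabilities of a logistic regression. The dataset settings and the 0.3 cutoff are illustrative assumptions chosen for this sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# small synthetic dataset (illustrative settings)
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)
clf = LogisticRegression().fit(X, y)

# probability of the positive class for each sample
proba = clf.predict_proba(X)[:, 1]

# the default decision rule corresponds to a cutoff of 0.5
labels_default = (proba > 0.5).astype(int)
# a lower cutoff (0.3 is an arbitrary choice) labels at least as many samples positive
labels_low = (proba > 0.3).astype(int)

print('Positives at 0.5:', labels_default.sum())
print('Positives at 0.3:', labels_low.sum())
```

Moving the cutoff trades one kind of error for another, which is why the best value depends on the metric being optimized.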
# TunedThresholdClassifierCV requires scikit-learn 1.5 or later
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import TunedThresholdClassifierCV, train_test_split
# generate binary classification dataset
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
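# create the base classifier whose decision threshold will be tuned
base_model = LogisticRegression()
# wrap it so the threshold is selected by 5-fold cross-validation
model = TunedThresholdClassifierCV(base_model, cv=5, scoring='roc_auc')
# fit the base model and select the threshold on the training data
model.fit(X_train, y_train)
# evaluate on the test set; predict() returns class labels at the tuned threshold,
# so this score is computed from labels rather than probabilities
yhat = model.predict(X_test)
roc_auc = roc_auc_score(y_test, yhat)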
print('ROC AUC: %.3f' % roc_auc)
# make a prediction
row = [[-1.10325445, -0.49821356, -0.05962247, -0.89224592, -0.70158632]]
yhat = model.predict(row)
print('Predicted: %d' % yhat[0])
Running the example gives an output like:
ROC AUC: 0.955
Predicted: 0
The steps are as follows:
1. Generate a synthetic binary classification dataset using make_classification() with 100 samples, 5 features, and a fixed random seed for reproducibility. Split the dataset into training and test sets using train_test_split().
2. Instantiate a base LogisticRegression model with default hyperparameters.
3. Create TunedThresholdClassifierCV with the base model, using 5-fold cross-validation and roc_auc as the scoring metric.
4. Fit the TunedThresholdClassifierCV model on the training data using the fit() method.
5. Evaluate the model performance by comparing the predictions (yhat) to the actual values (y_test) using the ROC AUC score metric.
6. Make a single prediction by passing a new data sample to the predict() method.
This example shows how to use TunedThresholdClassifierCV to tune the decision threshold of a binary classification model, which can improve its performance on the chosen metric. The fitted model can then be used to make predictions on new data in real-world binary classification problems.