
Scikit-Learn LeaveOneOut Data Splitting

LeaveOneOut (LOO) cross-validation is a method used to evaluate a model’s performance by training on all data points except one and testing on the excluded point. This process is repeated for each data point, ensuring each point is used for testing once.

The LeaveOneOut cross-validation technique is a special case of K-Fold cross-validation where K equals the number of samples (n). It takes no hyperparameters, but it requires fitting the model n times, which makes it computationally expensive for large datasets.

LeaveOneOut is best suited to supervised learning problems with small datasets, where the cost of fitting one model per sample is affordable and it is important to use as much data as possible for training in each fold.
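To see how the splitting behaves, the snippet below (a small illustrative sketch on a toy array, not part of the main example) enumerates the train/test indices produced by LeaveOneOut. The number of splits equals the number of samples, and each test set contains exactly one sample.

```python
from sklearn.model_selection import LeaveOneOut
import numpy as np

# toy data: 4 samples, 2 features
X = np.arange(8).reshape(4, 2)

loo = LeaveOneOut()

# the number of splits equals the number of samples
print(loo.get_n_splits(X))  # 4

# each iteration holds out exactly one sample for testing
for train_idx, test_idx in loo.split(X):
    print(train_idx, test_idx)
```

Each printed test set is a single index, and together the test sets cover every sample exactly once.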

from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.linear_model import LogisticRegression
import numpy as np

# generate a synthetic binary classification dataset
X, y = make_classification(n_samples=10, n_features=5, n_classes=2, random_state=1)

# configure the cross-validation procedure
loo = LeaveOneOut()

# create the model
model = LogisticRegression()

# evaluate the model using LOO cross-validation
scores = cross_val_score(model, X, y, cv=loo)

# summarize the results
print('Accuracy: %.3f (%.3f)' % (np.mean(scores), np.std(scores)))

Running the example gives an output like:

Accuracy: 1.000 (0.000)

The steps are as follows:

  1. First, a small synthetic binary classification dataset is generated using the make_classification() function. This creates a dataset with a specified number of samples (n_samples), classes (n_classes), and a fixed random seed (random_state) for reproducibility.

  2. Next, the LeaveOneOut cross-validation procedure is configured using the LeaveOneOut() class.

  3. A LogisticRegression model is instantiated with default hyperparameters.

  4. The performance of the model is evaluated using the cross_val_score() function with the LeaveOneOut cross-validation object (cv=loo). The function returns an array of scores, one per iteration; because each test set contains a single sample, each accuracy score is either 0.0 or 1.0.

  5. Finally, the accuracy results are summarized by calculating the mean and standard deviation of the scores, providing a measure of the model’s performance across all iterations.
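The steps above can also be written out as an explicit loop, which makes it clear what cross_val_score() does under the hood. This is an equivalent hand-rolled sketch of the same evaluation, not a replacement for the convenience function:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LogisticRegression
import numpy as np

# same synthetic dataset as the main example
X, y = make_classification(n_samples=10, n_features=5, n_classes=2, random_state=1)

loo = LeaveOneOut()
model = LogisticRegression()

scores = []
for train_idx, test_idx in loo.split(X):
    # fit on all samples except one
    model.fit(X[train_idx], y[train_idx])
    # score on the single held-out sample: accuracy is 0.0 or 1.0
    scores.append(model.score(X[test_idx], y[test_idx]))

print('Accuracy: %.3f (%.3f)' % (np.mean(scores), np.std(scores)))
```

With 10 samples the loop runs 10 times, producing one score per held-out sample, and the summary matches what cross_val_score() computes.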

This example demonstrates how to use the LeaveOneOut cross-validation technique to evaluate the performance of a LogisticRegression model on a small synthetic dataset, highlighting the simplicity and effectiveness of this method in scikit-learn.
