ShuffleSplit is a cross-validation technique for randomly splitting a dataset into train and test sets. It allows specifying the test set size and the number of splitting iterations.
ShuffleSplit is appropriate for evaluating the performance of machine learning models, particularly when working with datasets that are not explicitly ordered.
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit
# generate binary classification dataset
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)
# create shuffle split
ss = ShuffleSplit(n_splits=5, test_size=0.2, random_state=1)
# enumerate splits
for train_index, test_index in ss.split(X):
    # select rows for this split using the generated indices
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    print('train: %s, test: %s' % (X_train.shape, X_test.shape))
Running the example gives an output like:
train: (80, 5), test: (20, 5)
train: (80, 5), test: (20, 5)
train: (80, 5), test: (20, 5)
train: (80, 5), test: (20, 5)
train: (80, 5), test: (20, 5)
The steps in this example are:

1. First, a synthetic binary classification dataset is generated using the make_classification() function.
2. Next, a ShuffleSplit object is created, specifying the desired number of splits (n_splits) and the size of the test set (test_size). Setting a random_state ensures reproducibility, so the same sequence of splits is produced on every run (see the sketch after this list).
3. The split() method of the ShuffleSplit object is then used to iterate over the splits. For each split, the training and test indices are used to extract the corresponding X and y data, and the shapes of the resulting X_train and X_test are printed, confirming the sizes of the train and test sets for each split.
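To make the reproducibility point concrete, here is a minimal sketch (not part of the original example) that creates two ShuffleSplit objects with the same random_state and checks that they yield identical train and test indices:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit
# same synthetic dataset as above
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)
# two splitters configured identically, including random_state
ss_a = ShuffleSplit(n_splits=5, test_size=0.2, random_state=1)
ss_b = ShuffleSplit(n_splits=5, test_size=0.2, random_state=1)
# every corresponding pair of splits should contain exactly the same indices
for (train_a, test_a), (train_b, test_b) in zip(ss_a.split(X), ss_b.split(X)):
    assert np.array_equal(train_a, train_b)
    assert np.array_equal(test_a, test_b)
print('all splits match when random_state is fixed')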
This example demonstrates how to use ShuffleSplit for randomly splitting a dataset into train and test sets, which is useful for evaluating the performance of machine learning models. The test_size and n_splits parameters provide control over the size of the test set and the number of times the dataset is split, respectively.
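As a sketch of that evaluation workflow (an addition to the original example), a ShuffleSplit object can be passed directly as the cv argument of cross_val_score; the model choice here, LogisticRegression, is an arbitrary assumption for illustration:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score
# same synthetic dataset and splitter configuration as above
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)
ss = ShuffleSplit(n_splits=5, test_size=0.2, random_state=1)
# LogisticRegression is used here only as an illustrative model (assumption)
model = LogisticRegression()
# score the model on each of the five random splits
scores = cross_val_score(model, X, y, cv=ss)
print('accuracy per split: %s' % scores)
print('mean accuracy: %.3f' % scores.mean())

Each entry in scores is the model's accuracy on the held-out 20% of one split, so the mean summarizes performance across all n_splits iterations.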