
Scikit-Learn cross_val_score() to Evaluate Models

Evaluating a machine learning model's performance is essential for understanding how well it will work on new data. The cross_val_score() function in scikit-learn simplifies this by automating k-fold cross-validation and returning one score per fold. Together, these scores give a more robust estimate of performance than a single train/test split.

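The key parameters of cross_val_score() are estimator (the model to evaluate), X and y (the input data and target), cv (the number of cross-validation folds, or a cross-validation splitter object; the default is 5-fold), and scoring (the metric used to evaluate the model; if omitted, the estimator's own score() method is used).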
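As a minimal sketch of setting these parameters explicitly, the example below passes a KFold splitter object as cv (with shuffling enabled) and uses 'f1' as the scoring metric. KFold, shuffle=True, random_state=1 and 'f1' are illustrative choices, not requirements.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# same synthetic binary classification dataset as the main example
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)

# an explicit splitter: 5 folds, rows shuffled once before splitting (illustrative choice)
cv = KFold(n_splits=5, shuffle=True, random_state=1)

# evaluate with the F1 score instead of accuracy
scores = cross_val_score(LogisticRegression(), X, y, scoring='f1', cv=cv)

print('F1 scores:', scores)
print('Mean F1: %.3f' % scores.mean())
```

Passing a splitter object instead of an integer gives control over shuffling and makes the splits reproducible through random_state.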

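The function works for both classification and regression problems. When cv is an integer and the estimator is a classifier with a binary or multiclass target, the folds are stratified so that each fold preserves the class proportions of y.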
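For regression, the same call works with a regression estimator and a regression metric. A minimal sketch, assuming a LinearRegression model and the 'neg_mean_absolute_error' scorer (both illustrative choices). scikit-learn negates error metrics in its scorer names so that a higher score is always better, which is why the sign is flipped before reporting.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# synthetic regression dataset (parameters chosen for illustration)
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=1)

model = LinearRegression()

# error metrics are negated by scikit-learn so that higher is always better
scores = cross_val_score(model, X, y, scoring='neg_mean_absolute_error', cv=5)

# flip the sign to report a positive mean absolute error
mae = -scores
print('MAE per fold:', mae)
print('Mean MAE: %.3f' % mae.mean())
```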

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

# generate binary classification dataset
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)

# create model
model = LogisticRegression()

# evaluate model using cross-validation
scores = cross_val_score(model, X, y, scoring='accuracy', cv=5)

# print the accuracy scores
print('Accuracy scores:', scores)
print('Mean Accuracy: %.3f' % scores.mean())

Running the example gives an output like:

Accuracy scores: [0.95 1.   0.9  0.95 0.95]
Mean Accuracy: 0.950

The steps are as follows:

  1. First, generate a synthetic binary classification dataset using make_classification(). This creates a dataset with a specified number of samples (n_samples), features (n_features), and classes (n_classes), with a fixed random seed (random_state) for reproducibility.

  2. Create a LogisticRegression model instance using default hyperparameters.

  3. Use cross_val_score to evaluate the model with 5-fold cross-validation. The function splits the dataset into five parts (folds), trains a fresh copy of the model on four of them, and tests it on the remaining one. This repeats five times, each time with a different fold as the test set. Because the model is a classifier, the folds are stratified to preserve the class balance of y.

  4. Print the accuracy scores for each fold and the mean accuracy across all folds to assess the model’s performance.
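The steps above can be sketched as an explicit loop. This illustrates what cross_val_score() does in this case rather than reproducing its internal implementation; it relies on scikit-learn's documented behavior that an integer cv with a classifier and a binary target uses an unshuffled StratifiedKFold. If that holds, the manual scores match the ones returned by cross_val_score().

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)
model = LogisticRegression()

# cv=5 with a classifier corresponds to an unshuffled StratifiedKFold with 5 splits
skf = StratifiedKFold(n_splits=5)

manual_scores = []
for train_idx, test_idx in skf.split(X, y):
    # fit a fresh, unfitted copy of the model on the four training folds
    fold_model = clone(model)
    fold_model.fit(X[train_idx], y[train_idx])
    # score it on the held-out fold
    y_pred = fold_model.predict(X[test_idx])
    manual_scores.append(accuracy_score(y[test_idx], y_pred))
manual_scores = np.array(manual_scores)

# the automated equivalent
auto_scores = cross_val_score(model, X, y, scoring='accuracy', cv=5)

print('Manual:         ', manual_scores)
print('cross_val_score:', auto_scores)
```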

This example shows how to use cross_val_score to automate cross-validation and obtain a reliable estimate of model performance. For each fold, the function trains the model on the remaining folds and scores it on the held-out fold, so every score reflects performance on data the model did not see during training. This helps estimate how well the model generalizes to unseen data.
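Because the fold scores vary, it can help to report their standard deviation alongside the mean. A minimal sketch, assuming RepeatedStratifiedKFold with 3 repeats as an illustrative choice of cv; it repeats 5-fold stratified cross-validation with different shuffles and yields 15 scores instead of 5.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)

# 5-fold stratified CV repeated 3 times with different shuffles (illustrative settings)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=1)
scores = cross_val_score(LogisticRegression(), X, y, scoring='accuracy', cv=cv)

# one score per fold per repeat; report the spread as well as the mean
print('Number of scores:', len(scores))
print('Accuracy: %.3f (+/- %.3f)' % (scores.mean(), scores.std()))
```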



See Also