
Scikit-Learn Configure GridSearchCV "cv" Parameter

The ‘cv’ parameter in scikit-learn’s GridSearchCV controls the cross-validation splitting strategy used during hyperparameter tuning. Setting ‘cv’ appropriately ensures that the model’s performance is evaluated reliably.

Grid search is a method for exhaustively searching over a specified set of parameter values to find the best combination. It trains and evaluates the model for each combination of parameters, using the specified cross-validation strategy to assess performance.
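
For intuition, the ParameterGrid helper enumerates exactly the combinations a grid search will try. A minimal sketch, using the same grid as the example below:

from sklearn.model_selection import ParameterGrid

# 3 x 3 = 9 combinations; with cv=5 each one is fit and scored 5 times (45 fits)
for params in ParameterGrid({'C': [0.1, 1, 10], 'gamma': [1, 0.1, 0.01]}):
    print(params)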

The ‘cv’ parameter can be set to an integer to specify the number of folds, or to a cross-validation object such as KFold or StratifiedKFold. Note that when the estimator is a classifier, an integer value uses StratifiedKFold by default; otherwise KFold is used.
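
You can verify which splitter a given ‘cv’ value resolves to with the check_cv utility. A minimal sketch; the labels here are hypothetical, purely for illustration:

from sklearn.model_selection import check_cv
import numpy as np

# hypothetical binary labels for illustration
y = np.array([0, 1] * 50)

# for a classifier, an integer cv resolves to StratifiedKFold
print(check_cv(cv=5, y=y, classifier=True))

# otherwise it resolves to plain KFold
print(check_cv(cv=5, y=y, classifier=False))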

Choosing the right cross-validation strategy is crucial for ensuring that the hyperparameter tuning process is robust and reliable, especially for imbalanced datasets where stratification is important.

from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, KFold, StratifiedKFold

# create a synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)

# define the parameter and grid values
# define the grid of parameter values to search over
param_grid = {'C': [0.1, 1, 10], 'gamma': [1, 0.1, 0.01]}

# define and perform a grid search with cv=5
grid_cv5 = GridSearchCV(estimator=SVC(), param_grid=param_grid, cv=5)
grid_cv5.fit(X, y)

# define and perform a grid search with KFold cv
grid_kfold = GridSearchCV(estimator=SVC(), param_grid=param_grid, cv=KFold(n_splits=5))
grid_kfold.fit(X, y)

# define and perform a grid search with StratifiedKFold cv
grid_stratified = GridSearchCV(estimator=SVC(), param_grid=param_grid, cv=StratifiedKFold(n_splits=5))
grid_stratified.fit(X, y)

# report the best parameters for cv=5
print("Best parameters found with cv=5:")
print(grid_cv5.best_params_)

# report the best parameters for KFold cv
print("Best parameters found with KFold cv:")
print(grid_kfold.best_params_)

# report the best parameters for StratifiedKFold cv
print("Best parameters found with StratifiedKFold cv:")
print(grid_stratified.best_params_)

Running the example gives an output like:

Best parameters found with cv=5:
{'C': 1, 'gamma': 0.01}
Best parameters found with KFold cv:
{'C': 10, 'gamma': 0.01}
Best parameters found with StratifiedKFold cv:
{'C': 1, 'gamma': 0.01}

Note that the cv=5 and StratifiedKFold searches return identical parameters: for a classifier, an integer ‘cv’ already defaults to stratified folds, so the two settings are equivalent here. Only the plain KFold search, whose folds ignore class proportions, selects a different value of C.

The key steps in this example are:

  1. Generate a synthetic binary classification dataset using make_classification.
  2. Define a parameter grid for SVC with C and gamma values to search over.
  3. Create three GridSearchCV objects with different cv settings:
    • Integer cv=5 for 5-fold cross-validation (stratified by default, since SVC is a classifier).
    • cv=KFold(n_splits=5) for 5-fold cross-validation.
    • cv=StratifiedKFold(n_splits=5) for 5-fold stratified cross-validation.
  4. Fit each grid search object to find the best parameters for each cross-validation strategy.
  5. Print out the best parameters found by each grid search to demonstrate how the cross-validation strategy can affect the tuning results (see the sketch after this list for other useful attributes).
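
Beyond best_params_, each fitted GridSearchCV object also exposes the best cross-validated score, the refitted estimator, and per-combination details. A minimal sketch, continuing from the grid_cv5 object above:

# mean cross-validated score of the best parameter combination
print(grid_cv5.best_score_)

# the estimator refitted on the full dataset with the best parameters
print(grid_cv5.best_estimator_)

# mean test score for every parameter combination tried
print(grid_cv5.cv_results_['mean_test_score'])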

This example highlights the importance of selecting an appropriate cross-validation strategy to ensure reliable evaluation during hyperparameter tuning with GridSearchCV.
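
To see why stratification matters, the sketch below repeats the search on a deliberately imbalanced dataset (an assumed 90/10 class split; the weights value is chosen purely for illustration). StratifiedKFold preserves the class ratio in every fold, whereas plain KFold may produce folds with very few minority-class samples:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# hypothetical imbalanced dataset: roughly 90% of samples in class 0
X_imb, y_imb = make_classification(n_samples=1000, n_classes=2,
                                   weights=[0.9, 0.1], random_state=42)

# stratified folds keep the 90/10 ratio in every train/test split
grid_imb = GridSearchCV(estimator=SVC(),
                        param_grid={'C': [0.1, 1, 10], 'gamma': [1, 0.1, 0.01]},
                        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42))
grid_imb.fit(X_imb, y_imb)
print("Best parameters on the imbalanced data:", grid_imb.best_params_)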


