The param_grid parameter of scikit-learn’s GridSearchCV specifies which hyperparameters to tune and which candidate values to try for each one. Choosing a sensible param_grid matters because GridSearchCV can only find the best combination among the values you list.
Grid search exhaustively evaluates every combination in a specified set of parameter values. For each combination, GridSearchCV fits and scores the model with cross-validation, then compares the candidates using a scoring metric (by default, the estimator's own score method, which is mean accuracy for classifiers such as SVC).
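To make the exhaustive search concrete, here is a simplified hand-written sketch of what grid search does, using itertools.product and cross_val_score. The small dataset and two-parameter grid are assumptions chosen for this sketch; it illustrates the idea and is not how GridSearchCV is implemented internally.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Small synthetic dataset so the loop runs quickly (illustrative choice)
X, y = make_classification(n_samples=200, random_state=0)

# Candidate values for each hyperparameter (illustrative choice)
grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

best_score, best_params = -1.0, None
results = []
# Try every combination: 3 values of C x 2 kernels = 6 candidates
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    # Mean accuracy over 5 cross-validation folds for this candidate
    score = cross_val_score(SVC(**params), X, y, cv=5).mean()
    results.append((params, score))
    # Keep the first candidate with the highest mean score
    if score > best_score:
        best_score, best_params = score, params

print(f"Evaluated {len(results)} combinations")
print("Best:", best_params, round(best_score, 3))
```

GridSearchCV automates this loop and additionally stores per-candidate results in cv_results_ and, by default, refits the best model on the full dataset.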
The param_grid parameter takes a dictionary whose keys are hyperparameter names accepted by the estimator and whose values are lists of values to try. GridSearchCV then searches the Cartesian product of these lists, that is, every possible combination. It also accepts a list of such dictionaries, in which case each dictionary defines a separate sub-grid.
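One way to see exactly which combinations a param_grid produces is scikit-learn's ParameterGrid, which expands the dictionary in the same way GridSearchCV does. The sketch below uses the same three SVC hyperparameters as the example in this article.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    "C": [0.1, 1, 10],
    "kernel": ["linear", "rbf"],
    "gamma": ["scale", "auto"],
}

# Expand the dictionary into every combination GridSearchCV would try
candidates = list(ParameterGrid(param_grid))

# 3 values of C x 2 kernels x 2 gamma settings = 12 candidates
print(len(candidates))  # 12
# ParameterGrid sorts the keys, so each candidate lists C, gamma, kernel
print(candidates[0])  # {'C': 0.1, 'gamma': 'scale', 'kernel': 'linear'}
```

With cv=5, each of these 12 candidates is fitted five times, so the search performs 60 fits, plus one final refit of the best candidate when refit=True (the default).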
The following complete example tunes three hyperparameters of an SVC on a synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# create a synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)

# define the parameter grid
param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf'],
    'gamma': ['scale', 'auto']
}

# create a GridSearchCV object with 5-fold cross-validation
grid_search = GridSearchCV(estimator=SVC(), param_grid=param_grid, cv=5)

# perform the grid search
grid_search.fit(X, y)

# report the best parameters
print("Best parameters found:")
print(grid_search.best_params_)
```
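Running the example prints:

```
Best parameters found:
{'C': 10, 'gamma': 'scale', 'kernel': 'linear'}
```

Because gamma is ignored by the linear kernel, the reported gamma value has no effect on this particular result.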
The key steps in this example are:
- Generate a synthetic binary classification dataset using make_classification.
- Define the param_grid dictionary with the hyperparameters and the values to search over.
- Create a GridSearchCV object with the estimator (SVC), param_grid, and the number of cross-validation folds (cv).
- Fit the GridSearchCV object to run the grid search.
- Print the best parameters found by the grid search.
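Beyond best_params_, a fitted GridSearchCV exposes best_score_ (the mean cross-validated score of the best candidate), cv_results_ (a dictionary of per-candidate results), and best_estimator_ (the best model refit on the whole dataset when refit=True, the default). The sketch below re-runs the same search and lists every candidate by rank; the printing layout is just one illustrative choice.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"], "gamma": ["scale", "auto"]}
grid_search = GridSearchCV(estimator=SVC(), param_grid=param_grid, cv=5)
grid_search.fit(X, y)

# Mean cross-validated accuracy of the best combination
print(f"Best CV accuracy: {grid_search.best_score_:.3f}")

# cv_results_ has one entry per candidate; list them from best to worst rank
results = grid_search.cv_results_
order = sorted(range(len(results["params"])), key=lambda i: results["rank_test_score"][i])
for i in order:
    print(results["rank_test_score"][i], f"{results['mean_test_score'][i]:.3f}", results["params"][i])

# With refit=True (the default), best_estimator_ is already trained on all of X, y
print(grid_search.best_estimator_.predict(X[:5]))
```

Candidates with identical scores share a rank, which is why the redundant gamma settings for the linear kernel appear in tied pairs.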
The SVC example above shows how to use param_grid to search several hyperparameters at once and find the best-performing combination of the listed values for a specific model and dataset.
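In the SVC grid, gamma has no effect when kernel='linear', so half of the linear-kernel candidates are redundant. Because param_grid also accepts a list of dictionaries, each defining its own sub-grid, the grid can be restructured so that gamma is searched only together with the rbf kernel. The sketch below is one possible way to do this for this particular grid; it reduces the candidates from 12 to 9.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)

# Each dictionary is a separate sub-grid; gamma appears only with 'rbf'
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": ["scale", "auto"]},
]

# 3 linear candidates + 6 rbf candidates = 9 candidates instead of 12
print(len(ParameterGrid(param_grid)))  # 9

grid_search = GridSearchCV(estimator=SVC(), param_grid=param_grid, cv=5)
grid_search.fit(X, y)
print(grid_search.best_params_)
```

The search space covers the same distinct models as before while skipping fits that could not change the result.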