The param_grid parameter in scikit-learn’s GridSearchCV specifies the hyperparameters and the candidate values to search over during the grid search. Defining an appropriate param_grid is crucial for finding the best combination of hyperparameters for a given model.
Grid search is a method for exhaustively searching over a specified set of parameter values to find the best combination. It trains and evaluates the model for each combination of parameters, using a specified scoring metric to compare performance.
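To make the procedure concrete, here is a minimal sketch of what an exhaustive search does, written by hand with itertools.product and cross_val_score. The dataset size, the two hyperparameters, and the accuracy metric are illustrative assumptions; GridSearchCV runs this kind of loop for you and, by default, also refits the best model on the full data.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Small synthetic dataset (size chosen only to keep the sketch fast)
X, y = make_classification(n_samples=200, n_classes=2, random_state=42)

# Candidate values for two hyperparameters (illustrative choices)
C_values = [0.1, 1, 10]
kernels = ["linear", "rbf"]

results = []
for C, kernel in product(C_values, kernels):
    # Mean 5-fold cross-validated accuracy for this combination
    scores = cross_val_score(SVC(C=C, kernel=kernel), X, y, cv=5, scoring="accuracy")
    results.append(({"C": C, "kernel": kernel}, scores.mean()))

# Keep the combination with the highest mean score
best_params, best_score = max(results, key=lambda item: item[1])
print(best_params, round(best_score, 3))
```

Every combination is trained and scored once per fold, so the cost grows with the product of the list lengths multiplied by the number of folds.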
The param_grid parameter takes a dictionary (or a list of such dictionaries) where the keys are the names of the hyperparameters and the values are lists of values to try for each hyperparameter. GridSearchCV will then search over all possible combinations of these values.
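As a quick check before running a search, the combinations a grid expands to can be enumerated with ParameterGrid from sklearn.model_selection. The sketch below uses the same three hyperparameters as the SVC example in this article; the second grid, split_grid, is a hypothetical variant showing the list-of-dictionaries form.

```python
from sklearn.model_selection import ParameterGrid

# A single dictionary expands to the full Cartesian product
param_grid = {
    "C": [0.1, 1, 10],
    "kernel": ["linear", "rbf"],
    "gamma": ["scale", "auto"],
}
grid = ParameterGrid(param_grid)
print(len(grid))  # 3 * 2 * 2 = 12 combinations
for params in grid:
    print(params)

# A list of dictionaries defines separate sub-grids. Here gamma is varied
# only for the RBF kernel, because SVC ignores gamma when kernel="linear".
split_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": ["scale", "auto"]},
]
print(len(ParameterGrid(split_grid)))  # 3 + 3 * 2 = 9 combinations
```

Either form can be passed directly as param_grid to GridSearchCV; the split version avoids fitting linear-kernel models that differ only in an unused gamma value.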
The following complete example tunes an SVC classifier with GridSearchCV:
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
# create a synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
# define the parameter grid (3 * 2 * 2 = 12 combinations)
# note: SVC ignores gamma when kernel='linear'
param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf'],
    'gamma': ['scale', 'auto']
}
# create a GridSearchCV object
grid_search = GridSearchCV(estimator=SVC(), param_grid=param_grid, cv=5)
# perform the grid search
grid_search.fit(X, y)
# report the best parameters
print("Best parameters found:")
print(grid_search.best_params_)
Running the example prints output similar to:
Best parameters found:
{'C': 10, 'gamma': 'scale', 'kernel': 'linear'}
The key steps in this example are:
- Generate a synthetic binary classification dataset using make_classification
- Define the param_grid dictionary with the hyperparameters and their values to search over
- Create a GridSearchCV object with the estimator (SVC), param_grid, and the number of cross-validation splits (cv)
- Fit the GridSearchCV object to perform the grid search
- Print out the best parameters found by the grid search
This demonstrates how to set up the param_grid parameter in GridSearchCV to search over multiple hyperparameters and their corresponding values, allowing you to find the best combination for your specific model and dataset.
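Beyond best_params_, a fitted GridSearchCV object also exposes the score of every combination, which helps judge how much the winner actually stands out. The sketch below is a minimal illustration on a smaller dataset and a reduced grid (both are assumptions made to keep it fast); it reads best_score_, cv_results_, and best_estimator_.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Smaller dataset and grid than the main example, to keep the sketch quick
X, y = make_classification(n_samples=200, n_classes=2, random_state=42)
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

grid_search = GridSearchCV(SVC(), param_grid=param_grid, cv=5, scoring="accuracy")
grid_search.fit(X, y)

# Mean cross-validated score of the best combination
print(grid_search.best_score_)

# Per-combination results, listed from best rank (1) to worst
results = grid_search.cv_results_
for i in results["rank_test_score"].argsort():
    print(
        results["rank_test_score"][i],
        results["params"][i],
        round(results["mean_test_score"][i], 3),
    )

# With the default refit=True, this estimator is retrained on all of X, y
print(grid_search.best_estimator_)
```

If several combinations score within the fold-to-fold variation (see std_test_score in cv_results_), the reported best parameters may not be meaningfully better than the runners-up.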