GridSearchCV is a scikit-learn tool for systematically tuning the hyperparameters of a given model. It performs an exhaustive search over a specified parameter grid, evaluating every combination with cross-validation, so you can find the best-scoring combination among the values you list.
The key parameters of GridSearchCV include estimator (the model to tune), param_grid (the hyperparameter space to search), scoring (the metric to optimize), and cv (the cross-validation splitting strategy).
GridSearchCV is appropriate for tuning the hyperparameters of any supervised learning model, including both classification and regression models.
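To illustrate the regression case, here is a minimal sketch that passes the same four arguments with a regressor. The choice of Ridge, the alpha values, the dataset sizes, and the neg_mean_squared_error metric are assumptions made for illustration only; they are not part of the original example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# synthetic regression data (sizes and noise level are illustrative assumptions)
X_reg, y_reg = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=42)

# same four arguments as for a classifier; only the estimator and metric change
reg_search = GridSearchCV(
    estimator=Ridge(),
    param_grid={'alpha': [0.1, 1.0, 10.0]},
    scoring='neg_mean_squared_error',
    cv=5,
)
reg_search.fit(X_reg, y_reg)

# scikit-learn maximizes scores, so MSE is reported negated; flip the sign to read it
print("Best parameters: {}".format(reg_search.best_params_))
print("Best CV MSE: {:.3f}".format(-reg_search.best_score_))
```

The classification example that follows uses the same four arguments with a decision tree.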
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# generate synthetic classification dataset
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=5, random_state=42)
# split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# define the model to tune
model = DecisionTreeClassifier(random_state=42)
# define the parameter grid
param_grid = {
    'max_depth': [3, 5, 7, 9],
    'min_samples_split': [2, 5, 10, 20]
}
# perform grid search
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, scoring='accuracy', cv=5)
grid_search.fit(X_train, y_train)
# report the best mean cross-validated score and the parameters that produced it
print("Best score: {:.3f}".format(grid_search.best_score_))
print("Best parameters: {}".format(grid_search.best_params_))
# evaluate the best model on the test set
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Test set accuracy: {:.3f}".format(accuracy))
Running the example gives an output like:
Best score: 0.741
Best parameters: {'max_depth': 7, 'min_samples_split': 2}
Test set accuracy: 0.810
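The best score and the test set accuracy differ because they measure different things: best_score_ is the mean accuracy across the cross-validation folds of the training data, while the test accuracy comes from a single held-out split. The following sketch rebuilds the same search and reads the per-candidate scores from cv_results_ to make that relationship visible. It repeats the setup so that it runs on its own; the printed values will depend on your scikit-learn version.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# same data, split, model, and grid as in the example above
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
param_grid = {
    'max_depth': [3, 5, 7, 9],
    'min_samples_split': [2, 5, 10, 20],
}
grid_search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid,
                           scoring='accuracy', cv=5)
grid_search.fit(X_train, y_train)

# one mean cross-validated score per parameter combination
mean_scores = grid_search.cv_results_['mean_test_score']
best_index = grid_search.best_index_

print("Candidates evaluated: {}".format(len(mean_scores)))
print("Best mean CV score: {:.3f}".format(mean_scores[best_index]))
print("Matches best_score_: {}".format(np.isclose(mean_scores[best_index], grid_search.best_score_)))
print("Winning parameters: {}".format(grid_search.cv_results_['params'][best_index]))
```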
The steps are as follows:
1. First, a synthetic multiclass classification dataset is generated using make_classification(). The dataset is split into training and test sets using train_test_split().
2. A DecisionTreeClassifier is defined as the model to tune. A parameter grid is defined with different values for the max_depth and min_samples_split hyperparameters.
3. GridSearchCV is instantiated with the model, parameter grid, accuracy scoring metric, and 5-fold cross-validation. It is then fitted on the training data to find the best hyperparameter combination.
4. The best score and best hyperparameters are reported. Finally, the best model is evaluated on the test set to assess its performance on unseen data.
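To get a feel for how much work the search in these steps involves, the grid can be expanded with ParameterGrid from sklearn.model_selection. This is a small sketch for counting candidates, not something the example itself requires; the variable names candidates, n_folds, and total_cv_fits are chosen here for illustration.

```python
from sklearn.model_selection import ParameterGrid

# the same grid as in the example
param_grid = {
    'max_depth': [3, 5, 7, 9],
    'min_samples_split': [2, 5, 10, 20],
}

# every combination the exhaustive search will try
candidates = list(ParameterGrid(param_grid))
n_folds = 5  # matches cv=5 in the example

# each candidate is fitted once per fold during cross-validation
total_cv_fits = len(candidates) * n_folds

print("Parameter combinations: {}".format(len(candidates)))
print("Cross-validation fits: {}".format(total_cv_fits))
```

With 4 values for each of the 2 hyperparameters there are 16 combinations and 80 cross-validation fits. With the default refit=True, GridSearchCV then fits the winning combination once more on the full training data to produce best_estimator_. Because the count multiplies with every hyperparameter added, large grids can become expensive quickly.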
This example demonstrates how to use GridSearchCV to systematically search for the best hyperparameters of a scikit-learn model. By defining a model and a parameter grid, you can tune the model toward the best cross-validated performance that the grid allows on your dataset.