The ‘scoring’ parameter in scikit-learn’s GridSearchCV
allows you to specify the evaluation metric to optimize during hyperparameter tuning. Using an appropriate scoring metric that aligns with your task’s goals is crucial for finding the best model configuration.
Grid search is a method for exhaustively searching over a specified set of parameter values to find the best combination. It trains and evaluates the model for each combination of parameters, using the specified scoring metric to compare performance.
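To make the exhaustive search concrete, scikit-learn's ParameterGrid helper can enumerate every parameter combination a grid search will evaluate. A minimal sketch (the grid values here mirror the example below):

from sklearn.model_selection import ParameterGrid
# each dict yielded is one candidate configuration the search will try
param_grid = {'C': [0.1, 1, 10], 'gamma': [1, 0.1, 0.01]}
for params in ParameterGrid(param_grid):
    print(params)

With three values for each of two parameters, this prints nine combinations; GridSearchCV trains and cross-validates a model for each one.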
The ‘scoring’ parameter can be set to a string identifier for common metrics like ‘accuracy’, ‘precision’, ‘recall’, or ‘f1’.
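You can list every built-in string identifier with get_scorer_names (available in recent scikit-learn versions; older releases exposed a SCORERS dictionary instead):

from sklearn.metrics import get_scorer_names
# print all scoring strings accepted by the 'scoring' parameter
print(get_scorer_names())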
For classification and regression tasks, the choice of metric should reflect the priorities of the problem - for example, optimizing for recall may be more important than accuracy in a fraud detection model.
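To see why this matters, consider a hypothetical fraud dataset where only 5 of 100 cases are fraud; a model that always predicts "not fraud" scores well on accuracy while catching nothing:

from sklearn.metrics import accuracy_score, recall_score
# hypothetical labels: 1 = fraud (5 of 100 cases); predictions are always 0
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(accuracy_score(y_true, y_pred))  # 0.95 - looks strong
print(recall_score(y_true, y_pred))    # 0.0 - no fraud detected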
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
# create a synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
# define the parameter and grid values
param_grid = {'C': [0.1, 1, 10], 'gamma': [1, 0.1, 0.01]}
# define and perform a grid search with the "accuracy" scoring metric
grid_accuracy = GridSearchCV(estimator=SVC(), param_grid=param_grid, scoring='accuracy', cv=5)
grid_accuracy.fit(X, y)
# define and perform a grid search with the "f1" scoring metric
grid_f1 = GridSearchCV(estimator=SVC(), param_grid=param_grid, scoring='f1', cv=5)
grid_f1.fit(X, y)
# report the best parameters for the "accuracy" scoring metric
print("Best parameters found with accuracy scoring:")
print(grid_accuracy.best_params_)
# report the best parameters for the "f1" scoring metric
print("Best parameters found with f1 scoring:")
print(grid_f1.best_params_)
Running the example gives an output like:
Best parameters found with accuracy scoring:
{'C': 1, 'gamma': 0.01}
Best parameters found with f1 scoring:
{'C': 1, 'gamma': 0.01}
The key steps in this example are:
- Generate a synthetic binary classification dataset using make_classification
- Define a parameter grid for SVC with C and gamma values to search over
- Create two GridSearchCV objects, one with ‘accuracy’ scoring and one with ‘f1’ scoring
- Fit both grid search objects to find the best parameters under each scoring metric
- Print out the best parameters found by each grid search, showing how the optimal parameters can differ based on the chosen scoring metric (in this run they happen to agree, but on imbalanced problems the two metrics often favor different configurations)
This demonstrates the importance of selecting an appropriate scoring metric that reflects the goals of your machine learning task when performing hyperparameter tuning with GridSearchCV.
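As a follow-up, each fitted GridSearchCV object also exposes a best_score_ attribute holding the best mean cross-validated score under its metric, which is useful when comparing the two searches. Continuing from the example above:

# report the best cross-validated score under each scoring metric
print("Best accuracy score: %.3f" % grid_accuracy.best_score_)
print("Best f1 score: %.3f" % grid_f1.best_score_)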