The scorer_ attribute of scikit-learn’s GridSearchCV holds the scoring function used to evaluate candidate models during the grid search. By default, GridSearchCV uses the estimator’s default scorer, which varies depending on the type of estimator. You can override this behavior by passing a custom scoring function to the scoring parameter; after fitting, the scorer that was actually used is exposed through the scorer_ attribute.
Choosing an appropriate scoring metric is crucial for selecting the best model and hyperparameters. A custom scorer gives you the flexibility to capture the performance characteristics you care about. For example, you might want to optimize for mean absolute error, precision, recall, or any other metric relevant to your problem domain.
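Before the full worked example below, here is a minimal sketch (using only standard scikit-learn APIs) of two common ways to point GridSearchCV at a custom metric without writing a scorer callable yourself: a built-in scoring string such as 'neg_mean_absolute_error', or make_scorer, which wraps a metric function and handles the maximize-by-default sign convention for you.
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer, mean_absolute_error
from sklearn.model_selection import GridSearchCV
# Option 1: a built-in scoring string (already negated, so higher is better)
grid_a = GridSearchCV(Ridge(), {'alpha': [0.1, 1.0]},
                      scoring='neg_mean_absolute_error', cv=5)
# Option 2: make_scorer wraps a metric; greater_is_better=False flips the sign
mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)
grid_b = GridSearchCV(Ridge(), {'alpha': [0.1, 1.0]}, scoring=mae_scorer, cv=5)
The example that follows uses a third option, a plain callable with the (estimator, X, y) signature, which gives complete control over how the score is computed.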
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_absolute_error
# Generate a synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Create a Ridge regression estimator
ridge = Ridge(random_state=42)
# Define the parameter grid
param_grid = {
    'alpha': [0.1, 1.0, 10.0],
    'solver': ['auto', 'svd', 'cholesky']
}
# Define a custom scoring function
def custom_scorer(estimator, X, y):
    y_pred = estimator.predict(X)
    mae = mean_absolute_error(y, y_pred)
    return -mae  # Negating MAE since sklearn maximizes the score
# Create a GridSearchCV object with the custom scorer
grid_search = GridSearchCV(estimator=ridge, param_grid=param_grid, scoring=custom_scorer, cv=5)
# Fit the GridSearchCV object
grid_search.fit(X, y)
# Print the best parameters and the best score (negated back to the actual MAE)
print("Best parameters: ", grid_search.best_params_)
print("Best score: ", -grid_search.best_score_)
Running the example gives an output like:
Best parameters: {'alpha': 0.1, 'solver': 'svd'}
Best score: 0.0800238150391277
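Beyond the single best score, the fitted search object also records which scorer was used and how every parameter combination performed. As a small sketch, continuing from the grid_search fitted above:
# The scorer_ attribute stores the scoring callable used during the search
print(grid_search.scorer_)
# cv_results_ holds the mean cross-validated score for each parameter combination
for params, mean_score in zip(grid_search.cv_results_['params'],
                              grid_search.cv_results_['mean_test_score']):
    print(params, -mean_score)  # negate back to the actual MAE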
The key steps in this example are:
- Preparing a synthetic regression dataset using make_regression for grid search.
- Defining a Ridge regression estimator and the parameter grid with hyperparameters to tune.
- Creating a custom scoring function custom_scorer that calculates the negated mean absolute error (MAE). The score is negated because sklearn maximizes the score.
- Configuring the GridSearchCV object with the Ridge estimator, parameter grid, custom scorer, and cross-validation strategy.
- Fitting the GridSearchCV object on the synthetic dataset.
- Accessing the best parameters and best score based on the custom scorer using the best_params_ and best_score_ attributes, respectively. The best_score_ is negated to obtain the actual MAE value.
By supplying a custom scoring function through the scoring parameter (kept after fitting in the scorer_ attribute), you can tailor the model selection process to optimize for the metric that matters most for your specific problem. This flexibility allows you to align the grid search with your model’s objectives and make more informed decisions when selecting the best hyperparameters.
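If a single metric is not enough, GridSearchCV also accepts several scorers at once. As a sketch (assuming the same ridge estimator and param_grid defined above), you can pass a dict to scoring and use refit to name the metric that decides the best model:
from sklearn.metrics import make_scorer, mean_absolute_error
scoring = {
    'neg_mae': make_scorer(mean_absolute_error, greater_is_better=False),
    'r2': 'r2'
}
multi_search = GridSearchCV(estimator=ridge, param_grid=param_grid,
                            scoring=scoring, refit='neg_mae', cv=5)
multi_search.fit(X, y)
print(multi_search.best_params_)  # selected according to refit='neg_mae'
print(multi_search.cv_results_['mean_test_r2'])  # scores for the secondary metric
With multi-metric scoring, best_params_ and best_score_ refer to the metric named in refit, while cv_results_ keeps a set of columns for each scorer.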