The scorer_ attribute of scikit-learn’s GridSearchCV holds the scoring function used to evaluate candidate models during the grid search. By default, GridSearchCV uses the estimator’s default scorer, which varies depending on the type of estimator. You can override this behavior by passing a custom scoring function to the scoring parameter; after fitting, the scorer that was actually used is exposed through the scorer_ attribute.
Choosing an appropriate scoring metric is crucial for selecting the best model and hyperparameters. A custom scorer gives you the flexibility to capture the performance characteristics you care about. For example, you might want to optimize for mean absolute error, precision, recall, or any other metric relevant to your problem domain.
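Before the full worked example below, here is a minimal sketch (using only standard scikit-learn APIs) of two common ways to point GridSearchCV at a custom metric without writing a scorer callable yourself: a built-in scoring string such as 'neg_mean_absolute_error', or make_scorer, which wraps a metric function and handles the maximize-by-default sign convention for you.
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer, mean_absolute_error
from sklearn.model_selection import GridSearchCV
# Option 1: a built-in scoring string (already negated, so higher is better)
grid_a = GridSearchCV(Ridge(), {'alpha': [0.1, 1.0]},
                      scoring='neg_mean_absolute_error', cv=5)
# Option 2: make_scorer wraps a metric; greater_is_better=False flips the sign
mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)
grid_b = GridSearchCV(Ridge(), {'alpha': [0.1, 1.0]}, scoring=mae_scorer, cv=5)
The example that follows uses a third option, a plain callable with the (estimator, X, y) signature, which gives complete control over how the score is computed.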
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_absolute_error
# Generate a synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Create a Ridge regression estimator
ridge = Ridge(random_state=42)
# Define the parameter grid
param_grid = {
    'alpha': [0.1, 1.0, 10.0],
    'solver': ['auto', 'svd', 'cholesky']
}
# Define a custom scoring function
def custom_scorer(estimator, X, y):
    y_pred = estimator.predict(X)
    mae = mean_absolute_error(y, y_pred)
    return -mae  # Negating MAE since sklearn maximizes the score
# Create a GridSearchCV object with the custom scorer
grid_search = GridSearchCV(estimator=ridge, param_grid=param_grid, scoring=custom_scorer, cv=5)
# Fit the GridSearchCV object
grid_search.fit(X, y)
# Print the best parameters and the best score (negated back to the actual MAE)
print("Best parameters: ", grid_search.best_params_)
print("Best score: ", -grid_search.best_score_)
Running the example gives an output like:
Best parameters: {'alpha': 0.1, 'solver': 'svd'}
Best score: 0.0800238150391277
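Beyond the single best score, the fitted search object also records which scorer was used and how every parameter combination performed. As a small sketch, continuing from the grid_search fitted above:
# The scorer_ attribute stores the scoring callable used during the search
print(grid_search.scorer_)
# cv_results_ holds the mean cross-validated score for each parameter combination
for params, mean_score in zip(grid_search.cv_results_['params'],
                              grid_search.cv_results_['mean_test_score']):
    print(params, -mean_score)  # negate back to the actual MAE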
The key steps in this example are:
- Preparing a synthetic regression dataset using make_regression for grid search.
- Defining a Ridge regression estimator and the parameter grid with hyperparameters to tune.
- Creating a custom scoring function custom_scorer that calculates the negated mean absolute error (MAE). The score is negated because sklearn maximizes the score.
- Configuring the GridSearchCV object with the Ridge estimator, parameter grid, custom scorer, and cross-validation strategy.
- Fitting the GridSearchCV object on the synthetic dataset.
- Accessing the best parameters and best score based on the custom scorer using the best_params_ and best_score_ attributes, respectively. The best_score_ is negated to obtain the actual MAE value.
By supplying a custom scoring function through the scoring parameter (kept after fitting in the scorer_ attribute), you can tailor the model selection process to optimize for the metric that matters most for your specific problem. This flexibility allows you to align the grid search with your model’s objectives and make more informed decisions when selecting the best hyperparameters.
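If a single metric is not enough, GridSearchCV also accepts several scorers at once. As a sketch (assuming the same ridge estimator and param_grid defined above), you can pass a dict to scoring and use refit to name the metric that decides the best model:
from sklearn.metrics import make_scorer, mean_absolute_error
scoring = {
    'neg_mae': make_scorer(mean_absolute_error, greater_is_better=False),
    'r2': 'r2'
}
multi_search = GridSearchCV(estimator=ridge, param_grid=param_grid,
                            scoring=scoring, refit='neg_mae', cv=5)
multi_search.fit(X, y)
print(multi_search.best_params_)  # selected according to refit='neg_mae'
print(multi_search.cv_results_['mean_test_r2'])  # scores for the secondary metric
With multi-metric scoring, best_params_ and best_score_ refer to the metric named in refit, while cv_results_ keeps a set of columns for each scorer.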