Scikit-Learn Get RandomizedSearchCV "scorer_" Attribute

RandomizedSearchCV performs hyperparameter optimization by sampling a fixed number of candidate configurations (set by n_iter) from distributions of hyperparameter values and evaluating each one with cross-validation.

When running a random search, you can control how each configuration is evaluated through the scoring parameter, which accepts either the name of a built-in metric or a callable with the signature (estimator, X, y).

The scorer_ attribute of a fitted RandomizedSearchCV object holds the scorer that was actually used during the search. When scoring is a custom function, as in the example below, scorer_ is that function.

Accessing scorer_ is useful for understanding how the models were evaluated and can be helpful if you want to use the same scorer for evaluating the final selected model.

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import mean_absolute_error
from scipy.stats import randint

# Generate a random regression dataset
X, y = make_regression(n_samples=100, n_features=10, noise=0.1, random_state=42)

# Define a custom scorer that returns the negated mean absolute error (MAE)
def custom_scorer(estimator, X, y):
    y_pred = estimator.predict(X)
    mae = mean_absolute_error(y, y_pred)
    return -mae  # Negate MAE since sklearn optimizes for higher scores

# Set up a RandomForestRegressor
rf = RandomForestRegressor(random_state=42)

# Define hyperparameter distributions to sample from
param_dist = {
    'n_estimators': randint(5, 50),
    'max_depth': [3, 5, 10, None],
    'min_samples_split': randint(2, 10),
}

# Run random search with custom scorer and 5-fold cross-validation
random_search = RandomizedSearchCV(rf, param_distributions=param_dist,
                                   scoring=custom_scorer, n_iter=10, cv=5, random_state=42)
random_search.fit(X, y)

# Access scorer_ attribute
scorer = random_search.scorer_

# Print scorer
print("Scorer used in RandomizedSearchCV:")
print(scorer)

Running the example gives an output like:

Scorer used in RandomizedSearchCV:
<function custom_scorer at 0x107317060>

The steps are as follows:

1. Prepare a synthetic regression dataset using make_regression.
2. Define a custom scoring function, custom_scorer, that returns the negated mean absolute error (MAE).
3. Configure a RandomForestRegressor and define the distributions to sample hyperparameters from.
4. Run RandomizedSearchCV with the regressor, hyperparameter distributions, custom scorer, 10 iterations, and 5-fold cross-validation.
5. After fitting, access the scorer_ attribute of the random_search object and print its value.
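Because the custom scorer returns a negated MAE, the scores reported by the search are negative too. The following is a minimal sketch of reading best_score_ and turning it back into a plain MAE. It assumes a smaller search (n_iter=5, cv=3) than the main example to keep it quick, and the names search and best_mae are illustrative only.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import RandomizedSearchCV


def custom_scorer(estimator, X, y):
    # Negated MAE, so that a higher score means a better model
    return -mean_absolute_error(y, estimator.predict(X))


X, y = make_regression(n_samples=100, n_features=10, noise=0.1, random_state=42)

# Assumption: a reduced search budget, chosen only to keep the sketch fast
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions={
        "n_estimators": randint(5, 50),
        "max_depth": [3, 5, 10, None],
    },
    scoring=custom_scorer,
    n_iter=5,
    cv=3,
    random_state=42,
)
search.fit(X, y)

# best_score_ is the mean cross-validated score of the best candidate,
# expressed in the scorer's units (negated MAE here)
best_mae = -search.best_score_
print(f"Best mean CV score (negated MAE): {search.best_score_:.3f}")
print(f"Equivalent MAE: {best_mae:.3f}")
```

Flipping the sign is only needed for reporting; the search itself always treats the larger score as the better one.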

By specifying a custom scorer in RandomizedSearchCV, you can evaluate models on the metric most relevant to your problem. The scorer_ attribute gives you access to the scorer used during the search, so you can score the final selected model in exactly the same way the candidates were scored.
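As one possible way to reuse the scorer, the sketch below holds out a test set, runs the search on the training portion, and then calls scorer_ on the refitted best_estimator_. The train/test split, the reduced search budget, and the variable names are assumptions made for illustration and are not part of the example above.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import RandomizedSearchCV, train_test_split


def custom_scorer(estimator, X, y):
    # Negated MAE, so that a higher score means a better model
    return -mean_absolute_error(y, estimator.predict(X))


X, y = make_regression(n_samples=100, n_features=10, noise=0.1, random_state=42)

# Assumption: hold out 25% of the data for a final check
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions={
        "n_estimators": randint(5, 50),
        "max_depth": [3, 5, 10, None],
    },
    scoring=custom_scorer,
    n_iter=5,
    cv=3,
    random_state=42,
)
search.fit(X_train, y_train)

# Score the refitted best model with the same scorer the search used
test_score = search.scorer_(search.best_estimator_, X_test, y_test)

# search.score applies scorer_ to best_estimator_, so it should agree
same_score = search.score(X_test, y_test)

print(f"Held-out score (negated MAE): {test_score:.3f}")
print(f"Via search.score:             {same_score:.3f}")
```

Calling scorer_ directly is handy when you want to score a different estimator or dataset with the same metric, while search.score is the shorter route for the refitted best model.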
