Scikit-Learn Get RandomizedSearchCV "best_estimator_" Attribute

RandomizedSearchCV performs hyperparameter optimization by sampling a fixed number of candidate settings (n_iter) from distributions or lists of hyperparameter values and scoring each candidate with cross-validation.

After running a random search, you can access the estimator with the best performance using the best_estimator_ attribute.

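The best_estimator_ attribute holds a copy of the estimator that was refit on the full data passed to fit, using the hyperparameter combination that achieved the highest mean cross-validated score. It is only available when refit=True, which is the default.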
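To illustrate how the refit parameter controls whether best_estimator_ exists, here is a minimal sketch. The small dataset, the parameter ranges, and the variable names (search_refit, search_no_refit) are assumptions chosen only to keep the run short; they are not part of the original example.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Small illustrative dataset (sizes are arbitrary, chosen for speed)
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

param_dist = {"n_estimators": randint(10, 30), "max_depth": [3, 5, None]}

# Default behaviour: refit=True, so the best candidate is refit on all of X, y
search_refit = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_dist,
    n_iter=3,
    cv=3,
    random_state=0,
)
search_refit.fit(X, y)
has_refit = hasattr(search_refit, "best_estimator_")

# With refit=False the search only records scores; no final model is fitted
search_no_refit = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_dist,
    n_iter=3,
    cv=3,
    random_state=0,
    refit=False,
)
search_no_refit.fit(X, y)
has_no_refit = hasattr(search_no_refit, "best_estimator_")

print("best_estimator_ available with refit=True: ", has_refit)
print("best_estimator_ available with refit=False:", has_no_refit)
# best_params_ is still recorded for a single-metric search without refit
print("best_params_ without refit:", search_no_refit.best_params_)
```

If refit is disabled, the best hyperparameters can still be read from best_params_ and used to configure and fit a new estimator by hand.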

Accessing best_estimator_ is useful for retrieving a fitted estimator that is already configured with the best hyperparameters found during the search. These are the best among the sampled candidates, not necessarily the global optimum.

You can then use this estimator directly for making predictions on new data or for further analysis and model evaluation.
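As one possible form of further evaluation, the sketch below scores the best estimator on a held-out test set and checks that calling predict on the search object gives the same result as calling it on best_estimator_. The train/test split, dataset sizes, and parameter ranges are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Illustrative data, split so the test set is never seen by the search
X, y = make_regression(n_samples=300, n_features=5, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={"n_estimators": randint(10, 40), "max_depth": [3, 5, None]},
    n_iter=4,
    cv=3,
    random_state=0,
)
search.fit(X_train, y_train)

best_model = search.best_estimator_

# R^2 on data that played no part in the hyperparameter search
test_r2 = best_model.score(X_test, y_test)

# The search object delegates predict to best_estimator_ when refit=True
same_predictions = np.allclose(
    search.predict(X_test), best_model.predict(X_test)
)

print("Held-out R^2:", test_r2)
print("search.predict matches best_estimator_.predict:", same_predictions)
```

Keeping a separate test set gives a less optimistic estimate than best_score_, since best_score_ comes from the same cross-validation that was used to pick the hyperparameters.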

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint

# Generate a random regression dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Set up a RandomForestRegressor
rf = RandomForestRegressor(random_state=42)

# Define hyperparameter distributions to sample from
param_dist = {
    'n_estimators': randint(50, 200),
    'max_depth': [3, 5, 10, None],
    'min_samples_split': randint(2, 10),
}

# Run random search with 5-fold cross-validation
random_search = RandomizedSearchCV(rf, param_distributions=param_dist, n_iter=10, cv=5, random_state=42)
random_search.fit(X, y)

# Access the best estimator
best_rf = random_search.best_estimator_

# Use the best estimator for predictions
new_data = [[1.2, 3.4, 5.6, 7.8, 9.0, 2.1, 4.3, 6.5, 8.7, 0.9]]
prediction = best_rf.predict(new_data)

print("Best hyperparameters:", random_search.best_params_)
print("Prediction:", prediction)

Running the example gives an output like:

Best hyperparameters: {'max_depth': None, 'min_samples_split': 5, 'n_estimators': 87}
Prediction: [326.48972403]

The steps are as follows:

1. Prepare a synthetic regression dataset using make_regression.
2. Configure a RandomForestRegressor and define distributions to sample hyperparameters from.
3. Run RandomizedSearchCV with the regressor, hyperparameter distributions, 10 iterations (n_iter=10), and 5-fold cross-validation (cv=5).
4. After fitting, access the best_estimator_ attribute from the random_search object.
5. Use the best_estimator_ for making predictions on new data.

By accessing the best_estimator_ attribute, you can directly obtain the estimator with the best performing hyperparameters found during the random search, allowing you to use it for further tasks without the need for manual configuration.
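To make the "no manual configuration" point concrete, the following sketch checks that the parameters of best_estimator_ agree with best_params_ and that the estimator is already fitted. The dataset and search settings are assumptions picked to run quickly.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Illustrative data and search space (values are arbitrary)
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={"n_estimators": randint(10, 30), "max_depth": [3, 5, None]},
    n_iter=3,
    cv=3,
    random_state=0,
)
search.fit(X, y)

best = search.best_estimator_

# Every sampled hyperparameter in best_params_ is already set on the estimator
params_match = all(
    best.get_params()[name] == value
    for name, value in search.best_params_.items()
)

# The estimator is fitted: a random forest exposes its trees in estimators_
n_trees = len(best.estimators_)

# best_score_ is the mean cross-validated score of the winning candidate
score_matches = (
    search.best_score_ == search.cv_results_["mean_test_score"][search.best_index_]
)

print("Parameters match best_params_:", params_match)
print("Number of fitted trees:", n_trees)
print("best_score_ matches cv_results_:", score_matches)
```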
