Scikit-Learn Get RandomizedSearchCV "refit_time_" Attribute

The refit_time_ attribute of a fitted RandomizedSearchCV object stores the time, in seconds, that it took to refit the best model on the full dataset after the random search finished. The attribute is only present when refit is not False.

When refit=True (the default), RandomizedSearchCV clones the base estimator, sets it to the best hyperparameters found during the search, and fits it on the whole dataset passed to fit(). This final model is accessible via the best_estimator_ attribute.

Accessing refit_time_ is useful for measuring the cost of that final fit, which can be a noticeable share of the total running time of a hyperparameter search, especially for large datasets or expensive models.

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint

# Generate a random regression dataset
X, y = make_regression(n_samples=100, n_features=10, noise=0.1, random_state=42)

# Set up a RandomForestRegressor
rf = RandomForestRegressor(random_state=42)

# Define hyperparameter distributions to sample from
param_dist = {
    'n_estimators': randint(5, 10),
    'max_depth': [3, 5, 10, None],
    'min_samples_split': randint(2, 10),
}

# Run random search with 5-fold cross-validation and refit the best model
random_search = RandomizedSearchCV(rf, param_distributions=param_dist, n_iter=10, cv=5, refit=True, random_state=42)
random_search.fit(X, y)

# Access refit_time_ attribute
refit_time = random_search.refit_time_

# Print refit time
print(f"Time to refit the best model: {refit_time:.2f} seconds")

Running the example gives an output like:

Time to refit the best model: 0.01 seconds

The steps are as follows:

1. Prepare a synthetic regression dataset using make_regression.
2. Configure a RandomForestRegressor and define distributions to sample hyperparameters from.
3. Run RandomizedSearchCV with the regressor, hyperparameter distributions, 10 iterations, 5-fold cross-validation, and refit=True.
4. After fitting, access the refit_time_ attribute from the random_search object.
5. Print the refit time to see how long it took to refit the best model on the full dataset.
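
The steps above rely on refit=True. As a minimal sketch of what changes when refitting is turned off, the snippet below runs the same kind of search twice and checks which attributes exist afterwards. The helper name run_search and the smaller n_iter and cv values are illustrative choices, not part of the original example.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=100, n_features=10, noise=0.1, random_state=42)

param_dist = {
    'n_estimators': randint(5, 10),
    'max_depth': [3, 5, None],
}

def run_search(refit):
    """Hypothetical helper: fit a small random search with the given refit setting."""
    search = RandomizedSearchCV(
        RandomForestRegressor(random_state=42),
        param_distributions=param_dist,
        n_iter=5,
        cv=3,
        refit=refit,
        random_state=42,
    )
    return search.fit(X, y)

with_refit = run_search(True)
without_refit = run_search(False)

# refit_time_ and best_estimator_ are only set when a final refit happens
print("refit=True  has refit_time_:", hasattr(with_refit, "refit_time_"))
print("refit=False has refit_time_:", hasattr(without_refit, "refit_time_"))
print("refit=False has best_estimator_:", hasattr(without_refit, "best_estimator_"))

# The best parameters are still recorded for a single-metric search
print("Best params without refit:", without_refit.best_params_)
```

With refit=False, reading refit_time_ raises an AttributeError, so a hasattr check like the one above is a simple guard in code that has to handle both settings.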

By checking refit_time_, you can see how much of the total search time is spent refitting the best model, which helps when deciding where to optimize your hyperparameter tuning workflow.
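
One way to put refit_time_ in context is to compare it with the time spent on cross-validation fits. The sketch below is an approximation under stated assumptions: it estimates the cross-validation fit time from the mean_fit_time entries of cv_results_ multiplied by n_splits_, and it measures the whole fit() call with time.perf_counter. It assumes the search runs sequentially (the default n_jobs), since with parallel jobs the summed fit times can exceed the wall-clock time.

```python
import time

from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=100, n_features=10, noise=0.1, random_state=42)

param_dist = {
    'n_estimators': randint(5, 10),
    'max_depth': [3, 5, 10, None],
    'min_samples_split': randint(2, 10),
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions=param_dist,
    n_iter=10,
    cv=5,
    refit=True,
    random_state=42,
)

# Wall-clock time of the whole search, including the final refit
start = time.perf_counter()
search.fit(X, y)
total_elapsed = time.perf_counter() - start

# Approximate time spent fitting candidates during cross-validation:
# mean fit time per candidate, times the number of folds, summed over candidates
cv_fit_time = search.cv_results_['mean_fit_time'].sum() * search.n_splits_

refit_time = search.refit_time_
refit_share = refit_time / total_elapsed

print(f"Total search time:        {total_elapsed:.3f} seconds")
print(f"Approx. CV fit time:      {cv_fit_time:.3f} seconds")
print(f"Refit time:               {refit_time:.3f} seconds")
print(f"Refit share of the total: {refit_share:.1%}")
```

The exact numbers vary between machines and runs. On a small dataset like this one the refit is usually a small fraction of the total, because the search performs 50 cross-validation fits but only one refit.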
