The estimator parameter in RandomizedSearchCV specifies the base model to use for hyperparameter tuning. Random search is a hyperparameter optimization method that evaluates random combinations of parameters to find the best-performing model.
The estimator must be a scikit-learn estimator object, such as a classifier or regressor, depending on the problem type. For example, when working with a classification problem, you would use a classifier estimator like RandomForestClassifier or SVC.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
# Define a parameter distribution for RandomForestClassifier hyperparameters
param_dist = {'n_estimators': [100, 200, 300],
              'max_depth': [None, 5, 10],
              'min_samples_split': [2, 5, 10]}
# Create a base RandomForestClassifier estimator
rf = RandomForestClassifier(random_state=42)
# Run RandomizedSearchCV with the base estimator
search = RandomizedSearchCV(estimator=rf, param_distributions=param_dist, n_iter=10, cv=5, random_state=42)
search.fit(X, y)
print(f"Best score: {search.best_score_:.3f}")
print(f"Best parameters: {search.best_params_}")
Running the example gives an output like:
Best score: 0.907
Best parameters: {'n_estimators': 300, 'min_samples_split': 10, 'max_depth': None}
The steps in this example are:
- Generate a synthetic binary classification dataset using make_classification().
- Define a parameter distribution dictionary param_dist for RandomForestClassifier hyperparameters.
- Create a base RandomForestClassifier estimator rf.
- Run RandomizedSearchCV with the base estimator, parameter distributions, 10 iterations, and 5-fold cross-validation.
- Print the best score and best parameters found by RandomizedSearchCV.
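The same pattern applies to any scikit-learn estimator: only the estimator object passed to the estimator parameter and its parameter distributions change. As a minimal sketch, the SVC classifier mentioned above could be substituted as shown below; the parameter values are illustrative choices, not tuned recommendations, and a regressor would be plugged in the same way for a regression problem.
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
# Illustrative parameter distribution for SVC hyperparameters
param_dist = {'C': [0.1, 1, 10, 100],
              'kernel': ['rbf', 'linear'],
              'gamma': ['scale', 'auto']}
# Use SVC as the base estimator
svc = SVC(random_state=42)
search = RandomizedSearchCV(estimator=svc, param_distributions=param_dist, n_iter=10, cv=5, random_state=42)
search.fit(X, y)
print(f"Best score: {search.best_score_:.3f}")
print(f"Best parameters: {search.best_params_}")
Note that param_distributions entries can also be scipy.stats distributions (for example, scipy.stats.loguniform for C), in which case values are sampled rather than drawn from a fixed list.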