The estimator parameter in RandomizedSearchCV specifies the base model to use for hyperparameter tuning. Random search is a hyperparameter optimization method that evaluates random combinations of parameters to find the best-performing model.
The estimator must be a scikit-learn estimator object, such as a classifier or regressor, depending on the problem type. For example, when working with a classification problem, you would use a classifier estimator like RandomForestClassifier or SVC.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
# Define a parameter distribution for RandomForestClassifier hyperparameters
param_dist = {'n_estimators': [100, 200, 300],
              'max_depth': [None, 5, 10],
              'min_samples_split': [2, 5, 10]}
# Create a base RandomForestClassifier estimator
rf = RandomForestClassifier(random_state=42)
# Run RandomizedSearchCV with the base estimator
search = RandomizedSearchCV(estimator=rf, param_distributions=param_dist, n_iter=10, cv=5, random_state=42)
search.fit(X, y)
print(f"Best score: {search.best_score_:.3f}")
print(f"Best parameters: {search.best_params_}")
Running the example gives an output like:
Best score: 0.907
Best parameters: {'n_estimators': 300, 'min_samples_split': 10, 'max_depth': None}
The steps in this example are:
- Generate a synthetic binary classification dataset using make_classification().
- Define a parameter distribution dictionary param_dist for RandomForestClassifier hyperparameters.
- Create a base RandomForestClassifier estimator rf.
- Run RandomizedSearchCV with the base estimator, parameter distributions, 10 iterations, and 5-fold cross-validation.
- Print the best score and best parameters found by RandomizedSearchCV.
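The same pattern applies to any scikit-learn estimator: only the estimator object passed to the estimator parameter and its parameter distributions change. As a minimal sketch, the SVC classifier mentioned above could be substituted as shown below; the parameter values are illustrative choices, not tuned recommendations, and a regressor would be plugged in the same way for a regression problem.
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
# Illustrative parameter distribution for SVC hyperparameters
param_dist = {'C': [0.1, 1, 10, 100],
              'kernel': ['rbf', 'linear'],
              'gamma': ['scale', 'auto']}
# Use SVC as the base estimator
svc = SVC(random_state=42)
search = RandomizedSearchCV(estimator=svc, param_distributions=param_dist, n_iter=10, cv=5, random_state=42)
search.fit(X, y)
print(f"Best score: {search.best_score_:.3f}")
print(f"Best parameters: {search.best_params_}")
Note that param_distributions entries can also be scipy.stats distributions (for example, scipy.stats.loguniform for C), in which case values are sampled rather than drawn from a fixed list.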