Scikit-Learn Configure RandomizedSearchCV "n_iter" Parameter

The n_iter parameter in RandomizedSearchCV sets how many parameter settings are sampled from the search space and evaluated during the hyperparameter search. Unlike a grid search, which tries every combination, random search evaluates only this fixed number of randomly drawn combinations and keeps the best performing one.
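To see what "sampling parameter settings" means in practice, the sketch below draws a few candidates with scikit-learn's ParameterSampler, the utility that produces candidates from a parameter dictionary. The search space and the choice of 5 draws are illustrative assumptions, not part of the example later in this article.

```python
from sklearn.model_selection import ParameterSampler

# Illustrative search space: every entry is a finite list of options
param_dist = {
    "n_estimators": [10, 50, 100],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5, 10],
}

# Draw 5 candidate settings; random_state makes the draw repeatable
samples = list(ParameterSampler(param_dist, n_iter=5, random_state=42))

for i, params in enumerate(samples, start=1):
    print(i, params)
```

Each printed dictionary is one candidate that a search would fit and score, so n_iter is simply the number of such dictionaries.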

The default value for n_iter is 10, which means 10 different parameter settings will be sampled and evaluated.
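A quick way to confirm the default is to leave n_iter unset and count the evaluated candidates in cv_results_. This is a minimal sketch: the LogisticRegression model, the loguniform range for C, and the small dataset are assumptions chosen only to keep it fast.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Small synthetic dataset so the search finishes quickly
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# n_iter is deliberately omitted so the default is used
search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    {"C": loguniform(1e-3, 1e2)},  # continuous distribution (illustrative range)
    cv=3,
    random_state=42,
)
search.fit(X, y)

# One entry in cv_results_["params"] per evaluated candidate
n_candidates = len(search.cv_results_["params"])
print(f"n_iter attribute: {search.n_iter}, candidates evaluated: {n_candidates}")
```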

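Each sampled setting is fitted once per cross-validation fold, so runtime grows roughly linearly with n_iter. Higher values explore more of the search space and may find a better configuration, but they also take longer to run.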
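The cost side of this trade-off can be estimated with simple arithmetic. The helper below is hypothetical (it is not part of scikit-learn) and assumes one fit per candidate per fold plus one final refit. Its optional space_size argument reflects that a search space made only of finite lists cannot yield more candidates than it has distinct combinations.

```python
def total_fits(n_iter, cv, space_size=None, refit=True):
    """Estimate how many model fits a randomized search performs (hypothetical helper)."""
    # A list-only search space cannot supply more candidates than it contains
    n_candidates = n_iter if space_size is None else min(n_iter, space_size)
    # Each candidate is fitted once per fold, plus one final refit on all the data
    return n_candidates * cv + (1 if refit else 0)


for n_iter in (10, 50, 100):
    unlimited = total_fits(n_iter, cv=5)
    capped = total_fits(n_iter, cv=5, space_size=27)
    print(f"n_iter={n_iter}: {unlimited} fits (large space), {capped} fits (27-combination space)")
```

With 5-fold cross-validation, going from 10 to 100 candidates raises the estimate from 51 to 501 fits when the space is large enough to supply them.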

As a heuristic, start with the default value of 10 and increase it until the performance plateaus or the computational budget is reached.
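One way to look for a plateau without rerunning the search many times is to run a single search and track the best score seen so far as candidates accumulate. The sketch below is a rough stand-in for that heuristic: the DecisionTreeClassifier, the randint ranges, and the budget of 30 candidates are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, n_informative=5, random_state=42)

# Integer distributions (illustrative ranges) so the space is not a small finite grid
param_dist = {
    "max_depth": randint(1, 20),
    "min_samples_split": randint(2, 20),
}

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_dist,
    n_iter=30,
    cv=3,
    random_state=42,
)
search.fit(X, y)

# mean_test_score lists candidates in the order they were sampled;
# the running maximum shows the best score available after k candidates
scores = search.cv_results_["mean_test_score"]
best_so_far = np.maximum.accumulate(scores)

for budget in (5, 10, 20, 30):
    print(f"Best score after {budget} candidates: {best_so_far[budget - 1]:.3f}")
```

If the running best stops improving well before the last candidate, a larger n_iter is unlikely to help much for this search space.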

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
import time

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=42)

# Define a parameter distribution for RandomForestClassifier hyperparameters
param_dist = {'n_estimators': [10, 50, 100],
              'max_depth': [None, 5, 10],
              'min_samples_split': [2, 5, 10]}

# Create a base RandomForestClassifier model
rf = RandomForestClassifier(random_state=42)

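# List of n_iter values to test
# Note: param_dist has only 3 * 3 * 3 = 27 distinct combinations, so
# values above 27 are capped at 27 (scikit-learn emits a UserWarning)
n_iter_values = [10, 50, 100]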

for n_iter in n_iter_values:
    start_time = time.perf_counter()

    # Run RandomizedSearchCV with the current n_iter value
    search = RandomizedSearchCV(rf, param_dist, n_iter=n_iter, cv=5, n_jobs=-1, random_state=42)
    search.fit(X, y)

    end_time = time.perf_counter()
    execution_time = end_time - start_time

    print(f"Best score for n_iter={n_iter}: {search.best_score_:.3f}")
    print(f"Execution time for n_iter={n_iter}: {execution_time:.2f} seconds")
    print()

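Running the example gives an output like the one below. Note that param_dist contains only 3 x 3 x 3 = 27 distinct combinations, and because every entry is a list, RandomizedSearchCV samples without replacement. The runs with n_iter=50 and n_iter=100 are therefore each capped at 27 candidates (scikit-learn emits a UserWarning), which is why their best scores are identical and their runtimes are close: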

Best score for n_iter=10: 0.933
Execution time for n_iter=10: 2.54 seconds

Best score for n_iter=50: 0.939
Execution time for n_iter=50: 3.47 seconds

Best score for n_iter=100: 0.939
Execution time for n_iter=100: 3.79 seconds
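The identical scores for n_iter=50 and n_iter=100 can be checked by counting how many candidates a search really evaluates. The sketch below assumes a faster DecisionTreeClassifier stand-in with a list-only space of 27 combinations, requests 50 candidates, and records any warnings raised along the way.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# List-only space with 3 * 3 * 3 = 27 distinct combinations (illustrative values)
param_dist = {
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    search = RandomizedSearchCV(
        DecisionTreeClassifier(random_state=42),
        param_dist,
        n_iter=50,  # more than the 27 combinations available
        cv=3,
        random_state=42,
    )
    search.fit(X, y)

# Count the candidates that were actually evaluated
n_evaluated = len(search.cv_results_["params"])
print(f"Requested 50 candidates, evaluated {n_evaluated}")

# Show the distinct warning messages that were recorded
for message in sorted({str(w.message) for w in caught}):
    print("Warning:", message)
```

When the requested n_iter exceeds the size of a list-only space, the search is effectively an exhaustive grid search, and GridSearchCV expresses that intent more directly.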

The steps in this example are:

1. Generate a synthetic binary classification dataset using make_classification().
2. Define a parameter distribution dictionary param_dist for RandomForestClassifier hyperparameters.
3. Create a base RandomForestClassifier model rf.
4. Iterate over different n_iter values.
5. For each n_iter value:
   - Record the start time using time.perf_counter().
   - Run RandomizedSearchCV with the current n_iter value and 5-fold cross-validation.
   - Record the end time and calculate the execution time.
   - Print the best score and execution time for the current n_iter value.
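For larger n_iter values to make a difference, the search space needs more distinct settings than the number of candidates requested. A variation on the steps above, sketched here with scipy.stats.randint distributions in place of short lists, keeps every requested candidate. The ranges, the smaller dataset, and the reduced n_iter values are assumptions chosen to keep the run short.

```python
import time

from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_redundant=5, random_state=42)

# Integer distributions give thousands of distinct settings (illustrative ranges)
param_dist = {
    "n_estimators": randint(10, 50),
    "max_depth": randint(2, 12),
    "min_samples_split": randint(2, 11),
}

evaluated = {}
for n_iter in (5, 10, 20):
    start_time = time.perf_counter()

    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=42),
        param_dist,
        n_iter=n_iter,
        cv=3,
        random_state=42,
    )
    search.fit(X, y)

    execution_time = time.perf_counter() - start_time

    # Record how many candidates were evaluated for this n_iter
    evaluated[n_iter] = len(search.cv_results_["params"])

    print(f"n_iter={n_iter}: evaluated {evaluated[n_iter]} candidates, "
          f"best score {search.best_score_:.3f}, {execution_time:.2f} seconds")
```

Sampling from distributions is done with replacement, so an occasional duplicate setting is possible, but in a space this large it is unlikely to matter.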

See Also