Scikit-Learn Configure GridSearchCV "pre_dispatch" Parameter

The pre_dispatch parameter in scikit-learn’s GridSearchCV controls how many jobs are dispatched to the workers at a time during parallel execution. Lowering it can keep memory usage in check during hyperparameter optimization.

Grid search systematically works through multiple combinations of parameter values, cross-validating as it goes to determine which combination gives the best performance.
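To get a feel for how much work a grid search schedules, the following minimal sketch counts the candidate combinations and the resulting model fits. It uses the same grid as the example in this article and assumes 5-fold cross-validation; ParameterGrid is scikit-learn's helper for enumerating the combinations in a grid.

```python
from sklearn.model_selection import ParameterGrid

# same grid as the example in this article
param_grid = {'n_estimators': [50, 100, 200], 'max_depth': [None, 10, 20]}
n_folds = 5  # assumed number of cross-validation folds

# every combination of values is one candidate
candidates = list(ParameterGrid(param_grid))
n_candidates = len(candidates)

# each candidate is fitted once per fold
n_fits = n_candidates * n_folds

print(f"{n_candidates} candidates x {n_folds} folds = {n_fits} fits")
```

Each of these fits is one job that can be handed to a worker, which is why the number of jobs queued at once matters for memory.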

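The number of tasks that run in parallel is set by n_jobs; pre_dispatch sets how many tasks are created and queued ahead of those workers. Because each queued task can hold its own copy of the data, capping the queue helps avoid an explosion of memory consumption when more jobs are dispatched than the CPUs can process.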

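The default value is the string '2*n_jobs'. The parameter also accepts an integer such as 2 or 4, another string expression in terms of n_jobs, or None, in which case all jobs are created and dispatched immediately. It only matters when n_jobs enables parallel execution. Setting it to a lower number can help reduce memory usage when working with large datasets or models.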
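As a rough illustration, the sketch below builds searches with the different kinds of pre_dispatch values. The make_search helper, the small grid, and n_jobs=2 are assumptions made for this sketch, not part of scikit-learn; nothing is fitted here, the objects are only constructed and inspected.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {'n_estimators': [50, 100]}  # small illustrative grid

def make_search(pre_dispatch):
    # hypothetical helper for this sketch; n_jobs=2 is an arbitrary choice
    return GridSearchCV(
        estimator=RandomForestClassifier(random_state=0),
        param_grid=param_grid,
        cv=3,
        n_jobs=2,
        pre_dispatch=pre_dispatch,
    )

search_int = make_search(2)            # int: fixed number of dispatched jobs
search_expr = make_search('3*n_jobs')  # str: expression in terms of n_jobs
search_all = make_search(None)         # None: dispatch all jobs immediately

# leaving pre_dispatch unset keeps the default
search_default = GridSearchCV(RandomForestClassifier(), param_grid)

for name, search in [('int', search_int), ('expr', search_expr),
                     ('none', search_all), ('default', search_default)]:
    print(name, repr(search.pre_dispatch))
```

None tends to suit lightweight, fast-running jobs, while a small integer is the more cautious choice when each job is memory-hungry.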

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# create a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# define the parameter grid
param_grid = {'n_estimators': [50, 100, 200], 'max_depth': [None, 10, 20]}

# define and perform a grid search with different pre_dispatch values
# n_jobs=2 enables parallel execution; pre_dispatch only matters when jobs run in parallel
# the forest is left unseeded, so the selected parameters can vary from run to run
grid_default = GridSearchCV(estimator=RandomForestClassifier(), param_grid=param_grid, cv=5, n_jobs=2)
grid_default.fit(X, y)

grid_pre_dispatch_2 = GridSearchCV(estimator=RandomForestClassifier(), param_grid=param_grid, cv=5, n_jobs=2, pre_dispatch=2)
grid_pre_dispatch_2.fit(X, y)

grid_pre_dispatch_4 = GridSearchCV(estimator=RandomForestClassifier(), param_grid=param_grid, cv=5, n_jobs=2, pre_dispatch=4)
grid_pre_dispatch_4.fit(X, y)

# report the best parameters
print("Best parameters with default pre_dispatch:")
print(grid_default.best_params_)

print("Best parameters with pre_dispatch=2:")
print(grid_pre_dispatch_2.best_params_)

print("Best parameters with pre_dispatch=4:")
print(grid_pre_dispatch_4.best_params_)

Running the example gives an output like:

Best parameters with default pre_dispatch:
{'max_depth': 10, 'n_estimators': 100}
Best parameters with pre_dispatch=2:
{'max_depth': None, 'n_estimators': 50}
Best parameters with pre_dispatch=4:
{'max_depth': None, 'n_estimators': 200}

The key steps in this example are:

Generate a synthetic dataset using make_classification.
Define a parameter grid for RandomForestClassifier with n_estimators and max_depth values.
Create three GridSearchCV objects with different pre_dispatch values: default, 2, and 4.
Fit each grid search object to find the best parameters for each pre_dispatch setting.
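Print the best parameters found by each grid search. Note that pre_dispatch only affects how jobs are scheduled, not the scores; the differences in the output above come from the unseeded RandomForestClassifier, whose results vary from run to run.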

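This shows how to set the pre_dispatch parameter in GridSearchCV. The example does not measure memory or runtime; to see the effect, run the search with n_jobs greater than 1 on a larger dataset and monitor memory usage and wall-clock time for each setting.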
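One way to check that pre_dispatch changes scheduling rather than results is to fix the random seed of the estimator and compare the searches. The following is a minimal sketch under that assumption; the smaller dataset, grid, cv=3, and n_jobs=2 are choices made here to keep it quick and are not taken from the example above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# small synthetic dataset and grid, chosen to keep the sketch fast
X, y = make_classification(n_samples=300, n_features=20, random_state=42)
param_grid = {'n_estimators': [10, 30], 'max_depth': [None, 5]}

best_params = []
mean_scores = []
for pre_dispatch in ['2*n_jobs', 2, 4]:
    search = GridSearchCV(
        # a fixed random_state makes every fit reproducible
        estimator=RandomForestClassifier(random_state=42),
        param_grid=param_grid,
        cv=3,
        n_jobs=2,
        pre_dispatch=pre_dispatch,
    )
    search.fit(X, y)
    best_params.append(search.best_params_)
    mean_scores.append(search.cv_results_['mean_test_score'].tolist())

# compare every search against the first one
same_params = all(p == best_params[0] for p in best_params)
same_scores = all(s == mean_scores[0] for s in mean_scores)
print("Same best parameters:", same_params)
print("Same mean test scores:", same_scores)
```

With the seed fixed, the three searches should agree, which suggests that any differences seen with an unseeded forest are due to randomness in the model and not to pre_dispatch.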
