The pre_dispatch parameter in scikit-learn’s GridSearchCV controls the number of jobs that get dispatched during parallel execution. Adjusting this parameter can manage memory usage and improve efficiency during hyperparameter optimization.
Grid search systematically works through multiple combinations of parameter values, cross-validating as it goes to determine which combination gives the best performance.
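To make the size of the search concrete, the sketch below counts the candidate combinations and the resulting number of model fits. It assumes the same parameter grid used in the example further down and 5-fold cross-validation; the variable names are illustrative only.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as the main example (assumed here for illustration)
param_grid = {'n_estimators': [50, 100, 200], 'max_depth': [None, 10, 20]}

# ParameterGrid enumerates every combination of the listed values
candidates = list(ParameterGrid(param_grid))
n_candidates = len(candidates)

# Each candidate is fitted once per cross-validation fold
n_folds = 5
total_fits = n_candidates * n_folds

print("candidates:", n_candidates)
print("total fits:", total_fits)
```

Each of these fits is one job that the search can hand to a worker, which is the pool of work that pre_dispatch meters out.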
The pre_dispatch parameter controls how many jobs are dispatched to the workers ahead of time. Limiting it can reduce memory usage by capping the number of tasks queued at once, while n_jobs sets how many of them run in parallel.
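The default value is '2*n_jobs'. The parameter accepts either an integer giving the exact number of jobs to dispatch, or a string expression in terms of n_jobs, such as '2*n_jobs'. Setting it to a lower number can help reduce memory usage when working with large datasets or models, because fewer jobs are created than the CPUs can process at once. It has little practical effect unless n_jobs is greater than 1.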
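As a minimal sketch of the accepted forms, the snippet below passes an integer and two string expressions to GridSearchCV. The small dataset, the DecisionTreeClassifier, and the names used here are assumptions chosen to keep the run short; they are not part of the main example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Small illustrative dataset and grid (assumed for this sketch)
X_small, y_small = make_classification(n_samples=200, n_features=10, random_state=0)
small_grid = {'max_depth': [2, 4, 6]}

results = {}
# An int is used as-is; a string is evaluated as an expression of n_jobs
for pre_dispatch in (2, '2*n_jobs', '3*n_jobs'):
    search = GridSearchCV(
        DecisionTreeClassifier(random_state=0),
        small_grid,
        cv=3,
        n_jobs=2,  # pre_dispatch is only relevant for parallel runs
        pre_dispatch=pre_dispatch,
    )
    search.fit(X_small, y_small)
    results[str(pre_dispatch)] = search.best_params_

for setting, params in results.items():
    print(setting, params)
```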
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
# create a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
# define the parameter grid
param_grid = {'n_estimators': [50, 100, 200], 'max_depth': [None, 10, 20]}
# pre_dispatch only matters for parallel runs, so n_jobs is set above 1
# note: the random forest is left unseeded, so results vary from run to run
# default pre_dispatch ('2*n_jobs')
grid_default = GridSearchCV(estimator=RandomForestClassifier(), param_grid=param_grid, cv=5, n_jobs=2)
grid_default.fit(X, y)
# at most 2 jobs dispatched ahead of time
grid_pre_dispatch_2 = GridSearchCV(estimator=RandomForestClassifier(), param_grid=param_grid, cv=5, n_jobs=2, pre_dispatch=2)
grid_pre_dispatch_2.fit(X, y)
# at most 4 jobs dispatched ahead of time
grid_pre_dispatch_4 = GridSearchCV(estimator=RandomForestClassifier(), param_grid=param_grid, cv=5, n_jobs=2, pre_dispatch=4)
grid_pre_dispatch_4.fit(X, y)
# report the best parameters
print("Best parameters with default pre_dispatch:")
print(grid_default.best_params_)
print("Best parameters with pre_dispatch=2:")
print(grid_pre_dispatch_2.best_params_)
print("Best parameters with pre_dispatch=4:")
print(grid_pre_dispatch_4.best_params_)
Running the example gives an output like the following. The exact parameters vary between runs because the random forest is not seeded:
Best parameters with default pre_dispatch:
{'max_depth': 10, 'n_estimators': 100}
Best parameters with pre_dispatch=2:
{'max_depth': None, 'n_estimators': 50}
Best parameters with pre_dispatch=4:
{'max_depth': None, 'n_estimators': 200}
The key steps in this example are:
- Generate a synthetic dataset using make_classification.
- Define a parameter grid for RandomForestClassifier with n_estimators and max_depth values.
- Create three GridSearchCV objects with different pre_dispatch values: default, 2, and 4.
- Fit each grid search object to find the best parameters for each pre_dispatch setting.
- Print out the best parameters found by each grid search. Any differences between the three come from the unseeded random forest, not from pre_dispatch, which only affects how jobs are scheduled.
Adjusting the pre_dispatch parameter in GridSearchCV changes how many jobs are queued during a parallel search, which can affect memory usage and efficiency during hyperparameter tuning. It does not change which parameters the search selects.
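Because pre_dispatch only changes job scheduling, the search should pick the same parameters whatever its value once the estimator is seeded. The sketch below checks this under stated assumptions: a smaller dataset and grid than the main example, a fixed random_state on the forest, and a hypothetical helper named run_search.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Smaller dataset and grid than the main example, to keep the run short
X_check, y_check = make_classification(n_samples=300, n_features=20, random_state=42)
check_grid = {'n_estimators': [10, 20], 'max_depth': [None, 5]}

def run_search(pre_dispatch):
    """Fit a seeded grid search with the given pre_dispatch (illustrative helper)."""
    search = GridSearchCV(
        RandomForestClassifier(random_state=42),  # seeded, so fits are repeatable
        check_grid,
        cv=3,
        n_jobs=2,
        pre_dispatch=pre_dispatch,
    )
    search.fit(X_check, y_check)
    return search

search_a = run_search(2)
search_b = run_search('2*n_jobs')

# Compare the selected parameters and the per-candidate mean CV scores
same_params = search_a.best_params_ == search_b.best_params_
same_scores = np.allclose(
    search_a.cv_results_['mean_test_score'],
    search_b.cv_results_['mean_test_score'],
)
print("same best params:", same_params)
print("same CV scores:", same_scores)
```

To see the effect pre_dispatch is meant to have, one would measure peak memory or wall-clock time of the fit rather than compare best_params_.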