
Scikit-Learn Configure GridSearchCV "n_jobs" Parameter

The n_jobs parameter in scikit-learn's GridSearchCV sets the number of jobs to run in parallel, which in practice usually means the number of CPU cores used. Spreading the work over multiple cores can speed up the hyperparameter search, especially when each model fit is expensive, as with large datasets or complex models.

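Grid search is an exhaustive method for searching over specified parameter values to find the best combination. It trains and evaluates the model for every combination of parameters, once per cross-validation fold. These fits are independent of one another, which is what makes them easy to run in parallel.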
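To get a feel for how much work an exhaustive search involves, the sketch below counts the fits for the parameter grid used in the example further down, assuming cv=5 as in that code. ParameterGrid from sklearn.model_selection enumerates the same combinations that GridSearchCV searches over; the final refit on the full dataset (refit=True by default) adds one more fit that is not counted here.

```python
from sklearn.model_selection import ParameterGrid

# the same grid and number of folds as the main example
param_grid = {'C': [0.1, 1, 10], 'gamma': [1, 0.1, 0.01]}
cv = 5

# ParameterGrid enumerates every combination of the listed values
combinations = list(ParameterGrid(param_grid))

# each combination is fitted once per cross-validation fold
n_fits = len(combinations) * cv

print(f"{len(combinations)} combinations x {cv} folds = {n_fits} fits")
```

With 9 combinations and 5 folds this gives 45 independent fits, which is the pool of work that n_jobs distributes across workers.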

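The n_jobs parameter accepts None (the default, which means one job unless you are inside a joblib parallel_backend context), 1 (a single core), -1 (all available cores), or any positive integer giving the number of jobs. Values below -1 count back from the total: n_jobs=-2 uses all cores but one.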
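The mapping from n_jobs values to worker counts can be sketched as a small function. resolve_n_jobs is a hypothetical helper written for illustration, not a scikit-learn or joblib API; it follows scikit-learn's documented convention that negative values mean n_cpus + 1 + n_jobs. It uses os.cpu_count(), which may differ from the count joblib itself detects (for example in containers with CPU limits).

```python
import os


def resolve_n_jobs(n_jobs, n_cpus):
    """Hypothetical helper: map an n_jobs value to a worker count,
    following scikit-learn's documented n_jobs convention."""
    if n_jobs is None:
        # None means 1 unless running inside a joblib parallel context
        return 1
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 is not allowed")
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all but one, and so on (never below 1)
        return max(n_cpus + 1 + n_jobs, 1)
    # positive values are used as given
    return n_jobs


n_cpus = os.cpu_count() or 1
for value in [None, 1, 4, -1, -2]:
    print(f"n_jobs={value!s:>4} -> {resolve_n_jobs(value, n_cpus)} worker(s)")
```

Passing n_cpus explicitly keeps the function easy to reason about: on an 8-core machine, -1 gives 8 workers and -2 gives 7.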

For example, n_jobs=-1 uses all available CPU cores, which can reduce grid search time substantially when there is enough work to distribute.

from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
import time

# create a synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)

# define the parameter grid
param_grid = {'C': [0.1, 1, 10], 'gamma': [1, 0.1, 0.01]}

# define and perform a grid search with n_jobs=1
start_time = time.perf_counter()
grid_search_1 = GridSearchCV(estimator=SVC(), param_grid=param_grid, scoring='accuracy', cv=5, n_jobs=1)
grid_search_1.fit(X, y)
time_1 = time.perf_counter() - start_time

# define and perform a grid search with n_jobs=-1 (use all available cores)
start_time = time.perf_counter()
grid_search_all = GridSearchCV(estimator=SVC(), param_grid=param_grid, scoring='accuracy', cv=5, n_jobs=-1)
grid_search_all.fit(X, y)
time_all = time.perf_counter() - start_time

# report the time taken for each grid search
print(f"Time taken with n_jobs=1: {time_1:.2f} seconds")
print(f"Time taken with n_jobs=-1: {time_all:.2f} seconds")

Running the example gives output like the following (exact timings vary by machine):

Time taken with n_jobs=1: 1.28 seconds
Time taken with n_jobs=-1: 1.60 seconds

The key steps in this example are:

  1. Generate a synthetic binary classification dataset using make_classification.
  2. Define a parameter grid for SVC with C and gamma values to search over.
  3. Create a GridSearchCV object with n_jobs=1 and perform the grid search.
  4. Measure and report the time taken.
  5. Create another GridSearchCV object with n_jobs=-1 (using all available cores) and perform the grid search.
  6. Measure and report the time taken, highlighting the performance difference based on the n_jobs setting.
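The steps above compare only timing. A quick sanity check, sketched below under the assumption that the same data and grid are reused, confirms that n_jobs changes how the fits are scheduled but not what is computed: searches run with different n_jobs values should select the same parameters with the same cross-validated score. n_jobs=2 is used here simply as an example of a parallel setting.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# same synthetic data and grid as the main example
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
param_grid = {'C': [0.1, 1, 10], 'gamma': [1, 0.1, 0.01]}

# run the identical search sequentially and with two parallel jobs
results = {}
for n_jobs in [1, 2]:
    search = GridSearchCV(SVC(), param_grid=param_grid, scoring='accuracy',
                          cv=5, n_jobs=n_jobs)
    search.fit(X, y)
    results[n_jobs] = search

# parallelism should not change which candidate wins or its score
print(results[1].best_params_, results[1].best_score_)
print(results[2].best_params_, results[2].best_score_)
```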

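This demonstrates that n_jobs affects how long a grid search takes, though not always in the expected direction. In the run above, n_jobs=-1 was actually slower than n_jobs=1: with only 1,000 samples each SVC fit finishes quickly, so the overhead of starting worker processes and sending data to them outweighs the gain from parallelism. Using multiple cores tends to pay off when individual fits are expensive, for example with larger datasets, bigger parameter grids, or slower models.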
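One rough way to judge whether a search is dominated by overhead is to look at how long each individual task takes. The sketch below, assuming the same data and grid as the main example, reads the per-candidate mean_fit_time and mean_score_time entries that GridSearchCV records in cv_results_ (in seconds) and multiplies by n_splits_ to estimate the total sequential work. Comparing that figure against worker start-up cost is a rule of thumb, not a precise threshold.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
param_grid = {'C': [0.1, 1, 10], 'gamma': [1, 0.1, 0.01]}

search = GridSearchCV(SVC(), param_grid=param_grid, scoring='accuracy',
                      cv=5, n_jobs=1)
search.fit(X, y)

# average fit and score time (seconds) per candidate, averaged over folds
fit_times = search.cv_results_['mean_fit_time']
score_times = search.cv_results_['mean_score_time']
per_task = fit_times + score_times

print(f"candidates: {len(fit_times)}")
print(f"average time per fit+score task: {per_task.mean() * 1000:.1f} ms")
# per-candidate means times the number of folds approximates total work
print(f"total sequential work: {per_task.sum() * search.n_splits_:.2f} s")
```

If each task takes only milliseconds, the whole search is small enough that process start-up and data transfer can easily cost more than parallelism saves.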
