The n_jobs parameter in RandomizedSearchCV controls the parallelization of the hyperparameter search. Random search is a hyperparameter optimization method that tries random combinations of parameters to find the best-performing model.
The n_jobs parameter determines how many CPU cores are used in parallel during the search. The default is None, which means one core (no parallelization) unless the call runs inside a joblib parallel_backend context. Setting n_jobs to -1 uses all available cores, while a specific positive integer uses that number of cores.
As a heuristic, -1 is often best for maximum parallelization unless memory is limited, in which case a lower number of cores may be more appropriate.
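One quick way to pick a specific value is to check how many cores the machine has. Below is a minimal sketch using the standard library; the leave-one-core-free heuristic is just one possible choice, not a rule from scikit-learn:

```python
import os

# Total logical cores visible to the OS; n_jobs=-1 uses all of them
total_cores = os.cpu_count()
print(f"Available cores: {total_cores}")

# One possible middle ground: leave a core free for the OS and other processes
n_jobs = max(1, total_cores - 1)
print(f"Using n_jobs={n_jobs}")
```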
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
import time
# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=5, random_state=42)
# Define a parameter distribution for RandomForestClassifier hyperparameters
param_dist = {'n_estimators': [100, 200, 300],
'max_depth': [None, 5, 10],
'min_samples_split': [2, 5, 10]}
# Create a base RandomForestClassifier model
rf = RandomForestClassifier(random_state=42)
# List of n_jobs values to test
n_jobs_values = [1, 2, -1]
for n_jobs in n_jobs_values:
    start_time = time.perf_counter()
    # Run RandomizedSearchCV with the current n_jobs value
    search = RandomizedSearchCV(rf, param_dist, n_iter=10, cv=5, n_jobs=n_jobs, random_state=42)
    search.fit(X, y)
    end_time = time.perf_counter()
    execution_time = end_time - start_time
    print(f"Best score for n_jobs={n_jobs}: {search.best_score_:.3f}")
    print(f"Execution time for n_jobs={n_jobs}: {execution_time:.2f} seconds")
    print()
Running the example gives an output like:
Best score for n_jobs=1: 0.939
Execution time for n_jobs=1: 17.94 seconds
Best score for n_jobs=2: 0.939
Execution time for n_jobs=2: 10.39 seconds
Best score for n_jobs=-1: 0.939
Execution time for n_jobs=-1: 6.36 seconds
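The benefit of parallelization can be quantified as a speedup relative to the single-core run. A small calculation using the timings reported above (which will differ on your machine):

```python
# Execution times reported above (seconds); these will vary by machine
times = {1: 17.94, 2: 10.39, -1: 6.36}
baseline = times[1]

for n_jobs, t in times.items():
    speedup = baseline / t
    print(f"n_jobs={n_jobs}: {t:.2f}s, speedup {speedup:.2f}x")
```

Here two cores give roughly a 1.7x speedup and all cores roughly 2.8x. The scaling is sub-linear because each worker adds scheduling and data-copying overhead on top of the model fitting itself.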
The steps are as follows:
- Generate a synthetic binary classification dataset using make_classification() from scikit-learn.
- Define a parameter distribution dictionary param_dist for RandomForestClassifier hyperparameters.
- Create a base RandomForestClassifier model rf.
- Iterate over different n_jobs values (1, 2, -1). For each n_jobs value:
  - Record the start time using time.perf_counter().
  - Run RandomizedSearchCV with 10 iterations and 5-fold cross-validation.
  - Record the end time and calculate the execution time.
  - Print the best score and execution time for the current n_jobs value.
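Once the search finishes, scikit-learn exposes the winning configuration via best_params_ and a model refit on the full dataset via best_estimator_. A minimal sketch, using a smaller grid than the example above so it runs quickly:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=200, n_features=10, random_state=42)
param_dist = {'n_estimators': [50, 100], 'max_depth': [None, 5]}

search = RandomizedSearchCV(RandomForestClassifier(random_state=42),
                            param_dist, n_iter=4, cv=3, n_jobs=-1,
                            random_state=42)
search.fit(X, y)

# The best hyperparameter combination found during the search
print(search.best_params_)

# A model refit on the full dataset with those hyperparameters (refit=True by default)
best_model = search.best_estimator_
print(best_model.predict(X[:5]))
```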