
Configure RandomForestClassifier "n_jobs" Parameter

The n_jobs parameter in scikit-learn’s RandomForestClassifier controls how many jobs run in parallel. The fit, predict, decision_path and apply methods are all parallelized over the trees.

Random Forest is an ensemble learning method that trains multiple decision trees and combines their predictions to improve generalization performance.
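Because every tree in the forest is fitted independently, n_jobs should only change how that work is scheduled, not the resulting model. The sketch below checks this under one assumption: a fixed random_state, from which scikit-learn derives each tree's seed. It compares the split thresholds of two forests fitted with n_jobs=1 and n_jobs=2, then uses set_params to change n_jobs on an already fitted forest before calling predict. The dataset size and n_estimators=51 are arbitrary illustration choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Same random_state, different n_jobs: each tree receives the same seed either way
rf_serial = RandomForestClassifier(n_estimators=51, n_jobs=1, random_state=0).fit(X, y)
rf_parallel = RandomForestClassifier(n_estimators=51, n_jobs=2, random_state=0).fit(X, y)

# Compare the learned split thresholds tree by tree
same_trees = all(
    np.array_equal(a.tree_.threshold, b.tree_.threshold)
    for a, b in zip(rf_serial.estimators_, rf_parallel.estimators_)
)
print("Identical trees:", same_trees)

# n_jobs can also be changed after fitting, e.g. to parallelize prediction only
rf_serial.set_params(n_jobs=2)
same_preds = np.array_equal(rf_serial.predict(X), rf_parallel.predict(X))
print("Identical predictions:", same_preds)
```

If both checks print True, n_jobs affected only the scheduling of the work, which is why the accuracy column below is the same for every setting.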

The n_jobs parameter sets how many parallel jobs, usually one per CPU core, are used to build and evaluate the trees. Setting it to -1 uses all available cores, values below -1 leave some cores free (-2 uses all but one), and a positive integer specifies the exact number of jobs to run in parallel.
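The mapping from an n_jobs value to a worker count can be written out explicitly. resolve_n_jobs below is a hypothetical helper, not part of scikit-learn or joblib; it encodes the convention described in the scikit-learn glossary (None means 1, -1 means all CPUs, and for values below -1, n_cpus + 1 + n_jobs CPUs are used). joblib exposes its own version of this calculation as joblib.effective_n_jobs, whose CPU count may also account for container or affinity limits, so its answer can differ from os.cpu_count().

```python
import os


def resolve_n_jobs(n_jobs, n_cpus):
    """Hypothetical helper mirroring the documented n_jobs convention.

    None -> 1 job, positive k -> k jobs, negative k -> n_cpus + 1 + k
    (so -1 = all CPUs, -2 = all but one), never fewer than 1.
    """
    if n_jobs is None:
        return 1
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 is not a valid value")
    if n_jobs < 0:
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


n_cpus = os.cpu_count() or 1
for value in [None, 1, 4, -1, -2]:
    print(f"n_jobs={value!r:>5} -> {resolve_n_jobs(value, n_cpus)} job(s) on {n_cpus} CPUs")
```

On an 8-core machine this rule gives 1, 1, 4, 8 and 7 jobs for the five values.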

The default value for n_jobs is None, which means a single job unless the code runs inside a joblib.parallel_backend context. Setting n_jobs=-1 is a common way to use all available cores.
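Since the default is None, a forest built without an explicit n_jobs runs on one job. The sketch below, assuming a reasonably recent joblib installation, confirms the default and shows the alternative the documentation mentions: leaving n_jobs=None on the estimator and setting the worker count through a joblib.parallel_backend context. The threading backend and n_jobs=2 are illustrative choices; newer joblib releases also provide joblib.parallel_config for the same purpose.

```python
from joblib import parallel_backend
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

rf = RandomForestClassifier(n_estimators=50, random_state=0)
# None -> one job, unless a joblib context supplies a different worker count
print("Default n_jobs:", rf.n_jobs)

# Keep n_jobs=None on the estimator and let the joblib context decide the workers
with parallel_backend("threading", n_jobs=2):
    rf.fit(X, y)

print("Trees fitted:", len(rf.estimators_))
```

Setting n_jobs directly on the estimator, as in the example below, is usually easier to read; the context-manager form is handy when several estimators should share one setting.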

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import time

# Generate synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different n_jobs values
n_jobs_values = [-1, 1, 2, 4]
accuracies = []
times = []

for n in n_jobs_values:
    start = time.time()
    rf = RandomForestClassifier(n_estimators=100, n_jobs=n, random_state=42)
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    end = time.time()
    runtime = end - start
    accuracies.append(accuracy)
    times.append(runtime)
    print(f"n_jobs={n}, Accuracy: {accuracy:.3f}, Runtime: {runtime:.2f} seconds")

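Running this example gives output like the following. Runtimes depend on the hardware, while the accuracy stays the same because random_state is fixed: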

n_jobs=-1, Accuracy: 0.931, Runtime: 0.64 seconds
n_jobs=1, Accuracy: 0.931, Runtime: 2.51 seconds
n_jobs=2, Accuracy: 0.931, Runtime: 1.34 seconds
n_jobs=4, Accuracy: 0.931, Runtime: 0.78 seconds
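The sample runtimes above can be turned into speedup figures (serial runtime divided by parallel runtime) and, for positive n_jobs, parallel efficiency (speedup divided by the number of jobs). This is plain arithmetic on the numbers printed above, which will differ on other hardware. The n_jobs=-1 row gets a speedup only, because the core count of the machine that produced it is not stated.

```python
# Runtimes copied from the sample output above (seconds); yours will differ
runtimes = {1: 2.51, 2: 1.34, 4: 0.78, -1: 0.64}
baseline = runtimes[1]

# Speedup relative to the single-job run
speedups = {n: baseline / t for n, t in runtimes.items()}

# Efficiency only makes sense when the job count is known (positive n_jobs)
efficiency = {n: s / n for n, s in speedups.items() if n > 0}

for n, s in speedups.items():
    eff = f", efficiency {efficiency[n]:.0%}" if n in efficiency else ""
    print(f"n_jobs={n:>2}: speedup {s:.2f}x{eff}")
```

With these numbers, n_jobs=2 reaches about 1.87x (roughly 94% efficiency) and n_jobs=4 about 3.22x (roughly 80%), so the gain per extra job shrinks as more jobs are added.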

The key steps in this example are:

  1. Generate a large synthetic binary classification dataset
  2. Split the data into train and test sets
  3. Train RandomForestClassifier models with different n_jobs values
  4. Evaluate the accuracy of each model and its combined fit-plus-predict runtime
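The loop above times fit and predict together with time.time() in a single run, which mixes two phases and is sensitive to noise from other processes. A sketch of a slightly more careful measurement, assuming the same dataset and forest settings, times each phase separately with time.perf_counter() and reports the median of a few repeats. time_fit_predict is a hypothetical helper name and repeats=3 is an arbitrary choice.

```python
import time
from statistics import median

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


def time_fit_predict(n_jobs, repeats=3):
    """Return median fit and predict wall-clock times (seconds) for one n_jobs value."""
    fit_times, predict_times = [], []
    for _ in range(repeats):
        rf = RandomForestClassifier(n_estimators=100, n_jobs=n_jobs, random_state=42)
        start = time.perf_counter()
        rf.fit(X_train, y_train)
        fit_times.append(time.perf_counter() - start)
        start = time.perf_counter()
        rf.predict(X_test)
        predict_times.append(time.perf_counter() - start)
    return median(fit_times), median(predict_times)


results = {n: time_fit_predict(n) for n in [1, 2, -1]}
for n, (fit_s, pred_s) in results.items():
    print(f"n_jobs={n:>2}: fit {fit_s:.2f}s, predict {pred_s:.3f}s")
```

Separating the phases makes it visible that fitting dominates the runtime here, so most of the benefit of n_jobs in this example comes from parallel tree construction.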

Some tips and heuristics for setting n_jobs:

Issues to consider:



See Also