
Configure ExtraTreesClassifier "n_jobs" Parameter

The n_jobs parameter in scikit-learn’s ExtraTreesClassifier controls the number of parallel jobs to run for both fitting and prediction.
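
The fitted model reads n_jobs again each time it predicts, so the setting can be changed with set_params after fitting. The following minimal sketch fits once and then times predict_proba with one job and with all cores. The dataset size and tree count are arbitrary illustrative choices, and absolute timings will vary by machine.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Illustrative synthetic data; the sizes are arbitrary assumptions
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Fit once, using all available cores
clf = ExtraTreesClassifier(n_estimators=100, random_state=0, n_jobs=-1)
clf.fit(X, y)

# Change n_jobs on the already-fitted model and time prediction for each setting
timings = {}
for n_jobs in (1, -1):
    clf.set_params(n_jobs=n_jobs)
    start = time.perf_counter()
    proba = clf.predict_proba(X)
    timings[n_jobs] = time.perf_counter() - start
    print(f"predict_proba with n_jobs={n_jobs}: {timings[n_jobs]:.3f}s")
```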

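ExtraTreesClassifier (extremely randomized trees) is an ensemble method that builds many randomized decision trees and averages their predicted class probabilities. Because each tree is fit independently of the others, the trees can be built concurrently, and n_jobs sets how many jobs do that work at the same time.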
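
Since the forest's output is just an average over independent trees, it can be reproduced by hand. As a quick check that uses only the public estimators_ attribute (the dataset and tree count are illustrative assumptions), the forest's predict_proba should match the mean of the individual trees' predict_proba outputs:

```python
import numpy as np

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

clf = ExtraTreesClassifier(n_estimators=50, random_state=0, n_jobs=-1).fit(X, y)

# Each fitted tree is stored in clf.estimators_
print(len(clf.estimators_))  # 50

# The forest's probability estimate is the mean of the per-tree estimates
per_tree = np.stack([tree.predict_proba(X) for tree in clf.estimators_])
manual = per_tree.mean(axis=0)
print(np.allclose(manual, clf.predict_proba(X)))  # True
```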

Setting n_jobs to a value greater than 1 can significantly speed up training and prediction, especially for large datasets or when building many trees. The speedup is usually less than linear in the number of jobs, however, and for small datasets or few trees the overhead of dispatching work to parallel workers can cancel out the gain.
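
Whether parallelism pays off for a given problem is easy to check empirically. This sketch times fit on a tiny problem and a moderately sized one with n_jobs=1 and n_jobs=-1, keeping the best of a few repeats to reduce noise. The best_fit_time helper, the dataset sizes, and the tree counts are hypothetical choices for illustration. You may see little or no gain in the tiny case, but the outcome depends on your hardware.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier


def best_fit_time(X, y, n_estimators, n_jobs, repeats=3):
    """Hypothetical helper: fastest of `repeats` fit times (seconds) for one setting."""
    times = []
    for _ in range(repeats):
        clf = ExtraTreesClassifier(n_estimators=n_estimators, random_state=0, n_jobs=n_jobs)
        start = time.perf_counter()
        clf.fit(X, y)
        times.append(time.perf_counter() - start)
    return min(times)


# A tiny problem, where overhead may dominate, and a larger one (arbitrary sizes)
cases = {
    "small": (make_classification(n_samples=200, n_features=10, random_state=0), 10),
    "large": (make_classification(n_samples=5000, n_features=20, random_state=0), 100),
}

results = {}
for name, ((X, y), n_estimators) in cases.items():
    for n_jobs in (1, -1):
        results[(name, n_jobs)] = best_fit_time(X, y, n_estimators, n_jobs)
        print(f"{name:5s} n_jobs={n_jobs:2d}: {results[(name, n_jobs)]:.4f}s")
```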

The default value for n_jobs is None, which means a single job unless the code runs inside a joblib.parallel_backend context. Setting it to -1 uses all available processors.
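
With the default n_jobs=None, scikit-learn uses one job unless the call runs inside a joblib parallel context, in which case the context's n_jobs applies. This sketch assumes joblib's parallel_backend context manager (newer joblib versions also provide parallel_config for the same purpose). It fits one model outside any context and one inside a two-worker threading context. With random_state fixed, both fits produce the same trees and therefore the same predictions.

```python
import numpy as np
from joblib import parallel_backend

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Default n_jobs=None outside any joblib context: a single job
serial = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X, y)

# Inside the context, n_jobs=None picks up the context's n_jobs (2 threads here)
with parallel_backend("threading", n_jobs=2):
    ctx = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X, y)

print(np.array_equal(serial.predict(X), ctx.predict(X)))
```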

Common values for n_jobs include 1 (no parallelism), -1 (all processors), or a specific number based on available CPU cores (e.g., 2, 4, or 8).
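
To see what each setting resolves to on a particular machine, joblib (which scikit-learn uses for parallelism) exposes cpu_count and effective_n_jobs. This sketch assumes the default joblib backend outside any context. Negative values count back from the CPU total, so -1 means all CPUs and -2 means all but one.

```python
from joblib import cpu_count, effective_n_jobs

print(f"CPUs reported by joblib: {cpu_count()}")

# Number of workers each n_jobs setting resolves to on this machine
for n_jobs in (None, 1, 2, 4, -1, -2):
    print(f"n_jobs={n_jobs!s:>4}: {effective_n_jobs(n_jobs)} worker(s)")
```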

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score
import time

# Generate synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different n_jobs values
n_jobs_values = [1, 2, 4, -1]
results = []

for n_jobs in n_jobs_values:
    # perf_counter is a high-resolution, monotonic clock intended for timing code
    start_time = time.perf_counter()
    etc = ExtraTreesClassifier(n_estimators=100, random_state=42, n_jobs=n_jobs)
    etc.fit(X_train, y_train)
    train_time = time.perf_counter() - start_time

    y_pred = etc.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)

    results.append((n_jobs, train_time, accuracy))
    print(f"n_jobs={n_jobs}, Training Time: {train_time:.2f}s, Accuracy: {accuracy:.3f}")

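Running the example gives output similar to the following. Timings depend on your hardware and core count; the accuracy is the same for every setting because random_state is fixed.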

n_jobs=1, Training Time: 0.58s, Accuracy: 0.939
n_jobs=2, Training Time: 0.36s, Accuracy: 0.939
n_jobs=4, Training Time: 0.23s, Accuracy: 0.939
n_jobs=-1, Training Time: 0.20s, Accuracy: 0.939
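
Identical accuracy across n_jobs values is expected: the forest assigns each tree its seed from random_state before distributing the work, so the fitted trees do not depend on how many jobs build them. A quick check on a smaller copy of the same synthetic setup (the reduced sample count is an arbitrary choice) compares the split thresholds tree by tree, and then the predictions:

```python
import numpy as np

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

serial = ExtraTreesClassifier(n_estimators=100, random_state=42, n_jobs=1).fit(X, y)
parallel = ExtraTreesClassifier(n_estimators=100, random_state=42, n_jobs=-1).fit(X, y)

# Same seed per tree regardless of n_jobs, so each pair of trees has identical splits
same_trees = all(
    np.array_equal(a.tree_.threshold, b.tree_.threshold)
    for a, b in zip(serial.estimators_, parallel.estimators_)
)
same_predictions = np.array_equal(serial.predict(X), parallel.predict(X))
print(same_trees, same_predictions)
```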

The key steps in this example are:

  1. Generate a synthetic classification dataset with informative and redundant features
  2. Split the data into train and test sets
  3. Train ExtraTreesClassifier models with different n_jobs values
  4. Measure training time and accuracy for each model
  5. Compare the results to see the effect of parallelization
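
Step 5 can be made concrete by converting the timings into speedups relative to n_jobs=1. This sketch reuses the sample timings from the output above as (n_jobs, training time, accuracy) tuples, the same shape as the results list the benchmark loop builds; on your own machine you would pass that list instead.

```python
# Sample (n_jobs, training seconds, accuracy) tuples copied from the output above
results = [(1, 0.58, 0.939), (2, 0.36, 0.939), (4, 0.23, 0.939), (-1, 0.20, 0.939)]

# Single-job training time is the baseline
baseline = next(t for n, t, _ in results if n == 1)

speedups = {}
for n_jobs, train_time, accuracy in results:
    speedups[n_jobs] = baseline / train_time
    print(f"n_jobs={n_jobs:>2}: speedup {speedups[n_jobs]:.2f}x (accuracy {accuracy:.3f})")
```

With these sample numbers, four jobs give roughly a 2.5x speedup (0.58 / 0.23) rather than 4x, which is the sublinear scaling that parallel overhead produces.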

Some tips and heuristics for setting n_jobs:

Issues to consider:



See Also