
Configure BaggingClassifier "n_jobs" Parameter

The n_jobs parameter in scikit-learn's BaggingClassifier controls the number of jobs to run in parallel for both the fit and predict methods.

BaggingClassifier is an ensemble meta-estimator that fits base classifiers each on random subsets of the original dataset and then aggregates their individual predictions to form a final prediction. This technique helps reduce overfitting and improves the stability and accuracy of machine learning algorithms.

The n_jobs parameter determines how many processors are used to fit and predict. A value of -1 means using all processors, while a value of 1 means using a single processor.

The default value for n_jobs is None, which means using one processor. Common values include -1 (all processors), 1 (single processor), or specific numbers like 2, 4, or 8, depending on the available hardware.
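
The example below compares accuracy and training time for several n_jobs settings on a synthetic classification dataset.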

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import time

# Generate synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=15,
                           n_redundant=0, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different n_jobs values
n_jobs_values = [1, 2, 4, -1]
accuracies = []
training_times = []

for n_jobs in n_jobs_values:
    start_time = time.time()
    bc = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=10,
                           n_jobs=n_jobs, random_state=42)
    bc.fit(X_train, y_train)
    training_time = time.time() - start_time

    y_pred = bc.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)

    accuracies.append(accuracy)
    training_times.append(training_time)
    print(f"n_jobs={n_jobs}, Accuracy: {accuracy:.3f}, Training time: {training_time:.2f} seconds")

Running the example gives an output like:

n_jobs=1, Accuracy: 0.896, Training time: 1.30 seconds
n_jobs=2, Accuracy: 0.896, Training time: 1.60 seconds
n_jobs=4, Accuracy: 0.896, Training time: 1.67 seconds
n_jobs=-1, Accuracy: 0.896, Training time: 2.12 seconds
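
Note that in this run parallel training is actually slower than n_jobs=1: with only 10 inexpensive decision trees, the overhead of starting and coordinating worker processes outweighs the benefit of fitting them in parallel. Exact timings will vary with your hardware and workload.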

The key steps in this example are:

  1. Generate a synthetic classification dataset
  2. Split the data into train and test sets
  3. Train BaggingClassifier models with different n_jobs values
  4. Measure and compare both accuracy and training time for each model
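
As noted earlier, n_jobs is also used at prediction time. The minimal sketch below (not a rigorous benchmark) reuses X_train, X_test, y_train and n_jobs_values from the example above and times predict for each setting; because n_jobs is only consulted when predict is called, it can be changed with set_params after fitting.

# Fit once, then time predict with different n_jobs settings
# (reuses X_train, X_test, y_train and n_jobs_values from the example above)
bc = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=10,
                       n_jobs=1, random_state=42)
bc.fit(X_train, y_train)

for n_jobs in n_jobs_values:
    bc.set_params(n_jobs=n_jobs)  # n_jobs is read when predict is called
    start_time = time.time()
    bc.predict(X_test)
    print(f"n_jobs={n_jobs}, Prediction time: {time.time() - start_time:.3f} seconds")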

Tips and heuristics for setting n_jobs:

  - Use n_jobs=-1 to take advantage of all available processors when fitting many estimators on a larger dataset.
  - For small datasets or a handful of fast base estimators, the overhead of starting and coordinating worker processes can outweigh the benefit, and n_jobs=1 may be fastest (as in the output above).
  - On shared machines, set a specific value such as 2 or 4 to leave processors free for other work, as shown in the sketch below.

Issues to consider:

  - Parallel workers may duplicate data and estimators across processes, so memory usage can grow with n_jobs.
  - n_jobs interacts with other sources of parallelism (for example, cross-validation or grid search run with their own n_jobs); nesting parallel jobs can oversubscribe the CPU.
  - Actual speedups depend on hardware, dataset size, and the cost of fitting each base estimator, so benchmark on your own workload.
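
One practical way to choose a fixed value is to base it on the number of CPUs actually available. The minimal sketch below (reusing the scikit-learn imports from the example above) uses the standard library's os.cpu_count() to keep two cores free for other work; the "reserve two cores" rule is only an illustrative assumption, not a recommendation from the library.

import os

# Choose n_jobs from the number of available CPUs, keeping two cores
# free for other work (an illustrative heuristic, not a fixed rule).
available_cpus = os.cpu_count() or 1
chosen_n_jobs = max(1, available_cpus - 2)

bc = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=10,
                       n_jobs=chosen_n_jobs, random_state=42)
print(f"Using n_jobs={chosen_n_jobs} of {available_cpus} available CPUs")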


