
Configure VotingClassifier "n_jobs" Parameter

The n_jobs parameter in scikit-learn's VotingClassifier controls the number of CPU cores used to fit the base estimators in parallel.

VotingClassifier is an ensemble method that combines predictions from multiple base classifiers. It supports both hard voting (majority vote) and soft voting (weighted probabilities).
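As a minimal sketch of the two voting modes (using LogisticRegression and DecisionTreeClassifier as illustrative base estimators, not the ones from the benchmark below):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=42)
estimators = [('lr', LogisticRegression(max_iter=1000, random_state=42)),
              ('dt', DecisionTreeClassifier(random_state=42))]

# Hard voting: each base classifier casts one vote for a class label,
# and the majority label wins
hard = VotingClassifier(estimators=estimators, voting='hard').fit(X, y)

# Soft voting: predicted class probabilities are averaged, and the class
# with the highest mean probability wins (requires predict_proba)
soft = VotingClassifier(estimators=estimators, voting='soft').fit(X, y)

print(hard.predict(X[:3]))
print(soft.predict(X[:3]))
```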

The n_jobs parameter determines how many cores are used for parallel computation. Setting it to -1 uses all available cores, while positive integers specify the exact number of cores to use.

The default value for n_jobs is None, which means it uses a single core. Common values include -1 (all cores), 1 (single core), or the number of available cores on the machine.
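A quick sketch of setting n_jobs explicitly, checking the available core count first with os.cpu_count (the estimators here are illustrative):

```python
import os

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# How many cores this machine reports
n_cores = os.cpu_count()
print(f"Available cores: {n_cores}")

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# n_jobs=-1 uses all available cores; n_jobs=None (the default) uses one
vc = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000, random_state=42)),
                ('nb', GaussianNB())],
    voting='hard',
    n_jobs=-1,  # fit the base estimators in parallel
)
vc.fit(X, y)
print(f"Training accuracy: {vc.score(X, y):.3f}")
```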

In practice, the optimal value depends on the complexity of the base estimators, the size of the dataset, and the available hardware resources.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import time

# Generate synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_classes=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create base classifiers
rf = RandomForestClassifier(n_estimators=100, random_state=42)
lr = LogisticRegression(random_state=42)
svc = SVC(probability=True, random_state=42)

# Train with different n_jobs values
n_jobs_values = [-1, 1, 2, 4]
results = []

for n_jobs in n_jobs_values:
    vc = VotingClassifier(
        estimators=[('rf', rf), ('lr', lr), ('svc', svc)],
        voting='soft',
        n_jobs=n_jobs
    )

    start_time = time.time()
    vc.fit(X_train, y_train)
    train_time = time.time() - start_time

    y_pred = vc.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)

    results.append((n_jobs, train_time, accuracy))
    print(f"n_jobs={n_jobs}, Training Time: {train_time:.2f}s, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

n_jobs=-1, Training Time: 6.78s, Accuracy: 0.931
n_jobs=1, Training Time: 8.73s, Accuracy: 0.931
n_jobs=2, Training Time: 7.19s, Accuracy: 0.931
n_jobs=4, Training Time: 6.63s, Accuracy: 0.931

The key steps in this example are:

  1. Generate a synthetic binary classification dataset
  2. Split the data into train and test sets
  3. Create a VotingClassifier with multiple base estimators
  4. Train and evaluate models with different n_jobs values
  5. Measure and compare training time and accuracy for each configuration

Some tips and heuristics for setting n_jobs:

  - Use n_jobs=-1 when the base estimators are expensive to fit and the machine is otherwise idle.
  - Keep n_jobs=1 (or the default None) for small datasets or cheap estimators, where the overhead of spawning worker processes can outweigh the parallel speedup, as the modest gains in the output above suggest.
  - Avoid nested parallelism: if a base estimator such as RandomForestClassifier sets its own n_jobs, parallelizing the VotingClassifier on top of it can oversubscribe the CPU.

Issues to consider:

  - Parallel fitting increases memory usage, since each worker process may hold its own copy of the training data.
  - The speedup is capped by the number of base estimators and by the slowest one; with three estimators, more than three workers cannot help at the ensemble level.
  - Timings vary across machines and workloads, so benchmark on your own hardware as in the example above.
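To illustrate the nested-parallelism point, one option is to parallelize only at the ensemble level and leave the base estimators single-threaded (a sketch, not the only valid configuration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# The forest keeps n_jobs=None (single-threaded) so the VotingClassifier's
# workers do not compete with the forest's own workers for cores
rf = RandomForestClassifier(n_estimators=100, n_jobs=None, random_state=42)
lr = LogisticRegression(max_iter=1000, random_state=42)

vc = VotingClassifier(estimators=[('rf', rf), ('lr', lr)],
                      voting='soft',
                      n_jobs=-1)  # parallelism lives at the ensemble level
vc.fit(X, y)
print(f"Training accuracy: {vc.score(X, y):.3f}")
```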
