Configure VotingRegressor "n_jobs" Parameter

The n_jobs parameter in scikit-learn’s VotingRegressor controls how many jobs run in parallel when the base estimators are fitted.

VotingRegressor is an ensemble method that fits several base regressors and averages their individual predictions to form the final prediction. Because each base regressor is fitted independently, the fits can run in parallel, which can shorten training when the estimators are slow to fit or the dataset is large.
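To make the averaging concrete, here is a minimal sketch that compares the ensemble's prediction with a hand-computed mean of the fitted base estimators' predictions. The dataset size and the choice of base models (LinearRegression, Ridge, DecisionTreeRegressor) are illustrative assumptions, not part of the main example, and no weights are passed, so a plain mean is expected.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor

# Small illustrative dataset (sizes chosen arbitrarily for speed)
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

voter = VotingRegressor(
    estimators=[
        ("lr", LinearRegression()),
        ("ridge", Ridge(alpha=1.0)),
        ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
    ]
)
voter.fit(X, y)

# Predictions of each fitted base estimator, shape (n_estimators, n_samples)
individual = np.array([est.predict(X) for est in voter.estimators_])

# With no weights set, the ensemble prediction should be the plain mean
manual_average = individual.mean(axis=0)
ensemble_pred = voter.predict(X)

matches = np.allclose(ensemble_pred, manual_average)
print("Ensemble equals mean of base predictions:", matches)
```

The fitted copies of the base models live in voter.estimators_, which is why the comparison uses that attribute rather than the unfitted objects passed to the constructor.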

The n_jobs parameter determines how many parallel jobs, typically one per CPU core, are used. A value of -1 uses all available cores, 1 means sequential execution with no parallelism, and other positive integers give the exact number of jobs to run.

By default, n_jobs is set to None, which means 1 (no parallel processing) unless the code runs inside a joblib.parallel_backend context.
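The default can be checked directly. The sketch below inspects n_jobs on a freshly constructed VotingRegressor and asks joblib how many jobs None resolves to under its default backend. The two base models are arbitrary placeholders, and the use of joblib's effective_n_jobs is an illustration added here rather than something the main example relies on.

```python
from joblib import effective_n_jobs
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression, Ridge

# Placeholder base models; nothing is fitted in this sketch
voter = VotingRegressor(estimators=[("lr", LinearRegression()), ("ridge", Ridge())])

# The constructor default is None
default_n_jobs = voter.n_jobs
print("Default n_jobs:", default_n_jobs)

# joblib resolves None to a single job when no parallel backend context is active
resolved_default = effective_n_jobs(default_n_jobs)
print("Resolved job count for None:", resolved_default)

# n_jobs can also be changed after construction
voter.set_params(n_jobs=2)
print("Updated n_jobs:", voter.n_jobs)
```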

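Common values for n_jobs include -1 (all cores), 1 (no parallelism), or a positive integer up to the number of available CPU cores on the machine. Because the work is split across base estimators, values above the number of estimators bring no further speedup.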
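One way to pick a value is to cap the request at both the core count and the number of base estimators. The helper below, suggest_n_jobs, is a hypothetical name introduced only for this sketch and is not part of scikit-learn. It assumes the usual joblib convention that negative values count back from the number of cores (-1 is all cores, -2 is all but one).

```python
import os


def suggest_n_jobs(n_estimators, requested=-1):
    """Hypothetical helper: cap a requested job count at what can be used."""
    if requested is None:
        return 1  # None behaves like 1 outside a joblib backend context
    if requested == 0:
        raise ValueError("n_jobs=0 has no meaning")

    available = os.cpu_count() or 1  # cpu_count() may return None

    if requested < 0:
        # Negative values count back from the core count: -1 -> all cores
        requested = max(1, available + 1 + requested)

    # More jobs than cores or than base estimators cannot help
    return max(1, min(requested, available, n_estimators))


# Example: three base estimators, as in a typical small ensemble
print("Suggested n_jobs:", suggest_n_jobs(n_estimators=3, requested=-1))
```

The returned value can be passed straight to VotingRegressor(..., n_jobs=...). Passing -1 directly also works; the cap mainly documents how many workers can actually be kept busy.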

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
import time

# Generate synthetic dataset
X, y = make_regression(n_samples=10000, n_features=20, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base models
base_models = [
    ('lr', LinearRegression()),
    ('svr', SVR()),
    ('rf', RandomForestRegressor(n_estimators=100, random_state=42))
]

# Train with different n_jobs values
n_jobs_values = [-1, 1, 2, 4]
results = []

for n_jobs in n_jobs_values:
    voter = VotingRegressor(estimators=base_models, n_jobs=n_jobs)

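    # Time the fit; perf_counter is a monotonic, high-resolution clock
    start_time = time.perf_counter()
    voter.fit(X_train, y_train)
    fit_time = time.perf_counter() - start_time

    # Time prediction (n_jobs applies to fit, so this should not depend on it)
    start_time = time.perf_counter()
    y_pred = voter.predict(X_test)
    predict_time = time.perf_counter() - start_time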

    mse = mean_squared_error(y_test, y_pred)
    results.append((n_jobs, fit_time, predict_time, mse))

    print(f"n_jobs={n_jobs}, Fit time: {fit_time:.2f}s, Predict time: {predict_time:.2f}s, MSE: {mse:.4f}")

Running the example gives an output like:

n_jobs=-1, Fit time: 13.69s, Predict time: 1.42s, MSE: 2985.9528
n_jobs=1, Fit time: 14.10s, Predict time: 1.30s, MSE: 2985.9528
n_jobs=2, Fit time: 13.48s, Predict time: 1.45s, MSE: 2985.9528
n_jobs=4, Fit time: 13.88s, Predict time: 1.31s, MSE: 2985.9528

The key steps in this example are:

  1. Generate a synthetic regression dataset
  2. Split the data into train and test sets
  3. Define base models for the VotingRegressor
  4. Train VotingRegressor models with different n_jobs values
  5. Measure fit time, prediction time, and mean squared error for each configuration
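The fit times in the sample output barely change with n_jobs. One possible explanation, which the output alone cannot confirm, is that a single slow base estimator dominates: since parallelism is across estimators, the parallel fit can never be faster than the slowest one. A quick check is to time each base estimator on its own. The sketch below uses the same three models on a smaller dataset (2000 samples, an arbitrary choice to keep it fast), so absolute timings will differ from the figures above.

```python
import time

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

# Smaller than the main example so the check runs quickly
X, y = make_regression(n_samples=2000, n_features=20, noise=0.1, random_state=42)

base_models = [
    ("lr", LinearRegression()),
    ("svr", SVR()),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=42)),
]

# Fit each model sequentially and record how long it takes
fit_times = {}
for name, model in base_models:
    start = time.perf_counter()
    model.fit(X, y)
    fit_times[name] = time.perf_counter() - start

total = sum(fit_times.values())
for name, seconds in fit_times.items():
    print(f"{name}: {seconds:.3f}s ({seconds / total:.0%} of sequential fit time)")

# The slowest estimator sets a lower bound on the parallel fit time
slowest = max(fit_times, key=fit_times.get)
print("Slowest base estimator:", slowest)
```

If one model accounts for most of the total, raising n_jobs on the VotingRegressor will help little, and speeding up that model (for example through its own parameters) is likely to matter more.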

Some tips and heuristics for setting n_jobs:

Issues to consider:


