
Configure VotingRegressor "estimators" Parameter

The estimators parameter in scikit-learn’s VotingRegressor defines the set of base models used in the ensemble.

VotingRegressor fits several regression models and predicts by averaging their individual outputs, so the models passed to estimators directly determine how the ensemble behaves.

Selecting diverse and complementary estimators is crucial for maximizing the ensemble’s performance. The goal is to leverage the strengths of different models while mitigating their individual weaknesses.

There is no default value for estimators; it must be explicitly specified when creating a VotingRegressor.
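
The parameter expects a list of (name, estimator) tuples, where each name is a short string identifier for its model. A minimal sketch of just the format (the particular models here are only illustrative):

from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# estimators must be a list of (name, estimator) tuples
vr = VotingRegressor(estimators=[
    ('lr', LinearRegression()),
    ('dt', DecisionTreeRegressor(random_state=42))
])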

Common values include combinations of different regression models such as LinearRegression, RandomForestRegressor, SVR, and GradientBoostingRegressor.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base models
lr = LinearRegression()
rf = RandomForestRegressor(n_estimators=100, random_state=42)
svr = SVR(kernel='rbf')

# Create VotingRegressor with different estimator combinations
estimator_combinations = [
    [('lr', lr), ('rf', rf)],
    [('lr', lr), ('svr', svr)],
    [('rf', rf), ('svr', svr)],
    [('lr', lr), ('rf', rf), ('svr', svr)]
]

for i, estimators in enumerate(estimator_combinations):
    vr = VotingRegressor(estimators=estimators)
    vr.fit(X_train, y_train)
    y_pred = vr.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print(f"Combination {i+1}: {[e[0] for e in estimators]}")
    print(f"Mean Squared Error: {mse:.4f}")
    print()

Running the example gives an output like:

Combination 1: ['lr', 'rf']
Mean Squared Error: 1764.3715

Combination 2: ['lr', 'svr']
Mean Squared Error: 8797.5941

Combination 3: ['rf', 'svr']
Mean Squared Error: 16628.2346

Combination 4: ['lr', 'rf', 'svr']
Mean Squared Error: 7390.2678

The key steps in this example are:

  1. Generate a synthetic regression dataset
  2. Split the data into train and test sets
  3. Define individual regression models (LinearRegression, RandomForestRegressor, SVR)
  4. Create VotingRegressor instances with different estimator combinations
  5. Train each VotingRegressor and evaluate its performance using mean squared error
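
To make the idea of complementary strengths concrete, the sketch below reuses the same synthetic data setup as the example above, fits each base model on its own, and compares its test error with that of the three-model ensemble. Exact numbers will vary with the data and model settings.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Same synthetic dataset and split as the main example
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

base_models = [
    ('lr', LinearRegression()),
    ('rf', RandomForestRegressor(n_estimators=100, random_state=42)),
    ('svr', SVR(kernel='rbf'))
]

# Evaluate each base model individually
for name, model in base_models:
    model.fit(X_train, y_train)
    print(f"{name}: {mean_squared_error(y_test, model.predict(X_test)):.4f}")

# Evaluate the ensemble built from the same estimators
vr = VotingRegressor(estimators=base_models)
vr.fit(X_train, y_train)
print(f"ensemble: {mean_squared_error(y_test, vr.predict(X_test)):.4f}")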

Tips for configuring the estimators parameter:

  * Combine models that make different kinds of errors (for example a linear model, a tree ensemble, and a kernel method) so that averaging can cancel out individual weaknesses
  * Give each estimator a short, descriptive name; the names are used to access the fitted models via named_estimators_ and to set nested parameters with set_params
  * Compare several combinations on a held-out test set, as in the example above, rather than assuming that adding more estimators always helps

Issues to consider:

  * Every entry must be a (name, estimator) tuple where the estimator is a regressor implementing fit and predict
  * Training and prediction time grow with the number and complexity of the estimators in the list
  * Predictions are averaged with equal weight by default, so a strong model can be dragged down by a weak one; the weights parameter can rebalance the ensemble, as sketched below
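
As a sketch of the weighting point above, the snippet below rebuilds the three-model ensemble with the weights parameter; it assumes the same synthetic data setup as the main example, and the weight values are arbitrary, chosen only to show the mechanism.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Same synthetic dataset and split as the main example
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# weights are matched to the estimators by position; these values are arbitrary
vr_weighted = VotingRegressor(
    estimators=[
        ('lr', LinearRegression()),
        ('rf', RandomForestRegressor(n_estimators=100, random_state=42)),
        ('svr', SVR(kernel='rbf'))
    ],
    weights=[1.0, 2.0, 0.5]
)
vr_weighted.fit(X_train, y_train)
print(f"Weighted ensemble MSE: {mean_squared_error(y_test, vr_weighted.predict(X_test)):.4f}")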



See Also