
Scikit-Learn "StackingRegressor" versus "VotingRegressor"

Comparing StackingRegressor and VotingRegressor for Ensemble Learning in Regression Tasks

StackingRegressor combines multiple regression models using a meta-regressor to improve predictive performance. Key hyperparameters include estimators (list of base regressors), final_estimator (meta-regressor), and cv (cross-validation splitting strategy).
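As a quick illustration of these hyperparameters, the sketch below builds a StackingRegressor with an explicit cv setting and a Ridge meta-regressor (the choice of Ridge and cv=5 here is illustrative, not prescribed by scikit-learn):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# cv controls how the out-of-fold predictions fed to the
# meta-regressor are generated during fitting
stack = StackingRegressor(
    estimators=[('lr', LinearRegression()),
                ('dt', DecisionTreeRegressor(random_state=0))],
    final_estimator=Ridge(),  # meta-regressor; an arbitrary choice here
    cv=5,
)
stack.fit(X, y)
print(stack.predict(X[:3]))
```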

VotingRegressor aggregates predictions from multiple regression models by averaging their predictions. Key hyperparameters include estimators (list of base regressors) and weights (optional weights for each regressor).
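For example, passing weights turns the plain average into a weighted one. In this sketch the weights [2, 1] (an arbitrary choice for illustration) make the linear model count twice as much as the tree:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# weights give LinearRegression twice the influence of the tree
vote = VotingRegressor(
    estimators=[('lr', LinearRegression()),
                ('dt', DecisionTreeRegressor(random_state=0))],
    weights=[2, 1],
)
vote.fit(X, y)
print(vote.predict(X[:3]))
```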

StackingRegressor leverages a second-level model to learn from the predictions of base models, potentially yielding better performance by capturing interactions between predictions.

VotingRegressor is simpler, averaging predictions without learning interactions, making it faster and easier to implement but possibly less powerful.
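The averaging behavior is easy to verify directly: an unweighted VotingRegressor's prediction equals the mean of its fitted base models' predictions, which this small check (not part of the main example) confirms:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

vote = VotingRegressor(estimators=[('lr', LinearRegression()),
                                   ('dt', DecisionTreeRegressor(random_state=0))])
vote.fit(X, y)

# Recompute the ensemble output by averaging the fitted base models by hand
manual = np.mean([est.predict(X) for est in vote.estimators_], axis=0)
assert np.allclose(vote.predict(X), manual)
```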

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import StackingRegressor, VotingRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base regressors
base_regressors = [
    ('lr', LinearRegression()),
    ('dt', DecisionTreeRegressor(random_state=42))
]

# Fit and evaluate VotingRegressor
voting_reg = VotingRegressor(estimators=base_regressors)
voting_reg.fit(X_train, y_train)
y_pred_voting = voting_reg.predict(X_test)
print(f"VotingRegressor MSE: {mean_squared_error(y_test, y_pred_voting):.3f}")

# Fit and evaluate StackingRegressor
stacking_reg = StackingRegressor(estimators=base_regressors, final_estimator=LinearRegression())
stacking_reg.fit(X_train, y_train)
y_pred_stacking = stacking_reg.predict(X_test)
print(f"StackingRegressor MSE: {mean_squared_error(y_test, y_pred_stacking):.3f}")

Running the example gives an output like:

VotingRegressor MSE: 5130.514
StackingRegressor MSE: 0.011

The large gap is expected on this dataset: make_regression produces a nearly linear target, so the stacked meta-model learns to rely almost entirely on LinearRegression, while plain averaging mixes in the much noisier DecisionTreeRegressor predictions.

The steps are as follows:

  1. Generate a synthetic regression dataset using make_regression.
  2. Split the data into training and test sets using train_test_split.
  3. Define two base regressors: LinearRegression and DecisionTreeRegressor.
  4. Instantiate VotingRegressor with the base regressors, fit it on the training data, and evaluate its performance on the test set.
  5. Instantiate StackingRegressor with the base regressors and LinearRegression as the meta-regressor, fit it on the training data, and evaluate its performance on the test set.
  6. Compare the mean squared error (MSE) of both models on the test set to observe the differences in performance.
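To see why stacking wins here, you can inspect the fitted meta-regressor's coefficients, one per base model. This inspection is a sketch beyond the main example; on this linear dataset the coefficient for LinearRegression should dominate the one for the tree:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

stack = StackingRegressor(
    estimators=[('lr', LinearRegression()),
                ('dt', DecisionTreeRegressor(random_state=42))],
    final_estimator=LinearRegression(),
)
stack.fit(X_train, y_train)

# One meta-coefficient per base model, in estimator order ('lr', 'dt')
print(dict(zip(['lr', 'dt'], stack.final_estimator_.coef_)))
```

A coefficient near 1 for 'lr' and near 0 for 'dt' would indicate that the meta-model has effectively learned to ignore the tree, something the unweighted VotingRegressor cannot do.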


See Also