Comparing StackingRegressor and VotingRegressor for Ensemble Learning in Regression Tasks
StackingRegressor combines multiple regression models using a meta-regressor to improve predictive performance. Key hyperparameters include estimators (the list of base regressors), final_estimator (the meta-regressor), and cv (the cross-validation splitting strategy).
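As a minimal sketch of how these three hyperparameters fit together (the Ridge/tree pairing here is illustrative, not the pair used in the example below):

```python
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor

# Base regressors are passed as (name, estimator) tuples.
stack = StackingRegressor(
    estimators=[('ridge', Ridge()), ('tree', DecisionTreeRegressor())],
    final_estimator=LinearRegression(),  # meta-regressor trained on base predictions
    cv=5,  # 5-fold CV produces out-of-fold predictions for the meta-regressor
)
```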
VotingRegressor aggregates multiple regression models by averaging their predictions. Key hyperparameters include estimators (the list of base regressors) and weights (optional weights for each regressor).
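A quick sketch of the weights hyperparameter (the 2:1 weighting is an arbitrary choice for illustration):

```python
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# weights=[2, 1] makes the linear model count twice as much
# as the tree in the weighted average of predictions.
vote = VotingRegressor(
    estimators=[('lr', LinearRegression()), ('dt', DecisionTreeRegressor())],
    weights=[2, 1],
)
```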
StackingRegressor leverages a second-level model to learn from the predictions of base models, potentially yielding better performance by capturing interactions between predictions.
VotingRegressor is simpler, averaging predictions without learning interactions, making it faster and easier to implement but possibly less powerful.
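To make that contrast concrete, the following sketch (my own synthetic setup, not part of the example below) checks that an unweighted VotingRegressor's output is exactly the mean of its fitted base models' predictions:

```python
import numpy as np
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.rand(100, 4)
y = X @ np.array([1.0, 2.0, 3.0, 4.0])

vote = VotingRegressor([('lr', LinearRegression()),
                        ('dt', DecisionTreeRegressor(random_state=0))])
vote.fit(X, y)

# With no weights, VotingRegressor simply averages the base models'
# predictions -- no second-level learning takes place.
manual_mean = np.mean([est.predict(X) for est in vote.estimators_], axis=0)
assert np.allclose(vote.predict(X), manual_mean)
```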
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import StackingRegressor, VotingRegressor
from sklearn.metrics import mean_squared_error
# Generate synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Define base regressors
base_regressors = [
('lr', LinearRegression()),
('dt', DecisionTreeRegressor(random_state=42))
]
# Fit and evaluate VotingRegressor
voting_reg = VotingRegressor(estimators=base_regressors)
voting_reg.fit(X_train, y_train)
y_pred_voting = voting_reg.predict(X_test)
print(f"VotingRegressor MSE: {mean_squared_error(y_test, y_pred_voting):.3f}")
# Fit and evaluate StackingRegressor
stacking_reg = StackingRegressor(estimators=base_regressors, final_estimator=LinearRegression())
stacking_reg.fit(X_train, y_train)
y_pred_stacking = stacking_reg.predict(X_test)
print(f"StackingRegressor MSE: {mean_squared_error(y_test, y_pred_stacking):.3f}")
Running the example gives an output like:
VotingRegressor MSE: 5130.514
StackingRegressor MSE: 0.011
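The large gap is expected on this dataset: make_regression produces a nearly linear target, so the meta-regressor can learn to rely almost entirely on the LinearRegression base model, while plain averaging is dragged down by the noisier tree. You can inspect the learned blend (a sketch; the exact coefficient values will vary with the data):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Refit the stacking ensemble on the same synthetic data so we can
# inspect the meta-regressor it learned.
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
stack = StackingRegressor(
    estimators=[('lr', LinearRegression()),
                ('dt', DecisionTreeRegressor(random_state=42))],
    final_estimator=LinearRegression(),
).fit(X, y)

# One coefficient per base model: on this linear dataset the 'lr'
# weight should be close to 1 and the 'dt' weight close to 0.
weights = dict(zip(['lr', 'dt'], stack.final_estimator_.coef_))
print(weights)
```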
The steps are as follows:
- Generate a synthetic regression dataset using make_regression.
- Split the data into training and test sets using train_test_split.
- Define two base regressors: LinearRegression and DecisionTreeRegressor.
- Instantiate VotingRegressor with the base regressors, fit it on the training data, and evaluate its performance on the test set.
- Instantiate StackingRegressor with the base regressors and LinearRegression as the meta-regressor, fit it on the training data, and evaluate its performance on the test set.
- Compare the mean squared error (MSE) of both models on the test set to observe the differences in performance.
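A single train/test split can be noisy, so as a follow-up sketch (5-fold cross-validation is my choice here, not part of the original example), the two ensembles can also be compared with cross_val_score:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
base = [('lr', LinearRegression()), ('dt', DecisionTreeRegressor(random_state=42))]

results = {}
for name, model in [
    ('voting', VotingRegressor(estimators=base)),
    ('stacking', StackingRegressor(estimators=base, final_estimator=LinearRegression())),
]:
    # neg_mean_squared_error is negated so that higher is better;
    # flip the sign back to report plain MSE.
    scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')
    results[name] = -scores.mean()
    print(f"{name}: mean MSE = {results[name]:.3f}")
```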