
Configure StackingRegressor "passthrough" Parameter

The passthrough parameter in scikit-learn’s StackingRegressor determines whether to include the original features alongside the predictions from the base estimators.

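StackingRegressor is an ensemble method that combines multiple regression models by training a final estimator (the meta-regressor) on their predictions. During fit, these predictions are produced by internal cross-validation (5-fold by default, set via the cv parameter). As a result, the final estimator learns from out-of-sample predictions rather than from predictions on data the base estimators were trained on. The passthrough parameter controls whether the original features are passed to the final estimator along with these predictions.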
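To make the mechanism concrete, here is a simplified, hand-rolled sketch of how the final estimator's training data can be assembled. It assumes 5-fold cross-validation to mirror StackingRegressor's default. The helper `build_meta_features` is hypothetical and not part of scikit-learn, and the Ridge/KNeighborsRegressor base models are illustrative choices. The sketch also skips the step where StackingRegressor refits the base estimators on the full training data for use at predict time.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)

# Illustrative base models (not the ones used later in this article)
base_models = [Ridge(alpha=1.0), KNeighborsRegressor(n_neighbors=5)]


def build_meta_features(X, y, base_models, passthrough):
    """Hypothetical helper: out-of-fold base predictions, optionally followed by X."""
    # One column of out-of-fold predictions per base model (5-fold CV)
    columns = [cross_val_predict(m, X, y, cv=5).reshape(-1, 1) for m in base_models]
    if passthrough:
        # Append the original features after the prediction columns
        columns.append(X)
    return np.hstack(columns)


meta_false = build_meta_features(X, y, base_models, passthrough=False)
meta_true = build_meta_features(X, y, base_models, passthrough=True)
print(meta_false.shape)  # (200, 2)
print(meta_true.shape)   # (200, 7)

# The final estimator is trained on the meta-features
final = LinearRegression().fit(meta_true, y)
```

With passthrough enabled, the only difference is the extra block of raw feature columns; the prediction columns are identical.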

When passthrough=True, the original features are appended as extra columns after the base estimator predictions, so the final estimator can use the raw features directly. This can improve performance, at the cost of a wider input to the final estimator, which may make it more prone to overfitting.
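One way to see exactly what the final estimator receives is StackingRegressor's `transform` method, which returns the stacked inputs for new data. The sketch below assumes a smaller synthetic dataset and Ridge/KNeighborsRegressor base models with a LinearRegression final estimator, chosen only to keep it fast and deterministic.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=300, n_features=20, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Illustrative base models (assumption, not from the main example)
base_models = [
    ("ridge", Ridge()),
    ("knn", KNeighborsRegressor()),
]

meta = {}
for passthrough in (False, True):
    model = StackingRegressor(
        estimators=base_models,
        final_estimator=LinearRegression(),
        passthrough=passthrough,
    ).fit(X_train, y_train)
    # transform() returns the inputs the final estimator sees for X_test
    meta[passthrough] = model.transform(X_test)
    print(f"passthrough={passthrough}: {meta[passthrough].shape}")

# passthrough=False: (60, 2)  -> one prediction column per base model
# passthrough=True: (60, 22)  -> 2 prediction columns + 20 original features
```

The prediction columns come first and the untouched original features follow, so the final estimator's input grows by one column per original feature.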

The default value for passthrough is False. Setting it to True can be beneficial when the original features contain information not fully captured by the base estimators.
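The sketch below illustrates that heuristic under deliberately chosen assumptions. The target is purely linear, the base models are weak (KNeighborsRegressor with 15 neighbors and a depth-2 DecisionTreeRegressor), and the final estimator is LinearRegression. None of these choices come from the main example. Because the linear final estimator can model the raw features directly, passthrough=True should win clearly here; with strong base models the gap may shrink or reverse.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Purely linear target: the raw features carry signal the weak base models miss
X, y = make_regression(n_samples=500, n_features=10, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Deliberately weak base models (illustrative assumption)
base_models = [
    ("knn", KNeighborsRegressor(n_neighbors=15)),
    ("tree", DecisionTreeRegressor(max_depth=2, random_state=0)),
]

# passthrough defaults to False
print(StackingRegressor(estimators=base_models).passthrough)  # False

mse = {}
for passthrough in (False, True):
    model = StackingRegressor(
        estimators=base_models,
        final_estimator=LinearRegression(),
        passthrough=passthrough,
    ).fit(X_train, y_train)
    mse[passthrough] = mean_squared_error(y_test, model.predict(X_test))
    print(f"passthrough={passthrough}, MSE: {mse[passthrough]:.2f}")
```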

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.ensemble import StackingRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base models
base_models = [
    ('rf', RandomForestRegressor(n_estimators=10, random_state=42)),
    ('svr', SVR(kernel='rbf'))
]

# Create StackingRegressor with passthrough=False
stacking_false = StackingRegressor(
    estimators=base_models,
    final_estimator=RandomForestRegressor(n_estimators=10, random_state=42),
    passthrough=False
)

# Create StackingRegressor with passthrough=True
stacking_true = StackingRegressor(
    estimators=base_models,
    final_estimator=RandomForestRegressor(n_estimators=10, random_state=42),
    passthrough=True
)

# Train and evaluate models
for model in [stacking_false, stacking_true]:
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print(f"passthrough={model.passthrough}, MSE: {mse:.4f}")

Running the example gives an output like:

passthrough=False, MSE: 7205.7209
passthrough=True, MSE: 4680.9284

The key steps in this example are:

  1. Generate a synthetic regression dataset
  2. Split the data into train and test sets
  3. Define base models (RandomForestRegressor and SVR)
  4. Create two StackingRegressor models, one with passthrough=False and one with passthrough=True
  5. Train both models and evaluate their performance using mean squared error
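Step 5 evaluates each model on a single 80/20 split, so the gap between the two settings can depend on that particular split. A cross-validated comparison reports both a mean and a spread. The sketch below assumes a smaller dataset (300 samples) to keep runtime low and a shuffled 5-fold outer split. It makes no claim about which setting will come out ahead.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVR

# Smaller dataset than the main example (assumption, for speed)
X, y = make_regression(n_samples=300, n_features=20, noise=0.1, random_state=42)

base_models = [
    ("rf", RandomForestRegressor(n_estimators=10, random_state=42)),
    ("svr", SVR(kernel="rbf")),
]

outer_cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = {}
for passthrough in (False, True):
    model = StackingRegressor(
        estimators=base_models,
        final_estimator=RandomForestRegressor(n_estimators=10, random_state=42),
        passthrough=passthrough,
    )
    # Scorers are maximized, so MSE is returned as a negative number
    neg_mse = cross_val_score(model, X, y, cv=outer_cv, scoring="neg_mean_squared_error")
    scores[passthrough] = -neg_mse
    print(f"passthrough={passthrough}: MSE {scores[passthrough].mean():.1f} "
          f"+/- {scores[passthrough].std():.1f}")
```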

Tips and heuristics for setting passthrough:

Issues to consider:



See Also