
Configure StackingRegressor "final_estimator" Parameter

The final_estimator parameter in scikit-learn’s StackingRegressor determines the model used to combine predictions from base estimators.

Stacking is an ensemble method that trains multiple base models and a meta-model (final estimator) to combine their predictions. The final_estimator is crucial as it learns how to best integrate the base models’ outputs.

By default, StackingRegressor uses RidgeCV as the final estimator. Common alternatives include LinearRegression, RandomForestRegressor, or other models capable of handling the base estimators’ outputs.
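You can verify the default by leaving final_estimator unset and inspecting the fitted meta-model, which scikit-learn exposes as the final_estimator_ attribute (the single DecisionTreeRegressor base estimator here is just for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=100, n_features=5, random_state=42)

# No final_estimator passed, so the default (RidgeCV) is used
stack = StackingRegressor(
    estimators=[('tree', DecisionTreeRegressor(random_state=42))]
)
stack.fit(X, y)

# The fitted meta-model is stored in final_estimator_
print(type(stack.final_estimator_).__name__)  # RidgeCV
```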

The choice of final_estimator can significantly impact the ensemble’s performance, especially when base estimators have complementary strengths.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base estimators
base_estimators = [
    ('rf', RandomForestRegressor(n_estimators=10, random_state=42)),
    ('gb', GradientBoostingRegressor(n_estimators=10, random_state=42))
]

# Create StackingRegressors with different final estimators
final_estimators = {
    'Default (RidgeCV)': None,
    'LinearRegression': LinearRegression(),
    'RandomForestRegressor': RandomForestRegressor(n_estimators=10, random_state=42)
}

for name, final_estimator in final_estimators.items():
    stack = StackingRegressor(estimators=base_estimators, final_estimator=final_estimator, cv=5)
    stack.fit(X_train, y_train)
    y_pred = stack.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print(f"Final Estimator: {name}, MSE: {mse:.4f}")

Running the example gives an output like:

Final Estimator: Default (RidgeCV), MSE: 3029.6674
Final Estimator: LinearRegression, MSE: 3029.6599
Final Estimator: RandomForestRegressor, MSE: 3644.4075

The key steps in this example are:

  1. Generate a synthetic regression dataset
  2. Split the data into train and test sets
  3. Define base estimators (RandomForestRegressor and GradientBoostingRegressor)
  4. Create StackingRegressor models with different final estimators
  5. Train each model and evaluate its performance using mean squared error
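When the final estimator is linear, you can also inspect how it weights each base model's predictions through its coef_ attribute. A short sketch reusing the same setup as the example above:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression

# Same synthetic dataset and base estimators as the main example
X, y = make_regression(n_samples=1000, n_features=10, noise=0.5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

base_estimators = [
    ('rf', RandomForestRegressor(n_estimators=10, random_state=42)),
    ('gb', GradientBoostingRegressor(n_estimators=10, random_state=42)),
]
stack = StackingRegressor(
    estimators=base_estimators,
    final_estimator=LinearRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

# One coefficient per base estimator: how the meta-model weights each one
for (name, _), coef in zip(base_estimators, stack.final_estimator_.coef_):
    print(f"{name}: {coef:.3f}")
```

A coefficient near 1.0 for one base estimator and near 0.0 for another indicates the meta-model is leaning almost entirely on the former.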

Tips for choosing and configuring the final_estimator:

  - A simple, regularized linear model (like the default RidgeCV) is usually a strong choice, because the base estimators' predictions tend to be highly correlated
  - Reserve flexible final estimators (e.g., RandomForestRegressor) for cases where you suspect the base predictions combine nonlinearly; in the example above it performs worse than the linear options
  - The final estimator's own hyperparameters can be tuned with GridSearchCV using the final_estimator__ parameter prefix

Issues to consider:

  - A complex final estimator can overfit the out-of-fold predictions it is trained on, especially with few base estimators or a small dataset
  - The final estimator is fit on cross-validated predictions (controlled by the cv parameter), so training cost grows with the number of folds and base estimators
