
Scikit-Learn StackingRegressor Model

StackingRegressor is an ensemble learning method that combines multiple regression models to improve predictive performance. It works by training several base regressors and feeding their predictions into a final estimator, which produces the overall prediction. This approach leverages the strengths of different models to create a more robust and accurate predictor.

The key hyperparameters of StackingRegressor include the estimators (a list of base regressors), final_estimator (the meta-regressor that combines base predictions), and cv (cross-validation splitting strategy).
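
As an illustration, here is a minimal sketch of these hyperparameters set explicitly (the cv=5 value and the particular base models are arbitrary choices for the example, not required defaults):

from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.svm import SVR

# base models are given as (name, estimator) pairs
estimators = [('ridge', RidgeCV()), ('svr', SVR())]

model = StackingRegressor(
    estimators=estimators,      # list of base regressors
    final_estimator=RidgeCV(),  # meta-regressor fit on the base predictions
    cv=5,                       # folds used to generate out-of-fold base predictions
)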

The algorithm is appropriate for regression problems where combining the strengths of multiple models can yield better results.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import RidgeCV
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# generate synthetic regression dataset
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# define base models
estimators = [
    ('ridge', RidgeCV()),
    ('tree', DecisionTreeRegressor(random_state=1)),
    ('svr', SVR())
]

# define stacking model
model = StackingRegressor(estimators=estimators, final_estimator=RidgeCV())

# fit model
model.fit(X_train, y_train)

# evaluate model
yhat = model.predict(X_test)
mse = mean_squared_error(y_test, yhat)
print('Mean Squared Error: %.3f' % mse)

# make a prediction
row = [[-1.10325445, -0.49821356, -0.05962247, -0.89224592, -0.70158632]]
yhat = model.predict(row)
print('Predicted: %.3f' % yhat[0])

Running the example gives an output like:

Mean Squared Error: 0.012
Predicted: -73.716

The steps are as follows:

  1. First, a synthetic regression dataset is generated using the make_regression() function. This creates a dataset with a specified number of samples (n_samples), number of features (n_features), noise level (noise), and a fixed random seed (random_state) for reproducibility. The dataset is split into training and test sets using train_test_split().

  2. Next, three base regressors (RidgeCV, DecisionTreeRegressor, and SVR) are defined and combined into a list called estimators.

  3. A StackingRegressor is created using the defined base models, with RidgeCV as the final estimator. The model is then fit on the training data using the fit() method (the fitted components can be inspected afterwards, as shown in the sketch after this list).

  4. The performance of the model is evaluated by comparing the predictions (yhat) to the actual values (y_test) using the mean squared error metric.

  5. A single prediction can be made by passing a new data sample to the predict() method.
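
Once fit, the trained components of the stack can be examined. As a small sketch continuing from the fitted model above: the named_estimators_ attribute exposes the fitted base models, and because the final estimator here is RidgeCV, its coef_ attribute shows how each base model's predictions are weighted.

# inspect the fitted base models
for name, est in model.named_estimators_.items():
    print(name, est)

# weights the meta-regressor assigns to each base model's predictions
print(model.final_estimator_.coef_)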

This example demonstrates how to set up and use a StackingRegressor in scikit-learn to combine multiple regression models. By leveraging the strengths of different algorithms, stacking can potentially yield better accuracy and robustness than any single base model.
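
As a variation, the single train/test split above could be replaced with k-fold cross-validation for a more reliable performance estimate. A minimal sketch, reusing X, y, and model from the example (the 5-fold setting is an arbitrary choice):

from sklearn.model_selection import cross_val_score

# evaluate the full stacking ensemble with 5-fold cross-validation
scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')
print('MSE: %.3f (std %.3f)' % (-scores.mean(), scores.std()))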



See Also