BaggingRegressor is an ensemble meta-estimator that fits base regressors on random subsets of the original dataset and then aggregates their individual predictions to form a final prediction. This example demonstrates how to implement and evaluate BaggingRegressor on a synthetic regression dataset.
The key hyperparameters of BaggingRegressor include estimator (the base regressor from which the ensemble is built), n_estimators (the number of base estimators in the ensemble), and max_samples (the number of samples to draw from the original dataset to train each base estimator). The algorithm is suitable for regression problems where the goal is to predict a continuous target variable.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
# generate regression dataset
X, y = make_regression(n_samples=100, n_features=4, noise=0.1, random_state=1)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create model
model = BaggingRegressor(estimator=DecisionTreeRegressor(), n_estimators=10, random_state=1)
# fit model
model.fit(X_train, y_train)
# evaluate model
yhat = model.predict(X_test)
mse = mean_squared_error(y_test, yhat)
print('Mean Squared Error: %.3f' % mse)
# make a prediction
row = [[0.5, -1.2, 0.3, -0.8]]
yhat = model.predict(row)
print('Predicted: %.3f' % yhat[0])
Running the example gives an output like:
Mean Squared Error: 1628.157
Predicted: -90.950
The steps are as follows:

First, a synthetic regression dataset is generated using the make_regression() function. This creates a dataset with a specified number of samples (n_samples), features (n_features), and a fixed random seed (random_state) for reproducibility. The dataset is split into training and test sets using train_test_split().

Next, a BaggingRegressor model is instantiated with DecisionTreeRegressor as the base estimator. The ensemble is configured with n_estimators to specify the number of trees and a random seed (random_state) for reproducibility. The model is then fit on the training data using the fit() method.

The performance of the model is evaluated by predicting the test set and calculating the Mean Squared Error (mean_squared_error) between the predictions (yhat) and actual values (y_test).

A single prediction can be made by passing a new data sample to the predict() method.
This example demonstrates how to set up and use BaggingRegressor
for regression tasks, showcasing its ability to improve prediction accuracy by aggregating multiple base estimators.