BaggingRegressor is an ensemble meta-estimator that fits base regressors on random subsets of the original dataset and then aggregates their individual predictions to form a final prediction. This example demonstrates how to implement and evaluate BaggingRegressor on a synthetic regression dataset.
The key hyperparameters of BaggingRegressor include estimator (the base regressor from which the ensemble is built), n_estimators (the number of base estimators in the ensemble), and max_samples (the number of samples to draw from the original dataset to train each base estimator). The algorithm is suitable for regression problems where the goal is to predict a continuous target variable.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
# generate regression dataset
X, y = make_regression(n_samples=100, n_features=4, noise=0.1, random_state=1)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create model
model = BaggingRegressor(estimator=DecisionTreeRegressor(), n_estimators=10, random_state=1)
# fit model
model.fit(X_train, y_train)
# evaluate model
yhat = model.predict(X_test)
mse = mean_squared_error(y_test, yhat)
print('Mean Squared Error: %.3f' % mse)
# make a prediction
row = [[0.5, -1.2, 0.3, -0.8]]
yhat = model.predict(row)
print('Predicted: %.3f' % yhat[0])
Running the example gives an output like:
Mean Squared Error: 1628.157
Predicted: -90.950
The steps are as follows:

First, a synthetic regression dataset is generated using the make_regression() function. This creates a dataset with a specified number of samples (n_samples), features (n_features), and a fixed random seed (random_state) for reproducibility. The dataset is split into training and test sets using train_test_split().

Next, a BaggingRegressor model is instantiated with DecisionTreeRegressor as the base estimator. The ensemble is configured with n_estimators to specify the number of trees and a random seed (random_state) for reproducibility. The model is then fit on the training data using the fit() method.

The performance of the model is evaluated by predicting the test set and calculating the Mean Squared Error (mean_squared_error) between the predictions (yhat) and actual values (y_test).

A single prediction can be made by passing a new data sample to the predict() method.
This example demonstrates how to set up and use BaggingRegressor
for regression tasks, showcasing its ability to improve prediction accuracy by aggregating multiple base estimators.