
Configure BaggingRegressor "estimator" Parameter

The estimator parameter in scikit-learn’s BaggingRegressor determines the base model used in the ensemble.

Bagging (Bootstrap Aggregating) is an ensemble method that creates multiple subsets of the training data, trains a separate model on each subset, and combines their predictions. The estimator parameter specifies the type of model to use as the base learner in this ensemble.
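To make the idea concrete, here is a minimal hand-rolled sketch of bagging using resample from sklearn.utils; it only illustrates the concept and is not how BaggingRegressor is implemented internally:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
from sklearn.utils import resample

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

# Fit one tree per bootstrap sample of the training data
models = []
for seed in range(10):
    X_boot, y_boot = resample(X, y, random_state=seed)
    models.append(DecisionTreeRegressor(random_state=seed).fit(X_boot, y_boot))

# Aggregate by averaging the individual predictions
y_pred = np.mean([m.predict(X) for m in models], axis=0)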

By default (estimator=None), BaggingRegressor uses a DecisionTreeRegressor as the base estimator. However, you can use any regressor that follows the scikit-learn estimator API.
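You can confirm the default by fitting an ensemble without setting estimator and inspecting the fitted members; a quick check:

from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=100, n_features=5, random_state=0)

# With estimator=None (the default), each ensemble member is a decision tree
bagging = BaggingRegressor(random_state=0).fit(X, y)
print(type(bagging.estimators_[0]).__name__)  # DecisionTreeRegressor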

Common choices for the base estimator include DecisionTreeRegressor, LinearRegression, and SVR (Support Vector Regressor).

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base estimators to compare
estimators = [
    ('Default', None),  # None falls back to the default base estimator (DecisionTreeRegressor)
    ('DecisionTree', DecisionTreeRegressor(random_state=42)),
    ('LinearRegression', LinearRegression()),
    ('SVR', SVR())
]

# Train and evaluate BaggingRegressor with different base estimators
for name, estimator in estimators:
    bagging = BaggingRegressor(estimator=estimator, random_state=42)
    bagging.fit(X_train, y_train)
    y_pred = bagging.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print(f"Estimator: {name}, MSE: {mse:.4f}")

Running the example gives an output like:

Estimator: Default, MSE: 7486.4813
Estimator: DecisionTree, MSE: 7486.4813
Estimator: LinearRegression, MSE: 0.0113
Estimator: SVR, MSE: 35251.5928

The key steps in this example are:

  1. Generate a synthetic regression dataset
  2. Split the data into train and test sets
  3. Define a list of base estimators to compare
  4. Train BaggingRegressor models with different base estimators
  5. Evaluate each model’s performance using Mean Squared Error
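A single train/test split can be noisy; for a more robust comparison, one option is to score each configuration with cross-validation instead. A minimal sketch using cross_val_score on the same kind of synthetic data:

from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# 5-fold cross-validated MSE for two base estimators
for name, estimator in [('DecisionTree', DecisionTreeRegressor(random_state=42)),
                        ('LinearRegression', LinearRegression())]:
    bagging = BaggingRegressor(estimator=estimator, random_state=42)
    scores = cross_val_score(bagging, X, y, cv=5, scoring='neg_mean_squared_error')
    print(f"{name}: mean MSE {-scores.mean():.4f}")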

Some tips for choosing the base estimator:

  - Decision trees are the usual choice because bagging works best with high-variance, low-bias learners whose errors average out across the ensemble.
  - Match the base estimator to the data: LinearRegression wins here because make_regression produces a linear target (plus a little noise), while SVR with default settings underfits this unscaled data badly.
  - Any regressor that follows the scikit-learn estimator API can be used, including a Pipeline that adds preprocessing such as feature scaling.

Issues to consider:

  - Each base estimator is fit n_estimators times (10 by default), so expensive models such as SVR multiply training time accordingly.
  - Bagging adds little on top of stable, low-variance estimators like LinearRegression; its strong result above reflects the data, not the ensemble.
  - The base estimator's hyperparameters can be tuned through the bagging wrapper, as sketched below.
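For example, hyperparameters of the base estimator are exposed through the estimator__ prefix, so they can be searched together with the ensemble's own parameters; a minimal sketch with GridSearchCV:

from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=20, noise=0.1, random_state=42)

# Tune the tree's depth (via the estimator__ prefix) and the number of ensemble members
param_grid = {
    'estimator__max_depth': [3, 5, None],
    'n_estimators': [10, 50],
}
search = GridSearchCV(
    BaggingRegressor(estimator=DecisionTreeRegressor(random_state=42), random_state=42),
    param_grid,
    scoring='neg_mean_squared_error',
    cv=3,
)
search.fit(X, y)
print(search.best_params_)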
