The estimator parameter in scikit-learn’s AdaBoostRegressor determines the base regressor used in the ensemble.
AdaBoost (Adaptive Boosting) is an ensemble learning method that combines multiple weak learners to create a strong predictor. The estimator parameter specifies the type of weak learner to use as the base model.
By default, AdaBoostRegressor uses DecisionTreeRegressor with max_depth=3 as the base estimator. This default works well in many cases, but changing the base estimator can significantly impact the model’s performance and characteristics.
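As a quick illustration, the default configuration is equivalent to passing the shallow tree explicitly. This is a minimal sketch assuming scikit-learn 1.2 or later, where the parameter is named estimator (it was base_estimator in earlier releases):
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor
# Leaving estimator unset falls back to a depth-3 decision tree,
# so these two models are configured identically
default_model = AdaBoostRegressor(random_state=42)
explicit_model = AdaBoostRegressor(
    estimator=DecisionTreeRegressor(max_depth=3),
    random_state=42
)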
Common alternatives to the default include decision trees with different depths, linear models like LinearRegression, or other regressors that can be considered “weak learners”.
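For instance, other simple regressors can be plugged in the same way. This is an illustrative sketch, not a recommendation; KNeighborsRegressor and LinearSVR are just example choices:
from sklearn.ensemble import AdaBoostRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import LinearSVR
# Any scikit-learn regressor with a fit/predict interface can serve
# as the base model
knn_boost = AdaBoostRegressor(estimator=KNeighborsRegressor(n_neighbors=5))
svr_boost = AdaBoostRegressor(estimator=LinearSVR())
The example below compares the default against a depth-1 tree and LinearRegression on a synthetic dataset.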
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and evaluate AdaBoostRegressor with different base estimators
estimators = {
    'Default': None,
    'DecisionTree(max_depth=1)': DecisionTreeRegressor(max_depth=1),
    'LinearRegression': LinearRegression()
}
for name, estimator in estimators.items():
    model = AdaBoostRegressor(estimator=estimator, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print(f"{name} - MSE: {mse:.4f}")
Running the example gives an output like:
Default - MSE: 3767.9097
DecisionTree(max_depth=1) - MSE: 6149.6108
LinearRegression - MSE: 0.0097
LinearRegression scores so well here because make_regression produces a linear target, so a linear base learner fits it almost perfectly; on nonlinear problems, tree-based weak learners are usually the stronger choice.
The key steps in this example are:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Create AdaBoostRegressor instances with different base estimators
- Train the models and evaluate them using mean squared error
- Compare the performance of different base estimators
Some tips and heuristics for setting the estimator parameter:
- Choose weak learners that are simple and fast to train
- Balance between weak learners (e.g., shallow trees) and stronger base models; one way to search this trade-off is sketched after this list
- Consider the computational cost, especially for large datasets or many boosting rounds
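To navigate that trade-off in practice, the base tree’s depth can be cross-validated together with the number of boosting rounds. This is a minimal sketch that reuses X_train and y_train from the example above and assumes the scikit-learn 1.2+ parameter name, so the nested parameter is addressed as estimator__max_depth:
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV
# Nested parameters of the base estimator are reachable via the
# estimator__ prefix in the parameter grid
param_grid = {
    'estimator__max_depth': [1, 2, 3, 5],
    'n_estimators': [50, 100],
}
search = GridSearchCV(
    AdaBoostRegressor(estimator=DecisionTreeRegressor(), random_state=42),
    param_grid,
    scoring='neg_mean_squared_error',
    cv=5,
)
search.fit(X_train, y_train)
print(search.best_params_)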
Issues to consider:
- The choice of base estimator can affect model interpretability
- Complex base estimators may lead to overfitting; a quick train-versus-test check is sketched after this list
- Different base estimators create trade-offs between bias and variance in the final model
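One simple way to watch for overfitting is to compare training and test error as the base tree deepens. A minimal sketch, again reusing the train/test split from the example above (the depths 1 and 8 are arbitrary illustrative choices); a train/test gap that widens with depth points to overfitting:
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
for depth in (1, 8):
    model = AdaBoostRegressor(
        estimator=DecisionTreeRegressor(max_depth=depth),
        random_state=42,
    )
    model.fit(X_train, y_train)
    # A much lower train MSE than test MSE signals high variance
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"max_depth={depth} - train MSE: {train_mse:.4f}, test MSE: {test_mse:.4f}")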