The `n_estimators` parameter in scikit-learn's `AdaBoostRegressor` controls the number of weak learners in the ensemble.
AdaBoost (Adaptive Boosting) is an ensemble method that combines multiple weak learners, typically decision trees, into a strong predictor. The `n_estimators` parameter determines how many weak learners are trained sequentially.
Increasing `n_estimators` generally improves model performance up to a point, after which returns diminish and overfitting may occur. The optimal value depends on the dataset and the problem.
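The diminishing returns can be observed directly with `staged_predict`, which yields test-set predictions after each boosting round, so the whole error curve comes from a single fit. A minimal sketch on synthetic data (the dataset parameters here are arbitrary):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ada = AdaBoostRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# staged_predict returns predictions after each boosting iteration,
# giving the MSE curve without refitting the model per value
errors = [mean_squared_error(y_test, y_pred) for y_pred in ada.staged_predict(X_test)]
print(f"MSE after 10 rounds:  {errors[9]:.1f}")
print(f"MSE after 200 rounds: {errors[-1]:.1f}")
```

Plotting `errors` against the round number typically shows a steep early drop that flattens out, the point where extra estimators stop paying off.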
The default value for `n_estimators` in `AdaBoostRegressor` is 50. In practice, values between 50 and 500 are common, but the best choice varies widely with the complexity of the regression task and the characteristics of the dataset.
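The default can be confirmed from the estimator itself:

```python
from sklearn.ensemble import AdaBoostRegressor

# The default number of weak learners is 50
print(AdaBoostRegressor().get_params()["n_estimators"])  # 50
```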
```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different n_estimators values
n_estimators_values = [10, 50, 100, 200, 500]
mse_scores = []

for n in n_estimators_values:
    ada = AdaBoostRegressor(n_estimators=n, random_state=42)
    ada.fit(X_train, y_train)
    y_pred = ada.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"n_estimators={n}, MSE: {mse:.4f}")

# Find best n_estimators
best_n = n_estimators_values[np.argmin(mse_scores)]
print(f"\nBest n_estimators: {best_n}")
```
Running the example gives an output like:

```
n_estimators=10, MSE: 16101.0718
n_estimators=50, MSE: 10253.6570
n_estimators=100, MSE: 9149.6702
n_estimators=200, MSE: 8349.3847
n_estimators=500, MSE: 8023.3465

Best n_estimators: 500
```
The key steps in this example are:

- Generate a synthetic regression dataset
- Split the data into train and test sets
- Train `AdaBoostRegressor` models with different `n_estimators` values
- Evaluate the mean squared error (MSE) of each model on the test set
- Identify the best-performing `n_estimators` value
Some tips and heuristics for setting `n_estimators` in `AdaBoostRegressor`:
- Start with the default value of 50 and incrementally increase
- Monitor performance on a validation set to avoid overfitting
- Consider the trade-off between model performance and training time
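One way to put these tips into practice is a cross-validated search over a small grid of `n_estimators` values, widening the grid only if the best value lands at an edge. A sketch (grid values and dataset settings are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=300, n_features=10, noise=0.1, random_state=0)

# Cross-validated search over a small n_estimators grid,
# scored with negative MSE so higher is better
grid = GridSearchCV(
    AdaBoostRegressor(random_state=0),
    param_grid={"n_estimators": [25, 50, 100, 200]},
    scoring="neg_mean_squared_error",
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```

Using cross-validation rather than a single validation split gives a more stable estimate, at the cost of refitting the model once per fold and grid point.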
Issues to consider:
- Higher `n_estimators` values increase computational cost
- The optimal number of estimators can vary greatly depending on the dataset
- AdaBoost can be sensitive to noisy data and outliers, which may affect the optimal `n_estimators` value
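The computational cost is easy to measure: since the weak learners are trained one after another, fit time grows roughly linearly with `n_estimators`. A quick timing sketch (dataset size and values chosen arbitrarily):

```python
import time

from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)

# Time the fit for a small and a larger ensemble;
# expect roughly 4x the training time for 4x the estimators
times = {}
for n in [50, 200]:
    start = time.perf_counter()
    AdaBoostRegressor(n_estimators=n, random_state=0).fit(X, y)
    times[n] = time.perf_counter() - start
    print(f"n_estimators={n}: {times[n]:.2f}s")
```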