The n_estimators parameter in scikit-learn’s AdaBoostRegressor controls the number of weak learners in the ensemble.
AdaBoost (Adaptive Boosting) is an ensemble method that combines multiple weak learners, typically decision trees, to create a strong predictor. The n_estimators parameter determines how many weak learners are sequentially trained.
Increasing n_estimators generally improves model performance up to a point, after which returns diminish and overfitting may occur. The optimal value depends on the specific dataset and problem.
The default value for n_estimators in AdaBoostRegressor is 50.
In practice, values between 50 and 500 are commonly used, but this can vary widely depending on the complexity of the regression task and the characteristics of the dataset.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error
import numpy as np
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different n_estimators values
n_estimators_values = [10, 50, 100, 200, 500]
mse_scores = []
for n in n_estimators_values:
    ada = AdaBoostRegressor(n_estimators=n, random_state=42)
    ada.fit(X_train, y_train)
    y_pred = ada.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"n_estimators={n}, MSE: {mse:.4f}")
# Find best n_estimators
best_n = n_estimators_values[np.argmin(mse_scores)]
print(f"\nBest n_estimators: {best_n}")
Running the example gives an output like:
n_estimators=10, MSE: 16101.0718
n_estimators=50, MSE: 10253.6570
n_estimators=100, MSE: 9149.6702
n_estimators=200, MSE: 8349.3847
n_estimators=500, MSE: 8023.3465
Best n_estimators: 500
The key steps in this example are:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Train AdaBoostRegressor models with different n_estimators values
- Evaluate the mean squared error (MSE) of each model on the test set
- Identify the best performing n_estimators value
Some tips and heuristics for setting n_estimators in AdaBoostRegressor:
- Start with the default value of 50 and increase incrementally
- Monitor performance on a validation set to avoid overfitting
- Consider the trade-off between model performance and training time
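A convenient way to monitor validation performance without retraining for every candidate value is AdaBoostRegressor's staged_predict method, which yields the ensemble's predictions after each boosting round. The sketch below is illustrative; it reuses the same synthetic dataset as the example above and treats the held-out split as a validation set:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import numpy as np

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit once with a large ensemble...
ada = AdaBoostRegressor(n_estimators=500, random_state=42)
ada.fit(X_train, y_train)

# ...then score every intermediate ensemble size via staged_predict,
# which yields predictions after 1, 2, ..., n fitted estimators
val_mse = [mean_squared_error(y_val, y_pred)
           for y_pred in ada.staged_predict(X_val)]

best_n = int(np.argmin(val_mse)) + 1  # +1 because stage 0 is a one-estimator ensemble
print(f"Best n_estimators on the validation set: {best_n}")
```

This fits the model once instead of once per candidate value, which makes it much cheaper than looping over n_estimators settings. Note that boosting can terminate early (for example, on a perfect fit), so the number of stages may be smaller than the requested n_estimators.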
Issues to consider:
- Higher n_estimators values increase computational cost
- The optimal number of estimators can vary greatly depending on the dataset
- AdaBoost can be sensitive to noisy data and outliers, which may affect the optimal n_estimators
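Because the optimal value is dataset-dependent, cross-validation is a more robust way to choose it than a single train/test split. A minimal sketch using GridSearchCV (the grid values and dataset here are illustrative, not recommendations):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=500, n_features=20, noise=0.1, random_state=42)

# Search a small grid of ensemble sizes, scoring by negative MSE
param_grid = {"n_estimators": [25, 50, 100, 200]}
search = GridSearchCV(
    AdaBoostRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X, y)

print("Best n_estimators:", search.best_params_["n_estimators"])
```

Since training cost grows linearly with n_estimators, keeping the grid small (or searching coarsely first, then refining) helps contain the computational cost noted above.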