
Configure AdaBoostRegressor "n_estimators" Parameter

The n_estimators parameter in scikit-learn’s AdaBoostRegressor controls the number of weak learners in the ensemble.

AdaBoost (Adaptive Boosting) is an ensemble method that combines multiple weak learners, typically decision trees, to create a strong predictor. The n_estimators parameter determines how many weak learners are sequentially trained.

Increasing n_estimators generally improves model performance up to a point, after which returns diminish and overfitting may occur. The optimal value depends on the specific dataset and problem.

The default value for n_estimators in AdaBoostRegressor is 50.

In practice, values between 50 and 500 are commonly used, but this can vary widely depending on the complexity of the regression task and the characteristics of the dataset.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different n_estimators values
n_estimators_values = [10, 50, 100, 200, 500]
mse_scores = []

for n in n_estimators_values:
    ada = AdaBoostRegressor(n_estimators=n, random_state=42)
    ada.fit(X_train, y_train)
    y_pred = ada.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"n_estimators={n}, MSE: {mse:.4f}")

# Find best n_estimators
best_n = n_estimators_values[np.argmin(mse_scores)]
print(f"\nBest n_estimators: {best_n}")

Running the example gives an output like:

n_estimators=10, MSE: 16101.0718
n_estimators=50, MSE: 10253.6570
n_estimators=100, MSE: 9149.6702
n_estimators=200, MSE: 8349.3847
n_estimators=500, MSE: 8023.3465

Best n_estimators: 500
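Because boosting trains learners sequentially, it is not necessary to retrain a separate model for every candidate value: staged_predict yields the ensemble's predictions after each boosting stage from a single fit. A sketch of this cheaper evaluation on the same synthetic dataset:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit once with the largest candidate value
ada = AdaBoostRegressor(n_estimators=500, random_state=42)
ada.fit(X_train, y_train)

# staged_predict returns test-set predictions after each boosting stage,
# so the MSE curve over all ensemble sizes comes from one training run
test_mse = [
    mean_squared_error(y_test, y_pred) for y_pred in ada.staged_predict(X_test)
]
best_stage = int(np.argmin(test_mse)) + 1
print(f"Best number of stages: {best_stage}, MSE: {min(test_mse):.4f}")
```

This evaluates every intermediate ensemble size up to 500 in one pass, rather than refitting for each value in the list.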

The key steps in this example are:

  1. Generate a synthetic regression dataset
  2. Split the data into train and test sets
  3. Train AdaBoostRegressor models with different n_estimators values
  4. Evaluate the mean squared error (MSE) of each model on the test set
  5. Identify the best performing n_estimators value
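The steps above rely on a single train/test split, so the chosen value can be sensitive to how the data happened to be split. A sketch of the same search using GridSearchCV with 5-fold cross-validation (the grid values are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Search over n_estimators, scoring each candidate by cross-validated MSE
param_grid = {"n_estimators": [10, 50, 100, 200]}
grid = GridSearchCV(
    AdaBoostRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=5,
    n_jobs=-1,
)
grid.fit(X, y)

print(f"Best n_estimators: {grid.best_params_['n_estimators']}")
print(f"Cross-validated MSE: {-grid.best_score_:.4f}")
```

Each candidate is scored as the average MSE across five folds, which gives a more stable estimate than one held-out test set.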

Some tips and heuristics for setting n_estimators in AdaBoostRegressor:

  - Start with the default of 50 and increase it until test error stops improving.
  - n_estimators interacts with learning_rate: a smaller learning_rate shrinks each learner's contribution, so it typically requires more estimators to reach the same performance.
  - Training time grows roughly linearly with n_estimators, since the weak learners are fit sequentially and cannot be parallelized.
  - Use staged_predict to evaluate every intermediate ensemble size from a single fit instead of retraining for each candidate value.

Issues to consider:

  - Unlike bagging, boosting can overfit as n_estimators grows, because later learners concentrate on the residual errors of earlier ones. This is especially likely on noisy data.
  - AdaBoost is sensitive to outliers; very large ensembles may end up chasing noise.
  - The best value also depends on the complexity of the base estimator (by default a shallow DecisionTreeRegressor), so tune the two together.
