The n_estimators parameter in scikit-learn's AdaBoostClassifier controls the number of weak learners in the ensemble.

AdaBoost (Adaptive Boosting) is an ensemble learning method that combines multiple weak learners, typically shallow decision trees (by default, one-level decision stumps), to create a strong classifier. The n_estimators parameter determines how many weak learners are created and combined.
Increasing the number of estimators generally improves the model’s performance by reducing bias. However, it can lead to overfitting and increased computational cost if set too high.
The default value for n_estimators in AdaBoostClassifier is 50.
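You can confirm the default by instantiating the classifier with no arguments and inspecting its parameters:

from sklearn.ensemble import AdaBoostClassifier

# With no arguments, the ensemble is built from 50 weak learners
clf = AdaBoostClassifier()
print(clf.get_params()["n_estimators"])  # 50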
In practice, values between 50 and 500 are commonly used, depending on the complexity of the problem and the desired trade-off between performance and computational cost.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different n_estimators values
n_estimators_values = [10, 50, 100, 200, 500]
accuracies = []
for n in n_estimators_values:
    ada = AdaBoostClassifier(n_estimators=n, random_state=42)
    ada.fit(X_train, y_train)
    y_pred = ada.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"n_estimators={n}, Accuracy: {accuracy:.3f}")
Running the example gives an output like:
n_estimators=10, Accuracy: 0.790
n_estimators=50, Accuracy: 0.830
n_estimators=100, Accuracy: 0.820
n_estimators=200, Accuracy: 0.830
n_estimators=500, Accuracy: 0.820
The key steps in this example are:
- Generate a synthetic binary classification dataset with informative and redundant features
- Split the data into train and test sets
- Train AdaBoostClassifier models with different n_estimators values
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting n_estimators:
- Start with the default value of 50 and increase it until performance plateaus or overfitting occurs
- Use cross-validation to find the optimal number of estimators for your specific dataset (see the grid-search sketch after this list)
- Consider the trade-off between model performance and training time when selecting n_estimators
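As a minimal sketch of the cross-validation tip, here is a grid search over n_estimators. The grid values and the 5-fold setting are illustrative choices, not recommendations:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic dataset, mirroring the main example
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)

# 5-fold cross-validation over a small, illustrative grid of ensemble sizes
param_grid = {"n_estimators": [50, 100, 200, 500]}
grid = GridSearchCV(AdaBoostClassifier(random_state=42), param_grid, cv=5)
grid.fit(X, y)

print(grid.best_params_)                      # best n_estimators found by CV
print(f"CV accuracy: {grid.best_score_:.3f}")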
Issues to consider:
- Increasing n_estimators generally improves performance but may lead to overfitting on small datasets
- The optimal number of estimators depends on the complexity of the problem and the quality of the weak learners
- AdaBoost can be sensitive to noisy data and outliers, so data preprocessing may be necessary
- Monitor both training and validation performance to detect overfitting as you increase n_estimators (a staged_predict sketch for this follows below)
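For the last point, AdaBoostClassifier.staged_predict yields predictions after each boosting stage, so a single fitted model shows how train and test accuracy evolve as estimators are added. A minimal sketch reusing the dataset and split from the main example:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Fit once with a large ensemble, then score every intermediate stage
ada = AdaBoostClassifier(n_estimators=500, random_state=42).fit(X_train, y_train)
train_acc = [accuracy_score(y_train, p) for p in ada.staged_predict(X_train)]
test_acc = [accuracy_score(y_test, p) for p in ada.staged_predict(X_test)]

# Report every 100th stage; a widening train/test gap signals overfitting
# (the loop may run fewer times if boosting stops early)
for i in range(99, len(test_acc), 100):
    print(f"{i + 1} estimators: train={train_acc[i]:.3f}, test={test_acc[i]:.3f}")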