The estimator parameter in scikit-learn's AdaBoostClassifier specifies the base learner used in the ensemble.
AdaBoost (Adaptive Boosting) is an ensemble learning method that combines multiple weak learners to create a strong classifier. The estimator parameter determines the type of weak learner used in the ensemble.
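For readers who want the mechanism in one line: in the standard two-class formulation, AdaBoost's final prediction is a weighted vote of the weak learners,

H(x) = \mathrm{sign}\left( \sum_{t=1}^{T} \alpha_t \, h_t(x) \right)

where each weak learner h_t is fit on reweighted training samples (misclassified samples get larger weights in later rounds) and alpha_t reflects that learner's accuracy.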
By default, AdaBoostClassifier uses a DecisionTreeClassifier with a maximum depth of 1 (a decision stump) as the base estimator. However, you can use any classifier that supports sample weighting as the base estimator.
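To make the default concrete, leaving estimator unset is equivalent to passing a depth-1 decision tree explicitly. A minimal sketch of that equivalence:

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Leaving estimator unset (None) falls back to a depth-1 decision tree
ada_default = AdaBoostClassifier(random_state=42)

# Equivalent: pass the decision stump explicitly
ada_explicit = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                                  random_state=42)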
Common choices for the estimator parameter include decision trees of varying depths, logistic regression, and support vector machines. The example below compares a few of these options on a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and evaluate AdaBoostClassifier with different base estimators
estimators = [
    ("Default", None),
    ("DecisionTree(max_depth=3)", DecisionTreeClassifier(max_depth=3)),
    ("LogisticRegression", LogisticRegression())
]
for name, estimator in estimators:
    ada = AdaBoostClassifier(estimator=estimator, random_state=42)
    ada.fit(X_train, y_train)
    y_pred = ada.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Base estimator: {name}, Accuracy: {accuracy:.3f}")
Running the example gives an output like:
...
The key steps in this example are:
- Generate a synthetic classification dataset with informative and redundant features
- Split the data into train and test sets
- Create AdaBoostClassifier instances with different base estimators
- Train the models and evaluate their accuracy on the test set
Some tips for configuring the estimator parameter:
- Use simple models as base estimators to prevent overfitting
- Experiment with different estimator types to find the best fit for your data
- Adjust the hyperparameters of the base estimator to optimize performance (a grid search sketch follows this list)
- Consider the trade-off between model complexity and computational cost
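One common way to tune the base estimator's hyperparameters is a grid search over AdaBoost's nested parameter names, which follow scikit-learn's standard estimator__<param> convention. This is a minimal sketch; the grid values here are arbitrary choices, not recommendations:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(), random_state=42)

# Parameters of the base estimator are addressed as estimator__<param>
param_grid = {
    "estimator__max_depth": [1, 2, 3],
    "n_estimators": [50, 100],
}

grid = GridSearchCV(ada, param_grid, cv=5, scoring="accuracy")
grid.fit(X, y)
print(grid.best_params_)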
Issues to consider:
- The choice of base estimator can significantly impact the ensemble’s performance
- Some estimators may not be suitable for AdaBoost if they don't support sample weighting (a quick check is sketched after this list)
- The optimal base estimator depends on the specific characteristics of your dataset
- Increasing the complexity of the base estimator may lead to overfitting
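To check whether a candidate base estimator accepts sample weights before handing it to AdaBoost, scikit-learn's has_fit_parameter utility inspects the estimator's fit signature. A minimal sketch, using KNeighborsClassifier as an example of a classifier whose fit method takes no sample_weight argument:

from sklearn.utils.validation import has_fit_parameter
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# Decision trees accept sample_weight in fit, so they work with AdaBoost
print(has_fit_parameter(DecisionTreeClassifier(), "sample_weight"))  # True

# KNeighborsClassifier.fit has no sample_weight parameter
print(has_fit_parameter(KNeighborsClassifier(), "sample_weight"))    # False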