The estimator parameter in scikit-learn's AdaBoostClassifier specifies the base learner used in the ensemble.
AdaBoost (Adaptive Boosting) is an ensemble learning method that combines multiple weak learners to create a strong classifier. The estimator parameter determines the type of weak learner used in the ensemble.
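For readers who want the mechanism in one line: in the standard two-class formulation, AdaBoost's final prediction is a weighted vote of the weak learners,

H(x) = \mathrm{sign}\left( \sum_{t=1}^{T} \alpha_t \, h_t(x) \right)

where each weak learner h_t is fit on reweighted training samples (misclassified samples get larger weights in later rounds) and alpha_t reflects that learner's accuracy.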
By default, AdaBoostClassifier uses a DecisionTreeClassifier with a maximum depth of 1 (a decision stump) as the base estimator. However, you can use any classifier that supports sample weighting as the base estimator.
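To make the default concrete, leaving estimator unset is equivalent to passing a depth-1 decision tree explicitly. A minimal sketch of that equivalence:

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Leaving estimator unset (None) falls back to a depth-1 decision tree
ada_default = AdaBoostClassifier(random_state=42)

# Equivalent: pass the decision stump explicitly
ada_explicit = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                                  random_state=42)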
Common choices for the estimator parameter include decision trees of varying depths, logistic regression, and support vector machines. The example below compares a few of these options on a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and evaluate AdaBoostClassifier with different base estimators
estimators = [
    ("Default", None),
    ("DecisionTree(max_depth=3)", DecisionTreeClassifier(max_depth=3)),
    ("LogisticRegression", LogisticRegression())
]
for name, estimator in estimators:
    ada = AdaBoostClassifier(estimator=estimator, random_state=42)
    ada.fit(X_train, y_train)
    y_pred = ada.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Base estimator: {name}, Accuracy: {accuracy:.3f}")
Running the example gives an output like:
...
The key steps in this example are:
- Generate a synthetic classification dataset with informative and redundant features
- Split the data into train and test sets
- Create AdaBoostClassifier instances with different base estimators
- Train the models and evaluate their accuracy on the test set
Some tips for configuring the estimator parameter:
- Use simple models as base estimators to prevent overfitting
- Experiment with different estimator types to find the best fit for your data
- Adjust the hyperparameters of the base estimator to optimize performance (a grid search sketch follows this list)
- Consider the trade-off between model complexity and computational cost
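One common way to tune the base estimator's hyperparameters is a grid search over AdaBoost's nested parameter names, which follow scikit-learn's standard estimator__<param> convention. This is a minimal sketch; the grid values here are arbitrary choices, not recommendations:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(), random_state=42)

# Parameters of the base estimator are addressed as estimator__<param>
param_grid = {
    "estimator__max_depth": [1, 2, 3],
    "n_estimators": [50, 100],
}

grid = GridSearchCV(ada, param_grid, cv=5, scoring="accuracy")
grid.fit(X, y)
print(grid.best_params_)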
Issues to consider:
- The choice of base estimator can significantly impact the ensemble’s performance
- Some estimators may not be suitable for AdaBoost if they don't support sample weighting (a quick check is sketched after this list)
- The optimal base estimator depends on the specific characteristics of your dataset
- Increasing the complexity of the base estimator may lead to overfitting
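To check whether a candidate base estimator accepts sample weights before handing it to AdaBoost, scikit-learn's has_fit_parameter utility inspects the estimator's fit signature. A minimal sketch, using KNeighborsClassifier as an example of a classifier whose fit method takes no sample_weight argument:

from sklearn.utils.validation import has_fit_parameter
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# Decision trees accept sample_weight in fit, so they work with AdaBoost
print(has_fit_parameter(DecisionTreeClassifier(), "sample_weight"))  # True

# KNeighborsClassifier.fit has no sample_weight parameter
print(has_fit_parameter(KNeighborsClassifier(), "sample_weight"))    # False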