The algorithm parameter in scikit-learn's AdaBoostClassifier determines which boosting algorithm to use.
AdaBoost (Adaptive Boosting) is an ensemble method that combines weak learners sequentially, giving more weight to misclassified samples in each iteration.
The algorithm parameter allows choosing between 'SAMME' (discrete boosting, which uses class labels) and 'SAMME.R' (real boosting, which uses class probability estimates). 'SAMME.R' generally performs better but requires base estimators to provide probability estimates.
The default value for algorithm is 'SAMME.R'. (Note that in newer scikit-learn releases this has changed: base_estimator was renamed to estimator in version 1.2, and 'SAMME.R' is deprecated from version 1.4 in favor of 'SAMME'.)
In practice, 'SAMME.R' is often preferred when using decision trees with max_depth > 1, while 'SAMME' can be useful with very shallow trees or other base estimators.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
import time
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Base estimator
base_estimator = DecisionTreeClassifier(max_depth=1)
# Train with different algorithm values
algorithms = ['SAMME', 'SAMME.R']
for alg in algorithms:
    start_time = time.time()
    ada = AdaBoostClassifier(base_estimator=base_estimator, algorithm=alg, random_state=42)
    ada.fit(X_train, y_train)
    train_time = time.time() - start_time
    y_pred = ada.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    auc = roc_auc_score(y_test, ada.predict_proba(X_test)[:, 1])
    print(f"Algorithm: {alg}")
    print(f"Accuracy: {accuracy:.3f}")
    print(f"ROC AUC: {auc:.3f}")
    print(f"Training time: {train_time:.3f} seconds\n")
Running the example gives an output like:
Algorithm: SAMME
Accuracy: 0.825
ROC AUC: 0.904
Training time: 0.141 seconds
Algorithm: SAMME.R
Accuracy: 0.825
ROC AUC: 0.888
Training time: 0.145 seconds
The key steps in this example are:
- Generate a synthetic binary classification dataset
- Split the data into train and test sets
- Train AdaBoostClassifier models with both 'SAMME' and 'SAMME.R' algorithms
- Evaluate the accuracy, ROC AUC score, and training time for each model
Some tips and heuristics for choosing the algorithm:
- Use ‘SAMME.R’ when base estimators can provide probability estimates
- Consider ‘SAMME’ for very shallow trees (stumps) or non-tree base estimators
- ‘SAMME.R’ often converges faster and may require fewer estimators
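The first tip can be checked programmatically: 'SAMME.R' needs the base estimator to expose predict_proba, while 'SAMME' only needs predict. A minimal sketch (LinearSVC is used here purely as an example of an estimator without probability estimates):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import LinearSVC

# 'SAMME.R' requires predict_proba on the base estimator; 'SAMME' does not
for est in [DecisionTreeClassifier(max_depth=1), LinearSVC()]:
    has_proba = hasattr(est, "predict_proba")
    choice = "SAMME.R" if has_proba else "SAMME"
    print(f"{type(est).__name__}: predict_proba={has_proba} -> use {choice}")
```

Decision trees report probabilities from leaf class frequencies, so both algorithms work with them; an estimator like LinearSVC forces the 'SAMME' choice.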
Issues to consider:
- ‘SAMME.R’ requires probability estimates, which not all base estimators provide
- The performance difference between algorithms can vary based on the dataset and base estimator
- ‘SAMME’ might be more robust in some cases, especially with noisy data or weak base estimators
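Convergence behavior can be inspected with staged_score, which reports the ensemble's accuracy after each boosting round. A minimal sketch, using default constructor arguments so it runs across scikit-learn versions (where the algorithm parameter and its default have changed):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

ada = AdaBoostClassifier(n_estimators=50, random_state=0)
ada.fit(X_tr, y_tr)

# staged_score yields the test accuracy after each boosting round,
# showing how quickly the chosen algorithm converges
scores = list(ada.staged_score(X_te, y_te))
print(f"round 10: {scores[9]:.3f}, round 50: {scores[-1]:.3f}")
```

Plotting these per-round scores for each algorithm value on your own dataset is a quick way to see whether one option converges faster or plateaus at a higher accuracy.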