
Configure AdaBoostClassifier "algorithm" Parameter

The algorithm parameter in scikit-learn’s AdaBoostClassifier determines which boosting algorithm to use.

AdaBoost (Adaptive Boosting) is an ensemble method that combines weak learners sequentially, giving more weight to misclassified samples in each iteration.
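
The reweighting step can be made concrete with a small sketch. The code below is a simplified, pure-Python illustration of a single discrete (SAMME-style) boosting round, not scikit-learn's implementation; the function name samme_round and the toy labels are assumptions made for the example.

```python
import math


def samme_round(weights, y_true, y_pred, n_classes=2, learning_rate=1.0):
    """Run the weight update for one SAMME-style boosting round.

    Returns (alpha, new_weights): the weak learner's vote weight and the
    renormalised sample weights. Hypothetical helper for illustration only.
    """
    total = sum(weights)
    # Weighted error rate of the weak learner on the training set.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    if err <= 0.0 or err >= 1.0 - 1.0 / n_classes:
        raise ValueError("weak learner is perfect or no better than chance")
    # Vote weight of this learner (the log(K - 1) term vanishes for K = 2).
    alpha = learning_rate * (math.log((1.0 - err) / err) + math.log(n_classes - 1))
    # Scale up the weight of every misclassified sample, then renormalise.
    boosted = [w * math.exp(alpha) if t != p else w
               for w, t, p in zip(weights, y_true, y_pred)]
    norm = sum(boosted)
    return alpha, [w / norm for w in boosted]


y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 0]   # the last sample is misclassified
weights = [0.2] * 5        # uniform starting weights

alpha, new_weights = samme_round(weights, y_true, y_pred)
print(f"alpha = {alpha:.3f}")
print([round(w, 3) for w in new_weights])
```

With one of five equally weighted samples misclassified, the weighted error is 0.2, so alpha is log(4), roughly 1.386. The misclassified sample ends up with weight 0.5 while each correctly classified sample drops to 0.125, so the next weak learner concentrates on the sample that was missed.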

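The algorithm parameter allows choosing between ‘SAMME’ (discrete boosting, which uses each weak learner’s predicted class labels) and ‘SAMME.R’ (real boosting, which uses each weak learner’s predicted class probabilities). SAMME.R typically reaches a given test error in fewer boosting rounds, but it requires the base estimator to provide probability estimates through predict_proba.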
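
To show why probability estimates matter, here is a minimal sketch of how each variant turns one weak learner's output into a per-class contribution. It follows the published SAMME.R formulation, (K - 1) * (log p_k - mean of log p_j), and is not scikit-learn's own code; the helper names samme_r_scores and samme_vote are hypothetical.

```python
import math


def samme_r_scores(proba, eps=1e-12):
    """Map one weak learner's class probabilities to SAMME.R-style scores.

    Each score is (K - 1) * (log p_k - mean_j log p_j), so the scores for a
    sample sum to zero and a confident learner contributes more.
    """
    n_classes = len(proba)
    # Clip to avoid log(0) when the learner outputs a hard 0 probability.
    log_p = [math.log(max(p, eps)) for p in proba]
    mean_log_p = sum(log_p) / n_classes
    return [(n_classes - 1) * (lp - mean_log_p) for lp in log_p]


def samme_vote(predicted_class, n_classes, alpha):
    """Discrete vote: alpha for the predicted class, 0 for every other class."""
    return [alpha if k == predicted_class else 0.0 for k in range(n_classes)]


confident = samme_r_scores([0.9, 0.1])
unsure = samme_r_scores([0.55, 0.45])
print("SAMME.R, confident learner:", [round(s, 3) for s in confident])
print("SAMME.R, unsure learner:   ", [round(s, 3) for s in unsure])
print("SAMME, either learner:     ", samme_vote(0, 2, alpha=1.0))
```

For probabilities [0.9, 0.1] the score for the first class is about 1.10, while for [0.55, 0.45] it is about 0.10, so a confident weak learner moves the ensemble's decision more than an uncertain one. The discrete vote is identical in both cases because it only sees the predicted label.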

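The default value for algorithm is ‘SAMME.R’ in scikit-learn releases up to 1.5. Version 1.4 deprecated ‘SAMME.R’, and from version 1.6 onward only ‘SAMME’ is implemented and the algorithm parameter itself is deprecated.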
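
Because the default and the accepted values have changed between scikit-learn releases, it can help to check what the installed version reports before relying on a particular setting. The snippet below is a small sketch that uses only get_params; the variable names are illustrative.

```python
import sklearn
from sklearn.ensemble import AdaBoostClassifier

# Ask an unconfigured estimator for its parameters instead of
# hard-coding a table of versions.
params = AdaBoostClassifier().get_params()
has_algorithm = "algorithm" in params
default = params["algorithm"] if has_algorithm else None

print(f"scikit-learn {sklearn.__version__}")
if has_algorithm:
    print(f"algorithm default: {default!r}")
else:
    print("this release does not expose an algorithm parameter")
```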

In practice, ‘SAMME.R’ is often preferred when using decision trees with max_depth > 1, while ‘SAMME’ can be useful with very shallow trees or with base estimators that do not provide probability estimates.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
import time

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

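# Base estimator: a decision stump (depth-1 tree)
base_estimator = DecisionTreeClassifier(max_depth=1)

# Train with different algorithm values.
# Note: 'SAMME.R' is only accepted by scikit-learn releases before 1.6.
algorithms = ['SAMME', 'SAMME.R']
for alg in algorithms:
    start_time = time.perf_counter()
    # The base_estimator argument was renamed to estimator in scikit-learn 1.2
    ada = AdaBoostClassifier(estimator=base_estimator, algorithm=alg, random_state=42)
    ada.fit(X_train, y_train)
    train_time = time.perf_counter() - start_time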

    y_pred = ada.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    auc = roc_auc_score(y_test, ada.predict_proba(X_test)[:, 1])

    print(f"Algorithm: {alg}")
    print(f"Accuracy: {accuracy:.3f}")
    print(f"ROC AUC: {auc:.3f}")
    print(f"Training time: {train_time:.3f} seconds\n")

Running the example gives an output like:

Algorithm: SAMME
Accuracy: 0.825
ROC AUC: 0.904
Training time: 0.141 seconds

Algorithm: SAMME.R
Accuracy: 0.825
ROC AUC: 0.888
Training time: 0.145 seconds

The key steps in this example are:

  1. Generate a synthetic binary classification dataset
  2. Split the data into train and test sets
  3. Train AdaBoostClassifier models with both ‘SAMME’ and ‘SAMME.R’ algorithms
  4. Evaluate the accuracy, ROC AUC score, and training time for each model
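
A single accuracy figure hides how quickly the ensemble improves as estimators are added. As a rough sketch, the evaluation step can be extended with staged_score, which reports the score after each boosting round. The example below leaves the estimator and algorithm at their library defaults so that it does not depend on a particular scikit-learn release; the checkpoint rounds are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Library defaults: depth-1 trees as the weak learner and whichever
# boosting variant the installed release uses by default.
ada = AdaBoostClassifier(n_estimators=50, random_state=42)
ada.fit(X_train, y_train)

# staged_score yields the test accuracy after each boosting round.
staged = list(ada.staged_score(X_test, y_test))
for n_rounds, acc in enumerate(staged, start=1):
    # Report a few checkpoints plus the final round.
    if n_rounds in (1, 10, 25) or n_rounds == len(staged):
        print(f"{n_rounds:>3} estimators: accuracy {acc:.3f}")
```

Plotting the staged scores for two configurations side by side is one way to compare how fast each one converges, rather than comparing only the final numbers.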

Some tips and heuristics for choosing the algorithm:

Issues to consider:


