Configure HistGradientBoostingClassifier "max_iter" Parameter

The max_iter parameter in scikit-learn’s HistGradientBoostingClassifier controls the maximum number of iterations performed by the algorithm.

HistGradientBoostingClassifier is a gradient boosting algorithm that uses histogram-based decision trees. It’s designed for efficiency and can handle large datasets with high-dimensional features.

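The max_iter parameter sets an upper bound on the number of boosting iterations. Each iteration adds one tree to the ensemble for binary classification, or one tree per class for multiclass problems, potentially improving the model’s performance. If early stopping is enabled, training can finish before max_iter is reached.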

The default value for max_iter is 100. In practice, values between 50 and 1000 are commonly used, depending on the dataset’s complexity and the desired trade-off between performance and training time.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score
import time

# Generate synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

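# Train with different max_iter values.
# Note: the default early_stopping='auto' only enables early stopping when the
# training set has more than 10,000 samples, so every iteration runs here.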
max_iter_values = [50, 100, 500, 1000]
results = []

for max_iter in max_iter_values:
    start_time = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
    hgbc = HistGradientBoostingClassifier(max_iter=max_iter, random_state=42)
    hgbc.fit(X_train, y_train)
    train_time = time.perf_counter() - start_time

    y_pred = hgbc.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)

    results.append((max_iter, accuracy, train_time))
    print(f"max_iter={max_iter}, Accuracy: {accuracy:.3f}, Training time: {train_time:.2f}s")

# Find best performing model
best_model = max(results, key=lambda x: x[1])
print(f"\nBest model: max_iter={best_model[0]}, Accuracy: {best_model[1]:.3f}")

Running the example gives an output like:

max_iter=50, Accuracy: 0.905, Training time: 0.47s
max_iter=100, Accuracy: 0.912, Training time: 0.88s
max_iter=500, Accuracy: 0.921, Training time: 4.31s
max_iter=1000, Accuracy: 0.920, Training time: 6.90s

Best model: max_iter=500, Accuracy: 0.921

The key steps in this example are:

1. Generate a synthetic multi-class classification dataset
2. Split the data into train and test sets
3. Train HistGradientBoostingClassifier models with different max_iter values
4. Measure accuracy and training time for each model
5. Identify the best performing model based on accuracy

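Tips for setting max_iter:

- Start with the default value of 100 and adjust based on model performance
- Monitor both accuracy and training time to find an optimal balance
- Use early stopping (early_stopping=True, tuned with validation_fraction and n_iter_no_change) so that training halts once the validation score stops improving; max_iter then acts as an upper bound. With the default early_stopping='auto', it is only enabled when the training set has more than 10,000 samples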

Considerations when adjusting max_iter:

- Larger values may improve performance but increase training time
- Too many iterations can lead to overfitting, especially on small datasets
- The optimal number of iterations depends on the dataset’s complexity and size
