
Configure GradientBoostingClassifier "tol" Parameter

The tol parameter in GradientBoostingClassifier sets the tolerance for the early-stopping criterion: the minimum improvement in the loss needed to keep training. It only takes effect when n_iter_no_change is set.

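Gradient Boosting is an ensemble technique that builds models sequentially, with each model trying to correct the errors of the previous ones. The tol parameter specifies the minimum improvement in the loss that counts as progress.

tol is only consulted when early stopping is enabled through n_iter_no_change. In that case a share of the training data (validation_fraction, 0.1 by default) is held out, and training stops once the validation loss has not improved by at least tol over the last n_iter_no_change iterations. With n_iter_no_change left at its default of None, all n_estimators stages are fit and tol has no effect.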
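
The threshold rule can be illustrated with a small plain-Python sketch. This is a simplified illustration, not the library's code: the function name stop_index and the list of validation losses are made up for the example, and the rule shown (keep going only if the new loss plus tol is below at least one of the last n_iter_no_change recorded losses) is an assumption about how the criterion behaves.

```python
import math


def stop_index(val_losses, tol=1e-4, n_iter_no_change=5):
    """Return the 0-based iteration at which the sketched rule stops training,
    or None if it never triggers. Hypothetical helper for illustration only."""
    # Ring buffer of the most recent losses that counted as progress.
    history = [math.inf] * n_iter_no_change
    for i, loss in enumerate(val_losses):
        # Progress means beating at least one recent loss by more than tol.
        if any(loss + tol < past for past in history):
            history[i % n_iter_no_change] = loss
        else:
            return i
    return None


# Made-up validation losses that flatten out after the third iteration.
losses = [1.0, 0.5, 0.3, 0.29995, 0.29993, 0.29992]

print(stop_index(losses, tol=1e-4, n_iter_no_change=2))  # stops once gains fall below tol
print(stop_index(losses, tol=0.0, n_iter_no_change=2))   # any decrease counts as progress
```

With tol=1e-4 the tiny gains at the end no longer count as progress, so the rule triggers; with tol=0.0 every decrease counts and training would run to the end of the list.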

The default value for tol is 1e-4. Common values range from 1e-4 to 1e-2; when early stopping is enabled, larger values stop training sooner, trading some model performance for shorter training time.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

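# Train with different tol values. n_iter_no_change is left at its default
# (None), so early stopping is off and tol does not change the fitted model.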
tol_values = [1e-4, 1e-3, 1e-2]
accuracies = []

for tol in tol_values:
    gb = GradientBoostingClassifier(tol=tol, random_state=42)
    gb.fit(X_train, y_train)
    y_pred = gb.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"tol={tol}, Accuracy: {accuracy:.3f}")

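Running the example gives an output like the following. The accuracies are identical because n_iter_no_change defaults to None, which disables early stopping, so every model fits the same 100 stages regardless of tol: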

tol=0.0001, Accuracy: 0.880
tol=0.001, Accuracy: 0.880
tol=0.01, Accuracy: 0.880
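
For tol to change the fitted model, early stopping has to be switched on. The variant below is a minimal sketch of that: it reuses the same dataset and split, and the values n_estimators=300, n_iter_no_change=5, and validation_fraction=0.1 are illustrative assumptions rather than recommendations. The fitted attribute n_estimators_ reports how many stages were kept after early stopping.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Same synthetic dataset and split as in the example above
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

stages_by_tol = {}
for tol in [1e-4, 1e-3, 1e-2]:
    gb = GradientBoostingClassifier(
        n_estimators=300,         # upper bound on boosting stages (illustrative)
        n_iter_no_change=5,       # enables early stopping (illustrative)
        validation_fraction=0.1,  # share of training data held out internally
        tol=tol,
        random_state=42,
    )
    gb.fit(X_train, y_train)
    stages_by_tol[tol] = gb.n_estimators_  # stages kept after early stopping
    accuracy = accuracy_score(y_test, gb.predict(X_test))
    print(f"tol={tol}, stages fit: {gb.n_estimators_}, Accuracy: {accuracy:.3f}")
```

The number of stages and the accuracy depend on the data and the scikit-learn version, so no output is shown here. Comparing the "stages fit" column across tol values is the quickest way to check whether the tolerance is affecting training at all.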

The key steps in this example are:

  1. Generate a synthetic binary classification dataset with informative, redundant, and noise features
  2. Split the data into training and test sets
  3. Train GradientBoostingClassifier models with different tol values
  4. Evaluate the accuracy of each model on the test set
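
To judge which tol values are meaningful for a given dataset, it can help to look at how much a held-out loss improves from one boosting stage to the next. The sketch below does this with staged_predict_proba and log_loss on a separate validation split. Two assumptions to note: the 10% split is an arbitrary choice, and log loss is used here as a stand-in for the loss the estimator tracks internally, whose scale may differ between scikit-learn versions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)
# Hold out a validation set (10% is an arbitrary choice for this sketch)
X_fit, X_val, y_fit, y_val = train_test_split(X, y, test_size=0.1, random_state=42)

gb = GradientBoostingClassifier(n_estimators=100, random_state=42)
gb.fit(X_fit, y_fit)

# Validation log loss after each boosting stage
val_losses = np.array([log_loss(y_val, proba)
                       for proba in gb.staged_predict_proba(X_val)])

# Positive values mean the loss went down from one stage to the next
improvements = -np.diff(val_losses)

for tol in (1e-4, 1e-3, 1e-2):
    n_small = int(np.sum(improvements < tol))
    print(f"tol={tol}: {n_small} of {len(improvements)} stage-to-stage "
          f"improvements fall below the tolerance")
```

If almost no improvements fall below a candidate tol, that value is unlikely to trigger early stopping; if most of them do, training will probably stop very early.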



