
Configure HistGradientBoostingRegressor "tol" Parameter

The tol parameter in scikit-learn’s HistGradientBoostingRegressor controls the tolerance for the early stopping criterion.

HistGradientBoostingRegressor is a gradient boosting algorithm that uses histogram-based decision trees. It’s designed for efficiency and can handle large datasets.

The tol parameter sets the absolute tolerance used when comparing scores during early stopping: an iteration only counts as an improvement if it beats the reference score by more than tol. If no such improvement occurs for n_iter_no_change consecutive iterations, training stops early. Note that tol has no effect unless early stopping is enabled.

The default value for tol is 1e-7. In practice, values between 1e-8 and 1e-3 are commonly used, depending on the desired trade-off between model performance and training time.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_squared_error
import time

# Generate synthetic dataset
X, y = make_regression(n_samples=10000, n_features=20, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different tol values
tol_values = [1e-8, 1e-7, 1e-5, 1e-3]
results = []

for tol in tol_values:
    start_time = time.time()
    model = HistGradientBoostingRegressor(tol=tol, random_state=42, max_iter=1000)
    model.fit(X_train, y_train)
    train_time = time.time() - start_time

    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    n_iter = model.n_iter_

    results.append((tol, mse, train_time, n_iter))
    print(f"tol={tol:.0e}, MSE: {mse:.4f}, Training time: {train_time:.2f}s, Iterations: {n_iter}")

Running the example gives an output like:

tol=1e-08, MSE: 729.3384, Training time: 2.75s, Iterations: 1000
tol=1e-07, MSE: 729.3384, Training time: 2.46s, Iterations: 1000
tol=1e-05, MSE: 729.3384, Training time: 2.40s, Iterations: 1000
tol=1e-03, MSE: 729.3384, Training time: 2.42s, Iterations: 1000

Note that all four runs produce identical results and use the full 1000 iterations. With the default early_stopping='auto', early stopping is only enabled when the training set has more than 10,000 samples; here the training set has 8,000 rows after the split, so early stopping never triggers and tol has no effect. Pass early_stopping=True to make tol matter on datasets of this size.

The key steps in this example are:

  1. Generate a synthetic regression dataset
  2. Split the data into train and test sets
  3. Train HistGradientBoostingRegressor models with different tol values
  4. Measure training time, number of iterations, and mean squared error for each model
  5. Compare the results to understand the impact of tol on model performance and training efficiency

Some tips and heuristics for setting tol:

  1. tol only takes effect when early stopping is active: set early_stopping=True, or rely on 'auto' with more than 10,000 training samples
  2. Larger tol values make early stopping more aggressive, reducing training time at the possible cost of accuracy
  3. Smaller tol values let training run longer, which can improve accuracy but may waste iterations once the loss has plateaued
  4. Tune tol together with n_iter_no_change and validation_fraction, since all three shape the early stopping behavior

Issues to consider:

  1. A very small tol can effectively disable early stopping, so training always runs to max_iter
  2. tol is an absolute tolerance, so a sensible value depends on the scale of the loss and therefore of the target variable
  3. Early stopping evaluates on a held-out validation split, so results vary with the split; fix random_state for reproducibility
