
Configure GradientBoostingRegressor "tol" Parameter

The tol parameter in GradientBoostingRegressor sets the tolerance used by the early-stopping criterion.

Gradient Boosting is a machine learning technique for regression and classification problems, which builds a model in a stage-wise fashion from weak learners. The tol parameter sets the tolerance for the improvement of the loss: when the loss on a held-out validation fraction fails to improve by at least tol for n_iter_no_change consecutive iterations, training stops early. Importantly, tol only takes effect when n_iter_no_change is set to an integer; with the default n_iter_no_change=None, no early stopping is performed and tol is ignored.

The default value for tol is 1e-4. Common values range from 1e-3 to 1e-6, depending on the desired precision and computational resources.
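As a minimal sketch (on a small synthetic dataset), you can verify that tol only participates in training once n_iter_no_change is set:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)

# Without n_iter_no_change, tol is ignored and all 100 default stages run
gbr_plain = GradientBoostingRegressor(tol=1e-2, random_state=0)
gbr_plain.fit(X, y)

# With n_iter_no_change, tol governs early stopping on an internal validation split
gbr_es = GradientBoostingRegressor(tol=1e-2, n_iter_no_change=5, random_state=0)
gbr_es.fit(X, y)

# n_estimators_ reports how many boosting stages were actually fitted
print(gbr_plain.n_estimators_)  # always equals n_estimators (100 by default)
print(gbr_es.n_estimators_)     # may be fewer if improvement drops below tol
```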

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different tol values (n_iter_no_change is not set, so tol is inert here)
tol_values = [1e-3, 1e-4, 1e-5, 1e-6]
mse_scores = []

for tol in tol_values:
    gbr = GradientBoostingRegressor(tol=tol, random_state=42)
    gbr.fit(X_train, y_train)
    y_pred = gbr.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"tol={tol}, MSE: {mse:.4f}")

Running the example gives an output like:

tol=0.001, MSE: 3052.3752
tol=0.0001, MSE: 3052.3752
tol=1e-05, MSE: 3052.3752
tol=1e-06, MSE: 3052.3752
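To see what "improvement of the loss" means in practice, the per-stage validation error can be tracked with staged_predict. This is a sketch using validation MSE as a stand-in for the internal loss that sklearn actually compares against tol:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

gbr = GradientBoostingRegressor(random_state=42)
gbr.fit(X_train, y_train)

# Validation MSE after each boosting stage
val_mse = [mean_squared_error(y_val, y_pred) for y_pred in gbr.staged_predict(X_val)]

# Per-stage improvement: roughly the quantity that tol is compared against
improvements = -np.diff(val_mse)
print(f"first-stage improvement: {improvements[0]:.4f}")
print(f"last-stage improvement:  {improvements[-1]:.4f}")
```

Early stages improve the loss by large amounts, while late stages improve it very little; early stopping fires once those improvements stay below tol for n_iter_no_change iterations.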

Because n_iter_no_change is left at its default (None), no early stopping occurs, the four models are identical, and the MSE scores match exactly. The key steps in this example are:

  1. Generate a synthetic regression dataset with informative and noise features
  2. Split the data into train and test sets
  3. Train GradientBoostingRegressor models with different tol values
  4. Evaluate the mean squared error (MSE) of each model on the test set
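To make tol actually change behavior, the experiment above can be repeated with n_iter_no_change set (here 10, an arbitrary illustrative choice) so that each model is allowed to stop early:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

trees, mses = [], []
for tol in [1e-1, 1e-2, 1e-3, 1e-4]:
    # n_iter_no_change=10 enables early stopping, so tol now matters
    gbr = GradientBoostingRegressor(tol=tol, n_iter_no_change=10, random_state=42)
    gbr.fit(X_train, y_train)
    mse = mean_squared_error(y_test, gbr.predict(X_test))
    trees.append(gbr.n_estimators_)  # stages actually fitted before stopping
    mses.append(mse)
    print(f"tol={tol}, trees={gbr.n_estimators_}, MSE: {mse:.4f}")
```

With a larger tol the required per-iteration improvement is bigger, so training tends to stop after fewer stages; smaller tol values let the model keep adding trees for longer.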

Some tips and heuristics for setting tol:

  - tol is only consulted when n_iter_no_change is set; with the default n_iter_no_change=None, changing tol has no effect.
  - Smaller tol values delay early stopping and usually produce more trees; larger values stop sooner, trading some accuracy for faster training.
  - Pair tol with validation_fraction (default 0.1), which controls the size of the held-out split on which improvement is measured.

Issues to consider:

  - Early stopping measures improvement on a random validation split, so results can vary with random_state and with small datasets.
  - A tol that is too large may stop training before the model has converged, leading to underfitting.

See Also