
Configure HistGradientBoostingRegressor "quantile" Parameter

The quantile parameter in scikit-learn’s HistGradientBoostingRegressor sets which quantile of the target the model estimates when it is trained with loss="quantile".

Histogram-based gradient boosting is an efficient variant of gradient boosting that uses binning to speed up training and reduce memory usage. The quantile parameter does not change how features are binned; the number of bins is set by max_bins.

quantile takes a float between 0 and 1 and is used only when loss="quantile". A value of 0.5 estimates the conditional median, while values such as 0.1 or 0.9 estimate the corresponding lower or upper quantiles of the target.

The default value for quantile is None.

In practice, quantile is most often used to fit separate models for a lower and an upper quantile, for example 0.05 and 0.95, to form a prediction interval. With any other loss, including the default "squared_error", the parameter has no effect on the fitted model.
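When loss="quantile" is set, the model is trained on the pinball (quantile) loss, and quantile is the level q in that loss. The following is a minimal NumPy sketch of that loss, included only to show what q changes. The helper name pinball_loss and the toy arrays are illustrative and not part of scikit-learn, whose own metric is sklearn.metrics.mean_pinball_loss.

```python
import numpy as np


def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss for a target quantile q in (0, 1)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-predictions (diff > 0) are weighted by q,
    # over-predictions (diff < 0) are weighted by 1 - q
    return float(np.mean(np.maximum(q * diff, (q - 1) * diff)))


# Toy data (illustrative): a constant prediction of 2.0 for every sample
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([2.0, 2.0, 2.0, 2.0])

for q in (0.1, 0.5, 0.9):
    print(f"q={q}: pinball loss = {pinball_loss(y_true, y_pred, q):.3f}")
```

A small q penalizes over-prediction more heavily, which pushes the fitted value toward the lower end of the target distribution; a large q does the opposite.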

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate synthetic dataset
X, y = make_regression(n_samples=10000, n_features=20, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different quantile values; loss stays at its default,
# "squared_error", so these values are not used during fitting
quantile_values = [True, False]
mse_scores = []

for quantile in quantile_values:
    hgbr = HistGradientBoostingRegressor(quantile=quantile, random_state=42)
    hgbr.fit(X_train, y_train)
    y_pred = hgbr.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"quantile={quantile}, MSE: {mse:.3f}")

# Calculate percentage difference
pct_diff = (mse_scores[1] - mse_scores[0]) / mse_scores[0] * 100
print(f"Percentage difference: {pct_diff:.2f}%")

Running the example gives an output like:

quantile=True, MSE: 924.053
quantile=False, MSE: 924.053
Percentage difference: 0.00%

The two scores are identical because the default loss, "squared_error", does not use quantile; the parameter only takes effect when loss="quantile".

The key steps in this example are:

  1. Generate a synthetic regression dataset with multiple features
  2. Split the data into train and test sets
  3. Train HistGradientBoostingRegressor models with different quantile values
  4. Evaluate the mean squared error of each model on the test set
  5. Calculate the percentage difference in performance between the two settings

Some tips and heuristics for setting quantile:

Issues to consider:



See Also