Gradient boosting is a powerful technique for regression tasks. In scikit-learn, GradientBoostingRegressor and HistGradientBoostingRegressor offer robust implementations of this technique, each with distinct advantages.
GradientBoostingRegressor uses traditional boosting: trees are built sequentially on the full dataset, each one fit to the residual errors of the ensemble so far. Key hyperparameters include n_estimators (number of boosting stages), learning_rate (shrinkage factor), and max_depth (maximum depth of the individual regression estimators). This model is well suited to smaller datasets, where detailed parameter tuning can yield significant performance gains.
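As a minimal sketch of what setting these knobs looks like (the values here are illustrative starting points, not tuned recommendations):

from sklearn.ensemble import GradientBoostingRegressor

# Illustrative (untuned) choices: many shallow trees with a small
# learning rate is a common starting point, not a recommendation.
gbr = GradientBoostingRegressor(
    n_estimators=500,    # number of boosting stages
    learning_rate=0.05,  # shrinkage factor applied to each tree
    max_depth=3,         # maximum depth of each regression tree
    random_state=42,
)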
On the other hand, HistGradientBoostingRegressor employs histogram-based boosting: continuous features are binned into integer-valued histograms, which sharply reduces the number of candidate split points to evaluate at each node. Key hyperparameters include max_iter (number of boosting iterations), learning_rate (shrinkage factor), and max_depth (maximum depth of the individual regression estimators). This model excels on larger datasets due to its faster computation.
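A parallel sketch for the histogram-based model, again with illustrative values; note that max_iter plays the role n_estimators plays above:

from sklearn.ensemble import HistGradientBoostingRegressor

# Illustrative (untuned) settings mirroring the sketch above.
hgb = HistGradientBoostingRegressor(
    max_iter=500,        # number of boosting iterations
    learning_rate=0.05,  # shrinkage factor
    max_depth=3,         # maximum tree depth (unlimited by default)
    random_state=42,
)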
The primary difference between the two is computational efficiency. HistGradientBoostingRegressor is optimized for speed and handles larger datasets far more effectively, while GradientBoostingRegressor remains a solid choice for smaller datasets where fine-grained parameter tuning is practical. The example below fits both models with default hyperparameters on the same synthetic dataset and compares their R^2 scores:
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor, HistGradientBoostingRegressor
from sklearn.metrics import r2_score
# Generate synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit and evaluate GradientBoostingRegressor with default hyperparameters
gbr = GradientBoostingRegressor(random_state=42)
gbr.fit(X_train, y_train)
y_pred_gbr = gbr.predict(X_test)
print(f"GradientBoostingRegressor R^2 score: {r2_score(y_test, y_pred_gbr):.3f}")
# Fit and evaluate HistGradientBoostingRegressor with default hyperparameters
hgb = HistGradientBoostingRegressor(random_state=42)
hgb.fit(X_train, y_train)
y_pred_hgb = hgb.predict(X_test)
print(f"\nHistGradientBoostingRegressor R^2 score: {r2_score(y_test, y_pred_hgb):.3f}")
Running the example gives an output like:
GradientBoostingRegressor R^2 score: 0.921
HistGradientBoostingRegressor R^2 score: 0.921
The steps are as follows:

- Generate a synthetic regression dataset using make_regression.
- Split the data into training and test sets using train_test_split.
- Instantiate GradientBoostingRegressor with default hyperparameters, fit it on the training data, and evaluate its performance on the test set.
- Instantiate HistGradientBoostingRegressor with default hyperparameters, fit it on the training data, and evaluate its performance on the test set.
- Compare the test set performance (R^2 score) of both models and discuss their computational efficiency (a rough timing sketch follows this list).
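To make the efficiency comparison concrete, here is a rough timing sketch. The dataset size and the use of time.perf_counter are illustrative choices, not part of the original example, and the absolute numbers will vary by machine:

import time
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, HistGradientBoostingRegressor

# A larger synthetic dataset makes the speed gap visible;
# the traditional model may take noticeably longer to fit.
X, y = make_regression(n_samples=20_000, n_features=20, noise=0.1, random_state=42)

for model in (GradientBoostingRegressor(random_state=42),
              HistGradientBoostingRegressor(random_state=42)):
    start = time.perf_counter()
    model.fit(X, y)
    elapsed = time.perf_counter() - start
    print(f"{model.__class__.__name__}: {elapsed:.2f}s")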