Gradient Boosting is a powerful ensemble technique for regression problems, combining multiple weak learners into a strong predictive model. GradientBoostingRegressor in scikit-learn predicts continuous target variables by fitting each new tree to the residual errors of the model built so far.
The key hyperparameters of GradientBoostingRegressor include n_estimators (the number of boosting stages), learning_rate (which shrinks the contribution of each tree), and max_depth (the maximum depth of the individual decision trees).
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
# generate synthetic regression dataset
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create model
model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=1)
# fit model
model.fit(X_train, y_train)
# evaluate model
yhat = model.predict(X_test)
mse = mean_squared_error(y_test, yhat)
print('Mean Squared Error: %.3f' % mse)
# make a prediction
row = [[-0.806091, 0.625232, -0.603220, -0.211913, -0.823170]]
yhat = model.predict(row)
print('Predicted: %.3f' % yhat[0])
Running the example gives an output like:
Mean Squared Error: 652.478
Predicted: -33.985
The steps are as follows:

1. Generate a synthetic regression dataset using the make_regression() function. This creates a dataset with a specified number of samples (n_samples), features (n_features), and a fixed random seed (random_state) for reproducibility. The dataset is split into training and test sets using train_test_split().
2. Instantiate a GradientBoostingRegressor model with 100 estimators, a learning rate of 0.1, and a maximum depth of 3. The model is then fit on the training data using the fit() method.
3. Evaluate the model's performance by comparing the predictions (yhat) to the actual values (y_test) using the mean squared error metric.
4. Make a single prediction by passing a new data sample to the predict() method.
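Because learning_rate and n_estimators trade off against each other, it can be useful to watch the test error as boosting stages are added. A rough sketch of this idea is below, using scikit-learn's staged_predict() method on the same synthetic dataset; the specific values (500 stages, a learning rate of 0.05) are illustrative choices, not recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# same synthetic dataset as above
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# a lower learning rate usually needs more boosting stages
model = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05, max_depth=3, random_state=1)
model.fit(X_train, y_train)

# staged_predict() yields the ensemble's predictions after each boosting stage,
# so we can track the test MSE as trees are added
errors = [mean_squared_error(y_test, yhat) for yhat in model.staged_predict(X_test)]
best_stage = errors.index(min(errors)) + 1
print('Best number of stages: %d (MSE %.3f)' % (best_stage, min(errors)))
```

If the best stage is well below n_estimators, the later trees are not helping on held-out data, which is a hint to reduce n_estimators or raise learning_rate.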
This example demonstrates how to set up and use a GradientBoostingRegressor
model for regression tasks, highlighting the efficiency and accuracy of this algorithm in scikit-learn. The model can be fit directly on the training data and used to make predictions on new data, making it a practical choice for real-world regression problems.