
Configure GradientBoostingRegressor "ccp_alpha" Parameter

The ccp_alpha parameter in scikit-learn’s GradientBoostingRegressor sets the strength of minimal cost-complexity pruning applied to each decision tree in the ensemble.

Gradient Boosting is an ensemble technique that builds models sequentially to correct errors made by previous models. It reduces bias by fitting new models to the residual errors of prior models.

Each tree is grown first and then pruned: the subtree with the largest cost complexity that is smaller than ccp_alpha is kept. A larger value of ccp_alpha results in more aggressive pruning, leading to simpler trees.
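The effect of pruning on tree size can be checked directly. The following is a minimal sketch, assuming that the total node count across the ensemble is a reasonable proxy for complexity. The helper total_nodes and the extreme value 1e9 are illustrative choices, not part of scikit-learn or of the example below.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data of the same kind as the main example
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)

def total_nodes(model):
    # estimators_ is a 2D array of fitted DecisionTreeRegressor objects
    return sum(tree.tree_.node_count for tree in model.estimators_.ravel())

node_counts = {}
for alpha in [0.0, 1e9]:  # 1e9 is an arbitrary, deliberately extreme value
    gbr = GradientBoostingRegressor(n_estimators=20, ccp_alpha=alpha, random_state=42)
    gbr.fit(X, y)
    node_counts[alpha] = total_nodes(gbr)
    print(f"ccp_alpha={alpha}: {node_counts[alpha]} nodes in {len(gbr.estimators_)} trees")
```

With no pruning the trees keep every split they were grown with, while a very large ccp_alpha should collapse them toward single-node stumps.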

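The default value for ccp_alpha is 0.0, which means no pruning is applied; the value must be non-negative. Values between 0.0 and 0.1 are a common starting point, but the threshold is compared against impurity reductions measured in squared target units, so the useful range depends on the scale of the target as well as on the desired balance between bias and variance.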
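One way to get a feel for a sensible range is to inspect the pruning path of a single tree. This is a rough sketch, assuming that one depth-3 DecisionTreeRegressor fitted on the raw target approximates the first boosting stage. Later stages fit smaller residuals, so their effective alphas are likely to be smaller than what this prints.

```python
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# max_depth=3 matches the default tree depth of GradientBoostingRegressor
tree = DecisionTreeRegressor(max_depth=3, random_state=42)
path = tree.cost_complexity_pruning_path(X, y)

# ccp_alphas holds the effective alphas at which successive subtrees get pruned
print("number of pruning steps:", len(path.ccp_alphas))
print("largest effective alpha:", path.ccp_alphas[-1])
```

If the effective alphas are orders of magnitude larger than the candidate ccp_alpha values, those candidates will prune very little.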

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different ccp_alpha values
ccp_alpha_values = [0.0, 0.01, 0.05, 0.1]
mse_scores = []

for alpha in ccp_alpha_values:
    gbr = GradientBoostingRegressor(ccp_alpha=alpha, random_state=42)
    gbr.fit(X_train, y_train)
    y_pred = gbr.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"ccp_alpha={alpha}, MSE: {mse:.3f}")

Running the example gives an output like:

ccp_alpha=0.0, MSE: 1234.753
ccp_alpha=0.01, MSE: 1234.753
ccp_alpha=0.05, MSE: 1234.832
ccp_alpha=0.1, MSE: 1226.985

The scores barely move across these values, which suggests that alphas this small prune little or nothing on a target of this scale.

The key steps in this example are:

  1. Generate a synthetic regression dataset with noise.
  2. Split the data into train and test sets.
  3. Train GradientBoostingRegressor models with different ccp_alpha values.
  4. Evaluate the mean squared error of each model on the test set.
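
The same comparison can be made less dependent on a single train/test split by using cross-validation. The following is a sketch, assuming GridSearchCV with 3 folds and a reduced n_estimators=50 to keep it quick. The grid reuses the values from the example above and is not a recommendation.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)

# Candidate values taken from the example above
param_grid = {"ccp_alpha": [0.0, 0.01, 0.05, 0.1]}
search = GridSearchCV(
    GradientBoostingRegressor(n_estimators=50, random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",  # scikit-learn maximizes scores, hence the negation
    cv=3,
)
search.fit(X, y)

results = search.cv_results_
for alpha, score in zip(results["param_ccp_alpha"], results["mean_test_score"]):
    print(f"ccp_alpha={alpha}, mean CV MSE: {-score:.3f}")
print("best ccp_alpha:", search.best_params_["ccp_alpha"])
```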

Some tips and heuristics for setting ccp_alpha:

Issues to consider:


