Configure DecisionTreeRegressor "ccp_alpha" Parameter

The ccp_alpha parameter in scikit-learn’s DecisionTreeRegressor controls the complexity of the decision tree via minimal cost-complexity pruning. It sets how aggressively the tree is pruned back; the pruning is applied after the tree has been grown, not while it is being built.

Higher values of ccp_alpha lead to more pruning, resulting in simpler trees with fewer nodes. This can help prevent overfitting by removing branches that provide little predictive value.
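
The relationship between ccp_alpha and tree size can be checked directly. The following is a minimal sketch, not part of the original example: the dataset size, the noise level, and the alpha values are assumptions chosen only to make the effect visible. It fits the same tree at increasing ccp_alpha values and records the node count of each.

```python
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Small synthetic dataset (sizes and noise level are illustrative assumptions)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Illustrative alpha values; the useful scale depends on the target's variance
alphas = [0.0, 10.0, 100.0, 1000.0]
node_counts = []

for alpha in alphas:
    tree = DecisionTreeRegressor(ccp_alpha=alpha, random_state=0)
    tree.fit(X, y)
    # tree_.node_count counts internal nodes plus leaves
    node_counts.append(tree.tree_.node_count)
    print(f"ccp_alpha={alpha}: {tree.tree_.node_count} nodes, "
          f"{tree.get_n_leaves()} leaves, depth {tree.get_depth()}")
```

Because every fit starts from the same fully grown tree (same data and random_state), each larger ccp_alpha should yield a subtree of the previous one, so the node counts do not increase.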

The default value for ccp_alpha is 0.0, which means no pruning is performed. The tree grows until every leaf is pure or another stopping criterion (such as max_depth or min_samples_leaf) is reached, potentially capturing noise in the training data.

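ccp_alpha must be non-negative. Useful values depend on the scale of the target, because the parameter is compared against impurity reductions, which for the default squared-error criterion are measured in squared units of the target. Values from 0.0 to about 0.1 can be a reasonable starting range when the target has roughly unit variance, but a target with a much larger variance may need much larger values before any meaningful pruning occurs. The optimal value depends on the specific dataset and problem.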
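
Rather than guessing a range, the candidate values for a given dataset can be read off the pruning path. The sketch below uses DecisionTreeRegressor.cost_complexity_pruning_path, which returns the effective alphas at which the tree would be pruned and the corresponding total leaf impurities. The dataset settings mirror the example in this article; the variable names are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Same synthetic data and split as the main example
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Compute the pruning path on the training data only
path = DecisionTreeRegressor(random_state=42).cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities

# Each alpha is a threshold at which one more subtree would be pruned away
print(f"{len(ccp_alphas)} candidate alphas")
print(f"smallest: {ccp_alphas.min():.4g}, largest: {ccp_alphas.max():.4g}")
```

The largest value on the path corresponds to pruning the tree down to its root, so it gives a rough upper bound on the range worth searching.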

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different ccp_alpha values
ccp_alpha_values = [0.0, 0.01, 0.05, 0.1]
mse_scores = []

for ccp_alpha in ccp_alpha_values:
    dt = DecisionTreeRegressor(ccp_alpha=ccp_alpha, random_state=42)
    dt.fit(X_train, y_train)
    y_pred = dt.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"ccp_alpha={ccp_alpha}, MSE: {mse:.3f}")

Running the example gives an output like:

ccp_alpha=0.0, MSE: 6350.428
ccp_alpha=0.01, MSE: 6364.784
ccp_alpha=0.05, MSE: 6370.810
ccp_alpha=0.1, MSE: 6383.007

The MSE changes very little across these settings. A likely reason is that the target in this dataset has a large variance (the test MSE is in the thousands), so ccp_alpha values up to 0.1 prune only branches that contribute negligible impurity reduction.

The key steps in this example are:

  1. Generate a synthetic regression dataset with noise
  2. Split the data into train and test sets
  3. Train DecisionTreeRegressor models with different ccp_alpha values
  4. Evaluate the mean squared error (MSE) of each model on the test set

Some tips and heuristics for setting ccp_alpha:
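
One common heuristic is to choose ccp_alpha by cross-validation over candidates taken from the pruning path. The following is a minimal sketch under stated assumptions: thinning the path to about 20 quantile-spaced candidates, using 5 folds, and scoring by negative MSE are illustrative choices, not recommendations from the original article.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Same synthetic data and split as the main example
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Candidate alphas from the pruning path; the last one prunes to a single node, so drop it
path = DecisionTreeRegressor(random_state=42).cost_complexity_pruning_path(X_train, y_train)
alphas = np.clip(path.ccp_alphas[:-1], 0.0, None)  # guard against tiny negative round-off

# Thin the path to at most 20 candidates (an assumption, to keep the search cheap)
candidates = np.unique(np.quantile(alphas, np.linspace(0.0, 1.0, 20)))

# Cross-validated search over the candidates
search = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid={"ccp_alpha": candidates},
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X_train, y_train)

best_alpha = search.best_params_["ccp_alpha"]
test_mse = mean_squared_error(y_test, search.predict(X_test))
print(f"best ccp_alpha: {best_alpha:.4g}, test MSE: {test_mse:.3f}")
```

Selecting the value on the training folds and reporting the error on the held-out test set keeps the test score from influencing the choice of ccp_alpha.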

Issues to consider:


