
Configure DecisionTreeClassifier "ccp_alpha" Parameter

The ccp_alpha parameter in scikit-learn’s DecisionTreeClassifier controls the complexity of the tree via minimal cost-complexity pruning.

Pruning reduces the size of the decision tree by removing branches that provide little power to classify instances. This helps to prevent overfitting and can improve the model’s generalization performance on unseen data.

The ccp_alpha parameter sets the complexity parameter α used by the pruning algorithm. The cost complexity of a tree is its total leaf impurity plus α times its number of leaf nodes, so larger values of α penalize bigger trees more heavily. The pruning procedure removes the weakest branches first, and greater values of ccp_alpha increase the number of nodes pruned.

The default value for ccp_alpha is 0.0, which means no pruning is done.

In practice, useful ccp_alpha values are small, typically between 0.0 and 0.1, and the best choice depends on how complex the dataset is and how much overfitting needs to be controlled. The example below compares several values on a synthetic classification problem.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different ccp_alpha values
ccp_alphas = [0.0, 0.01, 0.05, 0.1]
accuracies = []

for ccp_alpha in ccp_alphas:
    dt = DecisionTreeClassifier(random_state=42, ccp_alpha=ccp_alpha)
    dt.fit(X_train, y_train)
    y_pred = dt.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"ccp_alpha={ccp_alpha}, Accuracy: {accuracy:.3f}")

The output will look similar to:

ccp_alpha=0.0, Accuracy: 0.875
ccp_alpha=0.01, Accuracy: 0.855
ccp_alpha=0.05, Accuracy: 0.735
ccp_alpha=0.1, Accuracy: 0.735
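
In this run, accuracy declines as ccp_alpha increases because larger values prune more aggressively, and a value that is too large removes splits the model actually needs, causing underfitting. As a quick sketch (reusing the X_train and y_train arrays from the script above), the size of each pruned tree can be inspected with the standard tree_.node_count and get_depth() attributes:

# Inspect tree size for each ccp_alpha (reuses X_train/y_train from above)
for ccp_alpha in [0.0, 0.01, 0.05, 0.1]:
    dt = DecisionTreeClassifier(random_state=42, ccp_alpha=ccp_alpha)
    dt.fit(X_train, y_train)
    print(f"ccp_alpha={ccp_alpha}: {dt.tree_.node_count} nodes, depth {dt.get_depth()}")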

The key steps in this example are:

  1. Generate a synthetic binary classification dataset
  2. Split the data into train and test sets
  3. Train DecisionTreeClassifier models with different ccp_alpha values
  4. Evaluate the accuracy of each model on the test set and print results

Tips and heuristics for setting ccp_alpha:

  1. Start from the default of 0.0 (no pruning) and increase it gradually; useful values are usually small.
  2. Use cost_complexity_pruning_path to compute the effective alphas for your training data and limit the search to those candidates (see the sketch below).
  3. Select the final value with cross-validation rather than a single train/test split, since the best alpha is data-dependent.

Issues to consider:

  1. A ccp_alpha that is too large prunes away useful structure and underfits, as the 0.05 and 0.1 results above suggest.
  2. Pruning interacts with other complexity controls such as max_depth, min_samples_leaf and min_samples_split; tuning them together can behave differently from tuning ccp_alpha alone.
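
As a minimal sketch of this tuning workflow, assuming the same synthetic dataset as above, the candidate alphas can be computed with cost_complexity_pruning_path and the best one selected with GridSearchCV (both standard scikit-learn APIs):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Same synthetic dataset as in the example above
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Effective alphas produced by minimal cost-complexity pruning on the training data
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X_train, y_train)

# Keep only non-negative alphas (tiny negative values can appear from floating-point error)
candidate_alphas = [a for a in path.ccp_alphas if a >= 0.0]

# Choose the best alpha by 5-fold cross-validation on the training set
grid = GridSearchCV(DecisionTreeClassifier(random_state=42),
                    param_grid={"ccp_alpha": candidate_alphas},
                    cv=5)
grid.fit(X_train, y_train)

print(f"Best ccp_alpha: {grid.best_params_['ccp_alpha']:.5f}")
print(f"Test accuracy: {grid.score(X_test, y_test):.3f}")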

See Also