
Configure GradientBoostingClassifier "min_impurity_decrease" Parameter

The min_impurity_decrease parameter in scikit-learn’s GradientBoostingClassifier controls the minimum decrease in impurity required to split an internal node.

Gradient Boosting is an ensemble learning method that sequentially adds decision trees, each one correcting the errors of the trees before it. Within each tree, a candidate split is accepted only if it reduces the weighted impurity by at least min_impurity_decrease.
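
Concretely, a split is kept only when its weighted impurity decrease meets the threshold. Below is a minimal sketch of that quantity, following the formula given in the scikit-learn documentation (the helper function and the example numbers are illustrative):

# Weighted impurity decrease, as documented for scikit-learn's tree-based models:
# N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
def weighted_impurity_decrease(N, N_t, N_t_L, N_t_R,
                               impurity, left_impurity, right_impurity):
    """Return the quantity compared against min_impurity_decrease."""
    return (N_t / N) * (impurity
                        - (N_t_R / N_t) * right_impurity
                        - (N_t_L / N_t) * left_impurity)

# Example: a node holding 200 of 1000 samples with impurity 0.5, split into
# children of 120 and 80 samples with impurities 0.3 and 0.2
decrease = weighted_impurity_decrease(N=1000, N_t=200, N_t_L=120, N_t_R=80,
                                      impurity=0.5, left_impurity=0.3,
                                      right_impurity=0.2)
print(f"{decrease:.4f}")  # 0.0480; the split is made only if this >= min_impurity_decrease

Note how the N_t / N factor shrinks the decrease for nodes that hold few samples, which is why even modest thresholds can prune deep splits.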

Setting a higher value for min_impurity_decrease yields a more conservative model that splits a node only when doing so produces a substantial decrease in impurity. This limits tree growth and can help prevent overfitting.
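
One way to see this conservatism is to compare total tree size across settings. The sketch below is illustrative: it reuses the synthetic dataset from the full example further down and counts leaves across the ensemble's trees via the fitted estimators_ attribute (exact counts depend on the data, but higher thresholds should produce fewer leaves):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)

for value in [0.0, 0.1]:
    gbc = GradientBoostingClassifier(min_impurity_decrease=value, random_state=42)
    gbc.fit(X, y)
    # estimators_ holds the individual regression trees; sum their leaf counts
    n_leaves = sum(tree.get_n_leaves() for tree in gbc.estimators_.ravel())
    print(f"min_impurity_decrease={value}: {n_leaves} total leaves")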

The default value for min_impurity_decrease is 0.0, which places no restriction on splitting: any split is allowed, however small its impurity decrease.

In practice, values between 0.0 and 0.5 are commonly tried, with the best choice depending on the noise level and complexity of the dataset.
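
Rather than fixing the value by hand, it can also be tuned with cross-validation. A minimal sketch using GridSearchCV (the candidate grid below is an illustration, not a recommendation):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)

param_grid = {"min_impurity_decrease": [0.0, 0.01, 0.05, 0.1, 0.3, 0.5]}
search = GridSearchCV(GradientBoostingClassifier(random_state=42),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, f"CV accuracy: {search.best_score_:.3f}")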

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different min_impurity_decrease values
min_impurity_decrease_values = [0.0, 0.1, 0.3, 0.5]
accuracies = []

for min_impurity_decrease in min_impurity_decrease_values:
    gbc = GradientBoostingClassifier(min_impurity_decrease=min_impurity_decrease, random_state=42)
    gbc.fit(X_train, y_train)
    y_pred = gbc.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"min_impurity_decrease={min_impurity_decrease}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

min_impurity_decrease=0.0, Accuracy: 0.900
min_impurity_decrease=0.1, Accuracy: 0.905
min_impurity_decrease=0.3, Accuracy: 0.895
min_impurity_decrease=0.5, Accuracy: 0.905

The key steps in this example are:

  1. Generate a synthetic binary classification dataset with informative and noise features
  2. Split the data into train and test sets
  3. Train GradientBoostingClassifier models with different min_impurity_decrease values
  4. Evaluate the accuracy of each model on the test set

Some tips and heuristics for setting min_impurity_decrease:

  1. Start with the default of 0.0 and increase it only if the model overfits
  2. Increase the value gradually (e.g., 0.01, 0.05, 0.1) rather than jumping straight to large thresholds
  3. Tune the value with cross-validation, as in the GridSearchCV sketch shown earlier
  4. Remember that the impurity decrease is weighted by the fraction of samples reaching the node, so effective thresholds are often small

Issues to consider:

  1. Setting the value too high can block useful splits and cause underfitting
  2. The parameter interacts with other regularization controls such as max_depth, min_samples_leaf, learning_rate, and n_estimators
  3. The appropriate scale is data dependent; noisier datasets usually tolerate larger thresholds


