The min_weight_fraction_leaf parameter in scikit-learn's GradientBoostingClassifier controls the minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node.
Gradient Boosting is an ensemble learning method that adds decision trees sequentially, with each new tree trained on the residual errors of the ensemble built so far, so that it corrects the mistakes of the earlier trees.
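As a quick illustration of this sequential correction, here is a minimal sketch (on an assumed toy dataset, separate from the example below) that uses staged_predict to watch the test error as trees are added one at a time:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier

# Toy data purely for illustration
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gb = GradientBoostingClassifier(n_estimators=50, random_state=0)
gb.fit(X_train, y_train)

# staged_predict yields predictions after each successive tree,
# so the error typically shrinks as later trees correct earlier mistakes
for i, y_pred in enumerate(gb.staged_predict(X_test), start=1):
    if i % 10 == 0:
        error = (y_pred != y_test).mean()
        print(f"trees={i:2d}, test error={error:.3f}")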
The min_weight_fraction_leaf parameter determines the minimum fraction of the total weight of the input samples required to be at a leaf node, and can be used to control the complexity of the individual trees. When no sample_weight is passed to fit, every sample has equal weight, so the fraction is simply a fraction of the number of training samples.
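To make the complexity effect visible, the sketch below (again on an assumed synthetic dataset) prints the number of leaves in the first tree of the ensemble for increasing values of the parameter; higher values should yield smaller trees:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, random_state=0)

for val in [0.0, 0.1, 0.3, 0.5]:
    gb = GradientBoostingClassifier(min_weight_fraction_leaf=val,
                                    n_estimators=10, random_state=0)
    gb.fit(X, y)
    # estimators_ holds the individual regression trees; inspect the
    # first one to see how the leaf-weight constraint shrinks it.
    # (With a sample_weight passed to fit, the constraint would count
    # weights rather than raw sample counts.)
    n_leaves = gb.estimators_[0, 0].get_n_leaves()
    print(f"min_weight_fraction_leaf={val}: first tree has {n_leaves} leaves")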
The default value for min_weight_fraction_leaf is 0.0, which means there is no restriction on the minimum weighted fraction of samples at each leaf node.
In practice, values between 0 and 0.5 are used (0.5 is the maximum scikit-learn accepts, since both children of a split must satisfy the constraint), with the best choice depending on the size and complexity of the dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different min_weight_fraction_leaf values
min_weight_fraction_leaf_values = [0.0, 0.1, 0.3, 0.5]
accuracies = []
for val in min_weight_fraction_leaf_values:
    gb = GradientBoostingClassifier(min_weight_fraction_leaf=val, random_state=42)
    gb.fit(X_train, y_train)
    y_pred = gb.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"min_weight_fraction_leaf={val}, Accuracy: {accuracy:.3f}")
Running the example gives an output like:
min_weight_fraction_leaf=0.0, Accuracy: 0.900
min_weight_fraction_leaf=0.1, Accuracy: 0.910
min_weight_fraction_leaf=0.3, Accuracy: 0.840
min_weight_fraction_leaf=0.5, Accuracy: 0.705
Small values leave accuracy essentially unchanged (0.1 even improves it slightly here), but larger values hurt: at 0.5 every leaf must hold at least half of the total sample weight, so each tree is reduced to a single split and the ensemble underfits.
The key steps in this example are:
- Generate a synthetic binary classification dataset with informative and noise features
- Split the data into train and test sets
- Train GradientBoostingClassifier models with different min_weight_fraction_leaf values
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting min_weight_fraction_leaf:
- Start with the default value of 0 and increase it to control the complexity of the trees (a cross-validation sketch for tuning follows this list)
- Higher values can help prevent overfitting by forcing a larger fraction of samples at leaf nodes
- Consider the size of the dataset when setting this parameter
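Rather than picking a value by hand, the parameter can be tuned with cross-validation. Here is a minimal sketch, assuming the same kind of synthetic dataset as above, using scikit-learn's GridSearchCV:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)

# Search a small grid of candidate values with 5-fold cross-validation
param_grid = {"min_weight_fraction_leaf": [0.0, 0.01, 0.05, 0.1, 0.2]}
search = GridSearchCV(GradientBoostingClassifier(random_state=42),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("Best value:", search.best_params_["min_weight_fraction_leaf"])
print(f"Best CV accuracy: {search.best_score_:.3f}")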
Issues to consider:
- Setting the value too high can lead to underfitting as the trees become too simple
- The optimal value depends on the specific problem and dataset
- This parameter interacts with other parameters that control tree complexity, like max_depth and min_samples_leaf (a small sketch of this interaction follows)
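To see that interaction, the sketch below (again on an assumed synthetic dataset) varies min_weight_fraction_leaf at two max_depth settings; when max_depth is already small, the trees are simple to begin with, so the leaf-weight constraint has less room to change them:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)

# Compare the constraint's effect under shallow vs deeper trees
for max_depth in [2, 6]:
    for frac in [0.0, 0.2]:
        gb = GradientBoostingClassifier(max_depth=max_depth,
                                        min_weight_fraction_leaf=frac,
                                        random_state=42)
        score = cross_val_score(gb, X, y, cv=5).mean()
        print(f"max_depth={max_depth}, min_weight_fraction_leaf={frac}: "
              f"CV accuracy={score:.3f}")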