The `min_weight_fraction_leaf` parameter in scikit-learn's `DecisionTreeClassifier` controls the minimum weighted fraction of the total sum of sample weights (over all input samples) required at a leaf node. By default it is 0.0, which imposes no restriction on leaf size. Increasing the value constrains how small a leaf is allowed to be, which helps control overfitting and the overall complexity of the tree. Common values for `min_weight_fraction_leaf` range from 0.0 to 0.5 (values above 0.5 are not accepted), depending on characteristics of the dataset such as class imbalance.
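Because the threshold is expressed as a fraction of the total sample weight rather than as a raw sample count, it interacts with any `sample_weight` passed to `fit`. The minimal sketch below uses made-up weights purely for illustration: with `min_weight_fraction_leaf=0.1`, every leaf must carry at least 10% of the total weight (0.6 out of 6.0 here), regardless of how many samples that corresponds to.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data: 6 samples, 2 features (values chosen arbitrarily for illustration)
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 2], [3, 3]])
y = np.array([0, 0, 0, 1, 1, 1])

# Hypothetical per-sample weights; they sum to 6.0
sample_weight = np.array([0.5, 0.5, 1.0, 1.0, 1.5, 1.5])

# Each leaf must hold at least 10% of the total weight (0.6),
# not at least 10% of the number of samples
dt = DecisionTreeClassifier(min_weight_fraction_leaf=0.1, random_state=0)
dt.fit(X, y, sample_weight=sample_weight)
print("Number of leaves:", dt.get_n_leaves())
```

The fuller example below compares several `min_weight_fraction_leaf` values on an imbalanced synthetic dataset; no explicit sample weights are passed, so every sample counts equally.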
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Generate an imbalanced synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.8, 0.2],
                           random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different min_weight_fraction_leaf values
min_weight_fraction_leaf_values = [0.0, 0.1, 0.2, 0.3]
accuracies = []
tree_depths = []

for value in min_weight_fraction_leaf_values:
    dt = DecisionTreeClassifier(min_weight_fraction_leaf=value, random_state=42)
    dt.fit(X_train, y_train)
    y_pred = dt.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    tree_depths.append(dt.get_depth())
    print(f"min_weight_fraction_leaf={value}, Accuracy: {accuracy:.3f}, Tree Depth: {dt.get_depth()}")
```
Running the example produces output like the following:

```
min_weight_fraction_leaf=0.0, Accuracy: 0.870, Tree Depth: 10
min_weight_fraction_leaf=0.1, Accuracy: 0.870, Tree Depth: 5
min_weight_fraction_leaf=0.2, Accuracy: 0.870, Tree Depth: 3
min_weight_fraction_leaf=0.3, Accuracy: 0.840, Tree Depth: 2
```
The key steps in this example are:
- Generate an imbalanced synthetic binary classification dataset
- Split the data into train and test sets
- Train `DecisionTreeClassifier` models with different `min_weight_fraction_leaf` values
- Evaluate the accuracy and tree depth of each model on the test set
Some tips and heuristics for setting `min_weight_fraction_leaf`:
- Higher values limit the tree depth and can help prevent overfitting
- Setting the value too high may lead to underfitting
- The value is often set based on the class imbalance ratio in the dataset
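One way to act on the last tip above (an illustrative heuristic, not a rule prescribed by scikit-learn) is to keep the threshold comfortably below the minority class's share of the training data, so that leaves consisting purely of minority-class samples remain possible. Continuing from the train/test split created earlier:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Share of the minority class in the training labels (roughly 0.2 for this dataset)
minority_share = np.bincount(y_train).min() / len(y_train)

# Hypothetical rule of thumb: stay well below the minority share,
# and never exceed 0.5, the maximum scikit-learn accepts
candidate = min(0.5, minority_share / 2)

dt = DecisionTreeClassifier(min_weight_fraction_leaf=candidate, random_state=42)
dt.fit(X_train, y_train)
print(f"candidate={candidate:.3f}, test accuracy={dt.score(X_test, y_test):.3f}")
```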
Issues to consider:
- The effect of `min_weight_fraction_leaf` depends on the class imbalance and the size of the dataset
- It may be necessary to tune this parameter in conjunction with other parameters like `max_depth` (a cross-validated search over both is sketched below)
- The optimal value is dataset-specific and requires experimentation to determine
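A minimal sketch of such a joint search, continuing from the training split created earlier; the grid values and the balanced-accuracy scorer are illustrative choices rather than recommendations:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Search min_weight_fraction_leaf together with max_depth;
# adapt the grid values to the dataset at hand
param_grid = {
    "min_weight_fraction_leaf": [0.0, 0.05, 0.1, 0.2],
    "max_depth": [None, 3, 5, 10],
}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    cv=5,                          # stratified 5-fold CV for a classifier
    scoring="balanced_accuracy",   # less misleading than plain accuracy on imbalanced data
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best CV balanced accuracy:", round(search.best_score_, 3))
```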