
Configure DecisionTreeClassifier "min_weight_fraction_leaf" Parameter

The min_weight_fraction_leaf parameter in scikit-learn’s DecisionTreeClassifier controls the minimum weighted fraction of the total sum of weights (of all the input samples) required to be at a leaf node.

By default, this parameter is set to 0.0, which means no weight-based restriction is placed on leaf size. Increasing the value constrains how small a leaf can be, which helps control overfitting and limits the complexity of the decision tree.

Valid values for min_weight_fraction_leaf range from 0.0 to 0.5; the most useful setting depends on the characteristics of the dataset, such as class imbalance.
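
Because the threshold is defined on the total sample weight rather than the raw sample count, passing sample_weight to fit changes which leaves are allowed. The short sketch below illustrates this; the four-fold upweighting of the minority class is an arbitrary choice for demonstration, not a recommendation.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Imbalanced dataset with the same settings as the main example below
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.8, 0.2],
                           random_state=42)

# Illustrative weighting: count each minority-class sample four times as heavily
sample_weight = np.where(y == 1, 4.0, 1.0)

# Unweighted fit: the leaf threshold is effectively a fraction of the sample count
dt_plain = DecisionTreeClassifier(min_weight_fraction_leaf=0.1, random_state=42)
dt_plain.fit(X, y)

# Weighted fit: the same threshold now applies to the sum of sample_weight
dt_weighted = DecisionTreeClassifier(min_weight_fraction_leaf=0.1, random_state=42)
dt_weighted.fit(X, y, sample_weight=sample_weight)

print("Depth without sample_weight:", dt_plain.get_depth())
print("Depth with sample_weight:", dt_weighted.get_depth())

The complete example below compares several min_weight_fraction_leaf values on the same kind of imbalanced dataset.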

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Generate imbalanced synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.8, 0.2],
                           random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different min_weight_fraction_leaf values
min_weight_fraction_leaf_values = [0.0, 0.1, 0.2, 0.3]
accuracies = []
tree_depths = []

for value in min_weight_fraction_leaf_values:
    dt = DecisionTreeClassifier(min_weight_fraction_leaf=value, random_state=42)
    dt.fit(X_train, y_train)
    y_pred = dt.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    tree_depths.append(dt.get_depth())
    print(f"min_weight_fraction_leaf={value}, Accuracy: {accuracy:.3f}, Tree Depth: {dt.get_depth()}")

Running the example produces output like:

min_weight_fraction_leaf=0.0, Accuracy: 0.870, Tree Depth: 10
min_weight_fraction_leaf=0.1, Accuracy: 0.870, Tree Depth: 5
min_weight_fraction_leaf=0.2, Accuracy: 0.870, Tree Depth: 3
min_weight_fraction_leaf=0.3, Accuracy: 0.840, Tree Depth: 2

The key steps in this example are:

  1. Generate an imbalanced synthetic binary classification dataset
  2. Split the data into train and test sets
  3. Train DecisionTreeClassifier models with different min_weight_fraction_leaf values
  4. Evaluate each model's accuracy on the test set and record its tree depth (one of these trees is inspected in the sketch below)
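
To see what the constraint actually does to the learned model, the rules of one of the shallower trees can be printed with sklearn.tree.export_text. The sketch below assumes the X_train and y_train split from the example above and refits the min_weight_fraction_leaf=0.2 setting.

from sklearn.tree import DecisionTreeClassifier, export_text

# Refit one of the constrained settings from the output above
dt = DecisionTreeClassifier(min_weight_fraction_leaf=0.2, random_state=42)
dt.fit(X_train, y_train)

# Print the learned decision rules; the tree is only a few levels deep
print(export_text(dt))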

Some tips and heuristics for setting min_weight_fraction_leaf:

  - Start with the default of 0.0 and increase it gradually only if the tree overfits; even small values such as 0.05 or 0.1 shrink the tree noticeably.
  - Values near the maximum of 0.5 force an extremely shallow tree, because every leaf must then hold close to half of the total weight.
  - If you pass sample_weight to fit (for example to upweight a minority class), the fraction is computed on those weights; without weights the parameter behaves like a fractional version of min_samples_leaf.
  - Prefer tuning the value with cross-validation over picking it by hand, as sketched below.

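One way to tune it is a cross-validated grid search over candidate values. The sketch below assumes the X_train and y_train split from the example above; the grid and the use of balanced accuracy (chosen because the dataset is imbalanced) are illustrative, not prescriptive.

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Candidate values to search over (illustrative grid)
param_grid = {"min_weight_fraction_leaf": [0.0, 0.05, 0.1, 0.2, 0.3]}

# Balanced accuracy is less misleading than plain accuracy on imbalanced classes
grid = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid,
                    scoring="balanced_accuracy", cv=5)
grid.fit(X_train, y_train)

print("Best value:", grid.best_params_["min_weight_fraction_leaf"])
print(f"Best CV balanced accuracy: {grid.best_score_:.3f}")
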
Issues to consider:

  - min_weight_fraction_leaf overlaps with min_samples_leaf; without sample weights the two impose similar constraints, so setting both can be redundant.
  - On imbalanced data, overall accuracy can stay flat while the tree shrinks (as in the output above), so also check per-class metrics such as balanced accuracy or F1.
  - Large values can underfit, collapsing the tree to a stump and washing out predictions for the minority class.
