
Configure RandomForestClassifier "min_weight_fraction_leaf" Parameter

The min_weight_fraction_leaf parameter in scikit-learn’s RandomForestClassifier sets the minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node.

This parameter affects the growth of decision trees in the ensemble by controlling the creation of leaf nodes based on the proportion of sample weights they contain. A higher value limits the creation of leaf nodes to those with a significant fraction of the total sample weight, resulting in smaller, less complex trees.

The default value for min_weight_fraction_leaf is 0, meaning there is no restriction on the weighted fraction of samples at a leaf node. In practice, values between 0 and 0.5 are commonly used, depending on the dataset’s characteristics and the desired balance between model complexity and generalization performance.
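When sample_weight is not provided, every sample carries equal weight, so min_weight_fraction_leaf acts as a fractional lower bound on the number of samples per leaf. A minimal sketch of this using a single decision tree (each tree in the forest enforces the same constraint):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# With uniform (default) sample weights, min_weight_fraction_leaf=0.1
# requires every leaf to hold at least 10% of the samples (20 here)
tree = DecisionTreeClassifier(min_weight_fraction_leaf=0.1, random_state=0)
tree.fit(X, y)

# Leaves are the nodes with no left child (children_left == -1)
leaf_sizes = tree.tree_.n_node_samples[tree.tree_.children_left == -1]
print(leaf_sizes.min())  # at least 20, i.e. 10% of 200 samples
```

The same call with an explicit sample_weight would measure each leaf against the weighted total instead of the raw sample count.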

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Generate imbalanced synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
                           n_features=10, n_informative=5, n_redundant=0,
                           random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different min_weight_fraction_leaf values
min_weight_fraction_leaf_values = [0.0, 0.1, 0.2, 0.3]
f1_scores = []

for value in min_weight_fraction_leaf_values:
    rf = RandomForestClassifier(min_weight_fraction_leaf=value, random_state=42)
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)
    f1 = f1_score(y_test, y_pred)
    f1_scores.append(f1)
    print(f"min_weight_fraction_leaf={value}, F1 Score: {f1:.3f}")

Running the example gives an output like:

min_weight_fraction_leaf=0.0, F1 Score: 0.800
min_weight_fraction_leaf=0.1, F1 Score: 0.000
min_weight_fraction_leaf=0.2, F1 Score: 0.000
min_weight_fraction_leaf=0.3, F1 Score: 0.000

The collapse to an F1 score of 0.000 follows from the class imbalance: the minority class makes up only about 10% of the samples. Once min_weight_fraction_leaf reaches 0.1, no leaf may hold less than 10% of the total sample weight, so the trees can no longer carve out leaves dominated by the minority class. They fall back to predicting the majority class everywhere, and the minority-class F1 drops to 0.

The key steps in this example are:

  1. Generate an imbalanced synthetic binary classification dataset
  2. Split the data into train and test sets
  3. Train RandomForestClassifier models with different min_weight_fraction_leaf values
  4. Evaluate the F1 score of each model on the test set
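Note that the example above never passes sample_weight, so the weighted fraction reduces to a plain sample fraction. As a sketch of how weights change the picture, upweighting each minority sample by a factor of 9 (an illustrative choice that roughly balances the total weight of the two classes) lets minority-heavy leaves meet a threshold that previously excluded them:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
                           n_features=10, n_informative=5, n_redundant=0,
                           random_state=42)

# Give each minority sample 9x weight so both classes hold roughly
# half of the total sample weight (illustrative, not tuned)
sample_weight = np.where(y == 1, 9.0, 1.0)

rf = RandomForestClassifier(min_weight_fraction_leaf=0.1, random_state=42)
rf.fit(X, y, sample_weight=sample_weight)

# Count predictions per class; leaves holding mostly minority samples
# can now satisfy the 10% weighted-fraction constraint
print(np.bincount(rf.predict(X), minlength=2))
```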

Tips and heuristics for setting min_weight_fraction_leaf:

  - Start with the default of 0.0 and increase it only if the trees overfit.
  - The value is a fraction of the total sample weight; scikit-learn caps it at 0.5, since every split produces two children and each must satisfy the constraint.
  - When training with non-uniform sample_weight, prefer this parameter over min_samples_leaf, because it accounts for the weights rather than raw sample counts.
  - Tune the value with cross-validation rather than picking it by hand.

Issues to consider:

  - On imbalanced data, even modest values can underfit badly: as the output above shows, once the threshold reaches the minority class's share of the total weight, the trees stop isolating that class and its F1 collapses to 0.
  - Without sample_weight, all samples carry equal weight and the parameter behaves like a fractional min_samples_leaf.
  - Larger values produce shallower trees that train and predict faster, at the cost of flexibility.
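Cross-validation is the most reliable way to choose a value. A minimal sketch using GridSearchCV on the same synthetic dataset (the candidate grid and n_estimators=50 are illustrative choices, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
                           n_features=10, n_informative=5, n_redundant=0,
                           random_state=42)

# Search a small grid of candidate values, scoring by minority-class F1
param_grid = {"min_weight_fraction_leaf": [0.0, 0.01, 0.05, 0.1]}
grid = GridSearchCV(RandomForestClassifier(n_estimators=50, random_state=42),
                    param_grid, scoring="f1", cv=3)
grid.fit(X, y)

print(grid.best_params_)
```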
