
Configure RandomForestClassifier "min_weight_fraction_leaf" Parameter

The min_weight_fraction_leaf parameter in scikit-learn’s RandomForestClassifier sets the minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node.

This parameter affects the growth of decision trees in the ensemble by controlling the creation of leaf nodes based on the proportion of sample weights they contain. A higher value limits the creation of leaf nodes to those with a significant fraction of the total sample weight, resulting in smaller, less complex trees.

The default value for min_weight_fraction_leaf is 0, meaning there is no restriction on the weighted fraction of samples at a leaf node. In practice, values between 0 and 0.5 are commonly used, depending on the dataset’s characteristics and the desired balance between model complexity and generalization performance.
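When sample_weight is not provided, every sample carries equal weight, so min_weight_fraction_leaf acts as a fractional lower bound on the number of samples per leaf. A minimal sketch of this using a single decision tree (each tree in the forest enforces the same constraint):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# With uniform (default) sample weights, min_weight_fraction_leaf=0.1
# requires every leaf to hold at least 10% of the samples (20 here)
tree = DecisionTreeClassifier(min_weight_fraction_leaf=0.1, random_state=0)
tree.fit(X, y)

# Leaves are the nodes with no left child (children_left == -1)
leaf_sizes = tree.tree_.n_node_samples[tree.tree_.children_left == -1]
print(leaf_sizes.min())  # at least 20, i.e. 10% of 200 samples
```

The same call with an explicit sample_weight would measure each leaf against the weighted total instead of the raw sample count.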

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Generate imbalanced synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
                           n_features=10, n_informative=5, n_redundant=0,
                           random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different min_weight_fraction_leaf values
min_weight_fraction_leaf_values = [0.0, 0.1, 0.2, 0.3]
f1_scores = []

for value in min_weight_fraction_leaf_values:
    rf = RandomForestClassifier(min_weight_fraction_leaf=value, random_state=42)
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)
    f1 = f1_score(y_test, y_pred)
    f1_scores.append(f1)
    print(f"min_weight_fraction_leaf={value}, F1 Score: {f1:.3f}")

Running the example gives an output like:

min_weight_fraction_leaf=0.0, F1 Score: 0.800
min_weight_fraction_leaf=0.1, F1 Score: 0.000
min_weight_fraction_leaf=0.2, F1 Score: 0.000
min_weight_fraction_leaf=0.3, F1 Score: 0.000

The collapse to an F1 score of 0.000 follows from the class imbalance: the minority class makes up only about 10% of the samples. Once min_weight_fraction_leaf reaches 0.1, no leaf may hold less than 10% of the total sample weight, so the trees can no longer carve out leaves dominated by the minority class. They fall back to predicting the majority class everywhere, and the minority-class F1 drops to 0.

The key steps in this example are:

  1. Generate an imbalanced synthetic binary classification dataset
  2. Split the data into train and test sets
  3. Train RandomForestClassifier models with different min_weight_fraction_leaf values
  4. Evaluate the F1 score of each model on the test set
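Note that the example above never passes sample_weight, so the weighted fraction reduces to a plain sample fraction. As a sketch of how weights change the picture, upweighting each minority sample by a factor of 9 (an illustrative choice that roughly balances the total weight of the two classes) lets minority-heavy leaves meet a threshold that previously excluded them:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
                           n_features=10, n_informative=5, n_redundant=0,
                           random_state=42)

# Give each minority sample 9x weight so both classes hold roughly
# half of the total sample weight (illustrative, not tuned)
sample_weight = np.where(y == 1, 9.0, 1.0)

rf = RandomForestClassifier(min_weight_fraction_leaf=0.1, random_state=42)
rf.fit(X, y, sample_weight=sample_weight)

# Count predictions per class; leaves holding mostly minority samples
# can now satisfy the 10% weighted-fraction constraint
print(np.bincount(rf.predict(X), minlength=2))
```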

Tips and heuristics for setting min_weight_fraction_leaf:

  - Start with the default of 0.0 and increase it only if the trees overfit.
  - The value is a fraction of the total sample weight; scikit-learn caps it at 0.5, since every split produces two children and each must satisfy the constraint.
  - When training with non-uniform sample_weight, prefer this parameter over min_samples_leaf, because it accounts for the weights rather than raw sample counts.
  - Tune the value with cross-validation rather than picking it by hand.

Issues to consider:

  - On imbalanced data, even modest values can underfit badly: as the output above shows, once the threshold reaches the minority class's share of the total weight, the trees stop isolating that class and its F1 collapses to 0.
  - Without sample_weight, all samples carry equal weight and the parameter behaves like a fractional min_samples_leaf.
  - Larger values produce shallower trees that train and predict faster, at the cost of flexibility.
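Cross-validation is the most reliable way to choose a value. A minimal sketch using GridSearchCV on the same synthetic dataset (the candidate grid and n_estimators=50 are illustrative choices, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
                           n_features=10, n_informative=5, n_redundant=0,
                           random_state=42)

# Search a small grid of candidate values, scoring by minority-class F1
param_grid = {"min_weight_fraction_leaf": [0.0, 0.01, 0.05, 0.1]}
grid = GridSearchCV(RandomForestClassifier(n_estimators=50, random_state=42),
                    param_grid, scoring="f1", cv=3)
grid.fit(X, y)

print(grid.best_params_)
```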
