
Configure RandomForestRegressor "min_weight_fraction_leaf" Parameter

The min_weight_fraction_leaf parameter in scikit-learn’s RandomForestRegressor controls the minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node.
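To make "weighted fraction" concrete, the following sketch fits a small forest with explicit sample weights and inspects the total weight that ends up in each leaf. It assumes bootstrap=False so that every tree sees the weights exactly as passed to fit. The dataset, the random weights, and the helper name min_leaf_weight are illustrative choices and not part of the main example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Illustrative data and weights (assumptions, not from the main example)
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
rng = np.random.default_rng(0)
sample_weight = rng.uniform(0.5, 2.0, size=X.shape[0])

fraction = 0.1
# bootstrap=False so each tree is fit on the weights exactly as given
rf = RandomForestRegressor(n_estimators=5, min_weight_fraction_leaf=fraction,
                           bootstrap=False, random_state=0)
rf.fit(X, y, sample_weight=sample_weight)

def min_leaf_weight(tree):
    """Smallest total sample weight held by any leaf of a fitted tree."""
    is_leaf = tree.tree_.children_left == -1
    return tree.tree_.weighted_n_node_samples[is_leaf].min()

# Every leaf should hold at least this much of the total weight
required = fraction * sample_weight.sum()
smallest = min(min_leaf_weight(tree) for tree in rf.estimators_)
print(f"required leaf weight: {required:.2f}, smallest observed: {smallest:.2f}")
```

The smallest observed leaf weight should never fall below the required value, since a split that would create a lighter leaf is not allowed.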

This parameter helps to control the model’s complexity and can be used to mitigate overfitting. Higher values impose stronger regularization: each leaf must hold a larger share of the total sample weight, so the trees end up with fewer, larger leaves and are shallower.
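One way to see the effect on complexity is to count the leaves of the fitted trees. Because every leaf must hold at least the given fraction of the total weight, a tree can have at most 1 / fraction leaves. The sketch below is illustrative only; the dataset size, n_estimators=10, and the variable name avg_leaves are assumptions chosen to keep it quick to run.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=500, n_features=10, n_informative=5,
                       noise=0.1, random_state=42)

avg_leaves = {}
for value in [0.0, 0.1, 0.25, 0.5]:
    rf = RandomForestRegressor(n_estimators=10, min_weight_fraction_leaf=value,
                               random_state=42)
    rf.fit(X, y)
    # Each fitted tree reports its own leaf count and depth
    leaves = [tree.get_n_leaves() for tree in rf.estimators_]
    depths = [tree.get_depth() for tree in rf.estimators_]
    avg_leaves[value] = sum(leaves) / len(leaves)
    print(f"min_weight_fraction_leaf={value}: "
          f"avg leaves={avg_leaves[value]:.1f}, max depth={max(depths)}")
```

With a value of 0.0 the trees grow to hundreds of leaves on this data, while a value of 0.5 leaves room for at most a single split per tree.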

The default value for min_weight_fraction_leaf is 0.0, meaning there is no minimum weighted fraction requirement by default. When no sample_weight is passed to fit, all samples have equal weight.
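A short sketch of what this means for unweighted data: with equal weights, the fraction translates directly into a minimum number of samples per leaf. The example assumes bootstrap=False so that the per-leaf sample counts refer to the original training rows. The dataset and the variable name smallest_leaf are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# The constructor default imposes no constraint
print(RandomForestRegressor().min_weight_fraction_leaf)  # 0.0

X, y = make_regression(n_samples=400, n_features=5, noise=0.1, random_state=1)

fraction = 0.05
rf = RandomForestRegressor(n_estimators=5, min_weight_fraction_leaf=fraction,
                           bootstrap=False, random_state=1)
rf.fit(X, y)  # no sample_weight, so every sample has weight 1

# Smallest number of training samples found in any leaf of any tree
smallest_leaf = min(
    tree.tree_.n_node_samples[tree.tree_.children_left == -1].min()
    for tree in rf.estimators_
)
print(f"smallest leaf holds {smallest_leaf} samples "
      f"(at least {fraction * X.shape[0]:.0f} expected)")
```

Here a fraction of 0.05 on 400 equally weighted samples behaves like a floor of 20 samples per leaf.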

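The parameter accepts values from 0 to 0.5. The upper limit follows from the definition: a split creates two leaves, and both cannot each hold more than half of the total weight. In practice, the right value depends on the dataset’s characteristics and the desired level of regularization.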
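Since the best value depends on the data, one common approach is to choose it by cross-validation. The following is a minimal sketch using GridSearchCV; the candidate values, cv=3, and n_estimators=20 are illustrative assumptions rather than recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=300, n_features=10, n_informative=5,
                       noise=0.1, random_state=42)

# Candidate values are an illustrative choice within the allowed range
param_grid = {"min_weight_fraction_leaf": [0.0, 0.01, 0.05, 0.1, 0.25]}

search = GridSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",  # higher (closer to 0) is better
    cv=3,
)
search.fit(X, y)

print("best value:", search.best_params_["min_weight_fraction_leaf"])
print(f"best CV MSE: {-search.best_score_:.3f}")
```

The scorer returns negated MSE so that larger scores are better, which is why the sign is flipped when printing.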

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different min_weight_fraction_leaf values
min_weight_fraction_leaf_values = [0, 0.1, 0.25, 0.5]
mse_scores = []

for value in min_weight_fraction_leaf_values:
    rf = RandomForestRegressor(min_weight_fraction_leaf=value, random_state=42)
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"min_weight_fraction_leaf={value}, MSE: {mse:.3f}")

Running the example gives an output like:

min_weight_fraction_leaf=0, MSE: 208.093
min_weight_fraction_leaf=0.1, MSE: 837.138
min_weight_fraction_leaf=0.25, MSE: 1735.592
min_weight_fraction_leaf=0.5, MSE: 2169.169

The key steps in this example are:

  1. Generate a synthetic regression dataset with informative and noise features
  2. Split the data into train and test sets
  3. Train RandomForestRegressor models with different min_weight_fraction_leaf values
  4. Evaluate the mean squared error of each model on the test set



