Configure ExtraTreesRegressor "min_weight_fraction_leaf" Parameter

The min_weight_fraction_leaf parameter in scikit-learn’s ExtraTreesRegressor controls the minimum weighted fraction of the sum total of weights required to be at a leaf node.

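The Extra Trees algorithm, short for Extremely Randomized Trees, is an ensemble method that builds multiple decision trees with increased randomization. It differs from Random Forests in two ways: split thresholds are drawn at random for each candidate feature instead of being searched for the best cut, and by default each tree is grown on the entire training sample rather than on a bootstrap sample.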

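The min_weight_fraction_leaf parameter sets a threshold on the fraction of the total sample weight that each leaf node must hold. When no sample_weight is passed to fit, all samples carry equal weight, so the threshold is simply a minimum fraction of the training samples per leaf. A split is only accepted if both resulting children meet the threshold, which limits the depth and complexity of the trees in the ensemble.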
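To make the threshold concrete, the following minimal sketch fits a small ensemble without sample weights and checks the smallest leaf against the implied sample count. The dataset size, the fraction of 0.1, and the variable names are illustrative assumptions, not part of the original example.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

# Illustrative data: 1000 samples, no sample_weight, so every sample has weight 1
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

fraction = 0.1  # assumed value for illustration
etr = ExtraTreesRegressor(n_estimators=10, min_weight_fraction_leaf=fraction, random_state=42)
etr.fit(X, y)

# With equal weights, the threshold translates to a minimum sample count per leaf
min_required = fraction * X.shape[0]

# Leaves are the nodes without children (children_left == -1)
smallest_leaf = min(
    tree.tree_.n_node_samples[tree.tree_.children_left == -1].min()
    for tree in etr.estimators_
)

print(f"Required samples per leaf: {min_required:.0f}")
print(f"Smallest leaf in the ensemble: {smallest_leaf}")
```

If sample_weight were passed to fit, the same threshold would apply to the summed weight in each leaf rather than to the raw sample count.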

The default value for min_weight_fraction_leaf is 0.0, which means no minimum fraction is imposed. Valid values lie in the range 0.0 to 0.5, with smaller values allowing for more complex trees. Because the value is a fraction of the whole training set, even modest settings such as 0.1 constrain the trees heavily.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different min_weight_fraction_leaf values
leaf_fractions = [0.0, 0.1, 0.2, 0.3]
mse_scores = []

for fraction in leaf_fractions:
    etr = ExtraTreesRegressor(n_estimators=100, min_weight_fraction_leaf=fraction, random_state=42)
    etr.fit(X_train, y_train)
    y_pred = etr.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"min_weight_fraction_leaf={fraction}, MSE: {mse:.3f}")

Running the example gives an output like:

min_weight_fraction_leaf=0.0, MSE: 2036.183
min_weight_fraction_leaf=0.1, MSE: 8603.181
min_weight_fraction_leaf=0.2, MSE: 12409.299
min_weight_fraction_leaf=0.3, MSE: 14671.269

The key steps in this example are:

Generate a synthetic regression dataset
Split the data into train and test sets
Train ExtraTreesRegressor models with different min_weight_fraction_leaf values
Evaluate the Mean Squared Error (MSE) of each model on the test set

Some tips and heuristics for setting min_weight_fraction_leaf:

Start with the default value of 0.0 and increase it to control overfitting
Higher values lead to shallower trees, which can reduce overfitting but may increase bias
Consider the size of your dataset; smaller datasets may benefit from higher values
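The link between higher values and shallower trees can be checked directly. The sketch below is one way to do it, assuming a synthetic dataset and a handful of illustrative fractions; it records the average depth of the fitted trees using each tree's get_depth() method.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

# Illustrative data; sizes and fractions are assumptions chosen for a quick run
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)

mean_depths = {}
for fraction in [0.0, 0.1, 0.3]:
    etr = ExtraTreesRegressor(n_estimators=20, min_weight_fraction_leaf=fraction, random_state=42)
    etr.fit(X, y)
    # Average depth over the individual trees in the ensemble
    mean_depths[fraction] = np.mean([tree.get_depth() for tree in etr.estimators_])
    print(f"min_weight_fraction_leaf={fraction}, mean tree depth: {mean_depths[fraction]:.1f}")
```

At 0.0 the trees grow until the leaves are pure, while any positive fraction caps the number of leaves a tree can have, so the depth drops quickly.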

Issues to consider:

The optimal value depends on the noise level and complexity of your data
Very high values can lead to underfitting, especially on complex datasets
This parameter interacts with other tree-specific parameters like max_depth
Balancing between bias and variance is key when tuning this parameter
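Since the best value depends on the data, one reasonable approach is to pick it by cross-validation. The following sketch uses GridSearchCV over a small grid; the grid values, fold count, and ensemble size are assumptions for illustration and should be adapted to the problem at hand.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV

# Illustrative data; replace with your own training set
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)

# Candidate fractions (assumed grid); valid values lie between 0.0 and 0.5
param_grid = {"min_weight_fraction_leaf": [0.0, 0.01, 0.05, 0.1]}

search = GridSearchCV(
    ExtraTreesRegressor(n_estimators=50, random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",  # higher (less negative) is better
    cv=3,
)
search.fit(X, y)

best_fraction = search.best_params_["min_weight_fraction_leaf"]
print(f"Best min_weight_fraction_leaf: {best_fraction}")
print(f"Cross-validated MSE: {-search.best_score_:.3f}")
```

Other tree parameters such as max_depth can be added to the same grid to tune them jointly, at the cost of a longer search.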
