The min_weight_fraction_leaf parameter in scikit-learn's DecisionTreeRegressor controls the minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node.
Decision Tree Regressor is a non-parametric supervised learning method used for regression tasks. It learns simple decision rules inferred from the data features to predict a target variable.
A split point at any depth will only be considered if it leaves at least a min_weight_fraction_leaf fraction of the total sample weight on each of the left and right branches. When sample_weight is not provided to fit, all samples have equal weight, so the fraction is effectively a fraction of the sample count.
The default value for min_weight_fraction_leaf is 0.0, meaning there is no restriction on the minimum weighted fraction of samples at each leaf node.
The parameter only accepts values in the range [0.0, 0.5]; in practice, small values are used, depending on the problem and the dataset characteristics.
```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different min_weight_fraction_leaf values
min_weight_fraction_leaf_values = [0, 0.05, 0.1, 0.2]
mse_scores = []
for value in min_weight_fraction_leaf_values:
    dt = DecisionTreeRegressor(min_weight_fraction_leaf=value, random_state=42)
    dt.fit(X_train, y_train)
    y_pred = dt.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"min_weight_fraction_leaf={value}, MSE: {mse:.3f}")
```
Running the example gives an output like:
```
min_weight_fraction_leaf=0, MSE: 6350.428
min_weight_fraction_leaf=0.05, MSE: 8002.303
min_weight_fraction_leaf=0.1, MSE: 8789.410
min_weight_fraction_leaf=0.2, MSE: 11512.466
```
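To see why the error rises on this noisy synthetic data, it helps to look at how the constraint shrinks the tree itself. The sketch below (reusing the same dataset) reports leaf count and depth at a few settings via get_n_leaves() and get_depth(); the exact numbers will vary, so none are shown:

```python
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

leaf_counts = []
for value in [0.0, 0.05, 0.2]:
    dt = DecisionTreeRegressor(min_weight_fraction_leaf=value, random_state=42)
    dt.fit(X, y)
    leaf_counts.append(dt.get_n_leaves())
    # With equal weights, each leaf must hold at least value * 1000 samples,
    # so larger values force far fewer, shallower leaves
    print(f"min_weight_fraction_leaf={value}: "
          f"{dt.get_n_leaves()} leaves, depth {dt.get_depth()}")
```

At 0.2, every leaf must cover at least 20% of the total weight, so the tree can have at most five leaves regardless of how much signal remains.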
The key steps in this example are:
- Generate a synthetic regression dataset with noise
- Split the data into train and test sets
- Train DecisionTreeRegressor models with different min_weight_fraction_leaf values
- Evaluate the mean squared error (MSE) of each model on the test set
Some tips and heuristics for setting min_weight_fraction_leaf:
- Start with the default value of 0 and increase it to add more regularization
- Higher values can help prevent overfitting, especially on smaller datasets
- Consider the trade-off between model complexity and generalization performance
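Rather than picking a value by hand, the parameter can also be tuned with cross-validation. A minimal sketch using GridSearchCV (the grid values here are illustrative, not a recommendation):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Candidate values to try; 0.0 (no restriction) is included as a baseline
param_grid = {"min_weight_fraction_leaf": [0.0, 0.01, 0.05, 0.1, 0.2]}

search = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X, y)
print(search.best_params_)
```

Because the grid is one-dimensional and small, this adds little cost; for a real problem, combine it with the other pruning parameters (max_depth, min_samples_leaf, ccp_alpha) in a single search.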
Issues to consider:
- Setting the value too high can lead to underfitting
- Very low values may cause the tree to overfit the training data
- The optimal value depends on the specific dataset and problem
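Finally, remember that the fraction is taken over the total sample weight, not the sample count, which matters once sample_weight is passed to fit. A short sketch (the 10x upweighting of the first 100 samples is an arbitrary illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Upweight the first 100 samples 10x: total weight = 100*10 + 900*1 = 1900
weights = np.ones(1000)
weights[:100] = 10.0

dt = DecisionTreeRegressor(min_weight_fraction_leaf=0.1, random_state=42)
dt.fit(X, y, sample_weight=weights)

# Each leaf must now hold at least 10% of the total weight (190 units),
# so a leaf of 190 ordinary samples and a leaf of 19 upweighted samples
# both satisfy the constraint
print(f"{dt.get_n_leaves()} leaves")
```

In other words, heavily weighted samples can keep a small leaf alive, while regions of low-weight samples get merged, which is exactly the behavior you want when some observations matter more than others.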