The min_weight_fraction_leaf parameter in scikit-learn's DecisionTreeRegressor controls the minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node.
Decision Tree Regressor is a non-parametric supervised learning method used for regression tasks. It learns simple decision rules inferred from the data features to predict a target variable.
A split point at any depth will only be considered if it leaves at least a min_weight_fraction_leaf fraction of the total sample weight on each of the left and right branches. When sample_weight is not provided to fit, all samples have equal weight, so the fraction is effectively a fraction of the sample count.
The default value for min_weight_fraction_leaf is 0.0, meaning there is no restriction on the minimum weighted fraction of samples at each leaf node.
The parameter only accepts values in the range [0.0, 0.5]; in practice, small values are used, depending on the problem and the dataset characteristics.
```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different min_weight_fraction_leaf values
min_weight_fraction_leaf_values = [0, 0.05, 0.1, 0.2]
mse_scores = []
for value in min_weight_fraction_leaf_values:
    dt = DecisionTreeRegressor(min_weight_fraction_leaf=value, random_state=42)
    dt.fit(X_train, y_train)
    y_pred = dt.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"min_weight_fraction_leaf={value}, MSE: {mse:.3f}")
```
Running the example gives an output like:
```
min_weight_fraction_leaf=0, MSE: 6350.428
min_weight_fraction_leaf=0.05, MSE: 8002.303
min_weight_fraction_leaf=0.1, MSE: 8789.410
min_weight_fraction_leaf=0.2, MSE: 11512.466
```
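To see why the error rises on this noisy synthetic data, it helps to look at how the constraint shrinks the tree itself. The sketch below (reusing the same dataset) reports leaf count and depth at a few settings via get_n_leaves() and get_depth(); the exact numbers will vary, so none are shown:

```python
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

leaf_counts = []
for value in [0.0, 0.05, 0.2]:
    dt = DecisionTreeRegressor(min_weight_fraction_leaf=value, random_state=42)
    dt.fit(X, y)
    leaf_counts.append(dt.get_n_leaves())
    # With equal weights, each leaf must hold at least value * 1000 samples,
    # so larger values force far fewer, shallower leaves
    print(f"min_weight_fraction_leaf={value}: "
          f"{dt.get_n_leaves()} leaves, depth {dt.get_depth()}")
```

At 0.2, every leaf must cover at least 20% of the total weight, so the tree can have at most five leaves regardless of how much signal remains.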
The key steps in this example are:
- Generate a synthetic regression dataset with noise
- Split the data into train and test sets
- Train DecisionTreeRegressor models with different min_weight_fraction_leaf values
- Evaluate the mean squared error (MSE) of each model on the test set
Some tips and heuristics for setting min_weight_fraction_leaf:
- Start with the default value of 0 and increase it to add more regularization
- Higher values can help prevent overfitting, especially on smaller datasets
- Consider the trade-off between model complexity and generalization performance
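Rather than picking a value by hand, the parameter can also be tuned with cross-validation. A minimal sketch using GridSearchCV (the grid values here are illustrative, not a recommendation):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Candidate values to try; 0.0 (no restriction) is included as a baseline
param_grid = {"min_weight_fraction_leaf": [0.0, 0.01, 0.05, 0.1, 0.2]}

search = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X, y)
print(search.best_params_)
```

Because the grid is one-dimensional and small, this adds little cost; for a real problem, combine it with the other pruning parameters (max_depth, min_samples_leaf, ccp_alpha) in a single search.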
Issues to consider:
- Setting the value too high can lead to underfitting
- Very low values may cause the tree to overfit the training data
- The optimal value depends on the specific dataset and problem
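Finally, remember that the fraction is taken over the total sample weight, not the sample count, which matters once sample_weight is passed to fit. A short sketch (the 10x upweighting of the first 100 samples is an arbitrary illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Upweight the first 100 samples 10x: total weight = 100*10 + 900*1 = 1900
weights = np.ones(1000)
weights[:100] = 10.0

dt = DecisionTreeRegressor(min_weight_fraction_leaf=0.1, random_state=42)
dt.fit(X, y, sample_weight=weights)

# Each leaf must now hold at least 10% of the total weight (190 units),
# so a leaf of 190 ordinary samples and a leaf of 19 upweighted samples
# both satisfy the constraint
print(f"{dt.get_n_leaves()} leaves")
```

In other words, heavily weighted samples can keep a small leaf alive, while regions of low-weight samples get merged, which is exactly the behavior you want when some observations matter more than others.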