
Configure DecisionTreeRegressor "min_impurity_decrease" Parameter

The min_impurity_decrease parameter in scikit-learn’s DecisionTreeRegressor is a pruning parameter that controls the complexity of the decision tree. It sets the minimum decrease in impurity required to make a split at a node.
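For reference, scikit-learn compares min_impurity_decrease against the weighted impurity decrease of a candidate split, which the documentation defines as N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity), where N is the total number of samples, N_t the samples at the current node, and N_t_L / N_t_R the samples in the left and right children. The helper below is a small illustrative sketch of that formula; the function name and the example numbers are ours, not part of the library.

def weighted_impurity_decrease(n, n_t, n_t_l, n_t_r,
                               impurity, left_impurity, right_impurity):
    # N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
    return (n_t / n) * (impurity
                        - (n_t_r / n_t) * right_impurity
                        - (n_t_l / n_t) * left_impurity)

# A node holding 100 of 1000 samples, split 60/40, with node variance (MSE impurity)
# of 50.0 dropping to 30.0 (left) and 20.0 (right); the split is made only if this
# value is at least min_impurity_decrease
print(weighted_impurity_decrease(1000, 100, 60, 40, 50.0, 30.0, 20.0))  # 2.4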

Decision Tree Regression is a non-parametric supervised learning method that infers decision rules from the data features to predict a target variable. The min_impurity_decrease parameter influences the tree’s structure and complexity.

A higher value of min_impurity_decrease results in smaller, more pruned trees, as it requires a larger decrease in impurity for a split to occur. This can help prevent overfitting.
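To see this pruning effect directly, the short sketch below compares tree size for a few thresholds on synthetic data; the exact node counts and depths will vary with the dataset.

from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Larger thresholds should produce smaller, shallower trees
for value in [0.0, 1.0, 10.0]:
    dt = DecisionTreeRegressor(min_impurity_decrease=value, random_state=42).fit(X, y)
    print(f"min_impurity_decrease={value}: nodes={dt.tree_.node_count}, depth={dt.get_depth()}")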

The default value for min_impurity_decrease is 0.0, meaning that any split that decreases the impurity is allowed.

In practice, small values such as 0.0 to 1.0 are a common starting point, but the useful scale depends on the variance of the target (the impurity measure used for regression), so the value should be tuned for each dataset and the desired complexity of the model.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different min_impurity_decrease values
min_impurity_decrease_values = [0.0, 0.01, 0.1, 1.0]
mse_scores = []

for min_impurity_decrease in min_impurity_decrease_values:
    dt = DecisionTreeRegressor(min_impurity_decrease=min_impurity_decrease, random_state=42)
    dt.fit(X_train, y_train)
    y_pred = dt.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"min_impurity_decrease={min_impurity_decrease}, MSE: {mse:.3f}")

Running the example gives an output like:

min_impurity_decrease=0.0, MSE: 6350.428
min_impurity_decrease=0.01, MSE: 6818.518
min_impurity_decrease=0.1, MSE: 6433.889
min_impurity_decrease=1.0, MSE: 6455.289

The key steps in this example are:

  1. Generate a synthetic regression dataset with some added noise
  2. Split the data into train and test sets
  3. Train DecisionTreeRegressor models with different min_impurity_decrease values
  4. Evaluate the mean squared error (MSE) of each model on the test set

Some tips and heuristics for setting min_impurity_decrease:

  * Start with the default of 0.0 and increase it gradually if the tree overfits; larger values prune more aggressively and yield smaller trees.
  * The useful scale depends on the impurity of the target: for regression the node impurity is the variance (MSE), so targets with large variance may need values well above 1.0 before pruning has any effect.
  * Tune the value with cross-validation rather than a single train/test split; a sketch of this follows below.

Issues to consider:

  * Setting the value too high can underfit, producing a tree too shallow to capture the signal.
  * The parameter interacts with other pruning controls such as max_depth, min_samples_split, min_samples_leaf, and ccp_alpha, so consider tuning them together.

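As a sketch of the cross-validation tuning mentioned above (the candidate grid values are illustrative, not recommendations):

from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Search candidate thresholds with 5-fold cross-validation, scoring by negative MSE
param_grid = {"min_impurity_decrease": [0.0, 0.01, 0.1, 1.0, 10.0]}
grid = GridSearchCV(DecisionTreeRegressor(random_state=42), param_grid,
                    scoring="neg_mean_squared_error", cv=5)
grid.fit(X, y)
print(grid.best_params_)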

See Also