Configure RandomForestRegressor "min_impurity_decrease" Parameter

The min_impurity_decrease parameter in scikit-learn’s RandomForestRegressor sets the minimum weighted decrease in impurity that a candidate split must achieve before an internal node of a decision tree is split.

Random Forest is an ensemble learning method that combines predictions from multiple decision trees to improve generalization performance. Within each tree, a node is split only if the best available split reduces impurity (squared error by default for regression) by at least min_impurity_decrease, where the reduction is weighted by the fraction of training samples that reach the node.
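To make the weighting concrete, here is a minimal sketch that computes the quantity by hand for a single candidate split, following the formula given in the scikit-learn documentation: N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity). The helper name weighted_impurity_decrease and the toy target values are made up for illustration and are not part of scikit-learn.

```python
import numpy as np


def weighted_impurity_decrease(y_parent, y_left, y_right, n_total):
    """Weighted squared-error impurity decrease for one candidate split.

    Hypothetical helper for illustration only; n_total is the number of
    samples used to fit the tree (N in the documented formula).
    """
    n_t = len(y_parent)
    n_l, n_r = len(y_left), len(y_right)
    # With squared-error impurity, a node's impurity is the variance of its targets
    impurity = np.var(y_parent)
    left_impurity = np.var(y_left)
    right_impurity = np.var(y_right)
    return n_t / n_total * (
        impurity - n_r / n_t * right_impurity - n_l / n_t * left_impurity
    )


# Toy root node (made-up values): six targets split into two groups of three
y_node = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
decrease = weighted_impurity_decrease(
    y_node, y_node[:3], y_node[3:], n_total=len(y_node)
)
print(f"weighted impurity decrease: {decrease:.3f}")  # 20.250
```

In this toy case the split would be accepted for any min_impurity_decrease up to 20.25. The same split deeper in a tree, where the node holds only a small fraction of the training samples, would score proportionally lower.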

Higher values of min_impurity_decrease lead to simpler models, because a split must produce a larger decrease in impurity to be considered worthwhile. This can help prevent overfitting by limiting the complexity of the trees, although a value that is too high can stop the trees from growing enough to fit the data.
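One way to see this effect is to count the nodes in the fitted trees. The sketch below is an illustration under assumptions: the dataset size, the number of trees, and the threshold values are arbitrary choices, and mean_node_count is a hypothetical helper. It relies on the estimators_ attribute of the fitted forest and the tree_.node_count attribute of each tree.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Small synthetic dataset (sizes chosen only to keep the sketch fast)
X, y = make_regression(n_samples=300, n_features=10, n_informative=5,
                       noise=0.1, random_state=42)


def mean_node_count(forest):
    """Average number of nodes per tree in a fitted forest (hypothetical helper)."""
    counts = [est.tree_.node_count for est in forest.estimators_]
    return sum(counts) / len(counts)


node_counts = {}
for value in [0.0, 10.0, 100.0]:  # illustrative thresholds, not recommendations
    rf = RandomForestRegressor(n_estimators=20, min_impurity_decrease=value,
                               random_state=42)
    rf.fit(X, y)
    node_counts[value] = mean_node_count(rf)
    print(f"min_impurity_decrease={value}: {node_counts[value]:.1f} nodes per tree")
```

The average node count should shrink as the threshold grows, which is the sense in which higher values give simpler trees.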

The default value for min_impurity_decrease is 0.0, which means that a node can be split by any candidate split that does not increase impurity, so the parameter does not restrict tree growth.

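In practice, small values between 0 and 1 are a common starting point, but the useful range depends on the scale of the target. With squared-error impurity, the decrease is measured in squared target units, so a threshold that prunes heavily on one dataset may have almost no effect on another. Treat the value as a hyperparameter to tune, balancing model complexity against generalization performance.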
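Because the threshold lives on the scale of the target variance, one possible heuristic (an assumption of this sketch, not a scikit-learn recommendation) is to express candidate values as fractions of the variance of y. The fractions below are arbitrary. The sketch also checks the upper end of the scale: a threshold well above the target variance cannot be met by any split, so every tree stays a single node.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=10, n_informative=5,
                       noise=0.1, random_state=42)

# Impurity at the root is (roughly) the variance of the training targets
target_variance = np.var(y)

# Hypothetical fractions of the target variance to try as thresholds
fractions = [0.0, 1e-4, 1e-3, 1e-2]
candidate_values = [fraction * target_variance for fraction in fractions]
print("target variance:", round(target_variance, 1))
print("candidate thresholds:", [round(v, 3) for v in candidate_values])

# A threshold far above the target variance blocks even the root split
too_large = 10 * target_variance
rf = RandomForestRegressor(n_estimators=10, min_impurity_decrease=too_large,
                           random_state=42)
rf.fit(X, y)
root_only = all(est.tree_.node_count == 1 for est in rf.estimators_)
print("all trees are single nodes:", root_only)
```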

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       n_targets=1, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different min_impurity_decrease values
min_impurity_decrease_values = [0, 0.01, 0.1, 0.5]
mse_scores = []

for min_impurity in min_impurity_decrease_values:
    rf = RandomForestRegressor(min_impurity_decrease=min_impurity, random_state=42)
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"min_impurity_decrease={min_impurity}, MSE: {mse:.3f}")

Running the example gives an output like:

min_impurity_decrease=0, MSE: 208.093
min_impurity_decrease=0.01, MSE: 208.539
min_impurity_decrease=0.1, MSE: 208.988
min_impurity_decrease=0.5, MSE: 214.706

The key steps in this example are:

  1. Generate a synthetic regression dataset with informative and uninformative features
  2. Split the data into train and test sets
  3. Train RandomForestRegressor models with different min_impurity_decrease values
  4. Evaluate the mean squared error (MSE) of each model on the test set
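The steps above compare the candidate values on a single train/test split. As a variation, the comparison can be run with cross-validation using GridSearchCV. This is a minimal sketch rather than a tuned setup: the grid values, the 3-fold setting, and the reduced dataset and forest sizes are assumptions chosen to keep it quick.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=300, n_features=10, n_informative=5,
                       noise=0.1, random_state=42)

# Illustrative grid; sensible values depend on the scale of the target
param_grid = {"min_impurity_decrease": [0.0, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",  # higher (less negative) is better
    cv=3,
)
search.fit(X, y)

# Report the cross-validated MSE for each candidate value
for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    print(f"{params}: mean CV MSE = {-score:.3f}")
print("best:", search.best_params_)
```

Averaging over folds makes the comparison less sensitive to one particular split, at the cost of fitting the forest several times per candidate value.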

Some tips and heuristics for setting min_impurity_decrease:

Issues to consider:


