
Configure RandomForestRegressor "max_depth" Parameter

The max_depth parameter in scikit-learn’s RandomForestRegressor controls the maximum depth of each decision tree in the forest.

Random Forest is an ensemble learning method that combines predictions from multiple decision trees to improve generalization performance. The max_depth parameter limits how deep each tree can grow during training.
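
To make the ensemble idea concrete, the following minimal sketch checks that a fitted forest's prediction equals the average of its individual trees' predictions. The dataset size, n_estimators=20 and max_depth=4 are arbitrary choices for illustration, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Small synthetic dataset (sizes chosen only for illustration)
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

rf = RandomForestRegressor(n_estimators=20, max_depth=4, random_state=0)
rf.fit(X, y)

# Average the predictions of the individual fitted trees
tree_preds = np.stack([tree.predict(X) for tree in rf.estimators_])
manual_avg = tree_preds.mean(axis=0)

# Compare against the forest's own prediction
forest_pred = rf.predict(X)
matches = np.allclose(manual_avg, forest_pred)
print(f"Forest prediction equals mean of tree predictions: {matches}")
```

Every tree in rf.estimators_ is a DecisionTreeRegressor that was grown under the same max_depth limit.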

Smaller values of max_depth create shallower, simpler trees, which can help prevent overfitting but may underfit if the trees are too shallow to capture the signal. Larger values allow deeper, more complex trees that can capture intricate patterns in the data but may overfit.
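
One way to see this trade-off is to compare training and test scores for a shallow and an unrestricted forest. The sketch below uses R^2 from the estimator's score method. The noise level, the depth of 2 and n_estimators=50 are assumptions picked to make the gap visible, and the exact numbers will differ on other data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Noisier synthetic data so that the train/test gap is easier to see
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

results = {}
for depth in [2, None]:
    rf = RandomForestRegressor(n_estimators=50, max_depth=depth, random_state=0)
    rf.fit(X_train, y_train)
    # score() returns R^2 for regressors
    train_r2 = rf.score(X_train, y_train)
    test_r2 = rf.score(X_test, y_test)
    results[depth] = (train_r2, test_r2)
    print(f"max_depth={depth}: train R^2={train_r2:.3f}, test R^2={test_r2:.3f}")
```

A large gap between the training and test scores suggests overfitting, while low scores on both suggest the trees are too shallow.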

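The default value for max_depth is None, which lets each tree grow until all leaves are pure or contain fewer than min_samples_split samples. With the default min_samples_split of 2, this means growing until every leaf holds a single sample or only samples with the same target value.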
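
The effect of the default can be inspected directly, since each fitted tree exposes its depth through get_depth(). This is a small sketch on assumed synthetic data that compares the depths reached with max_depth=None against a cap of 5. The cap and the dataset size are illustrative only.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=0.1, random_state=0)

# Default: no depth limit, trees grow until the leaves are pure
rf_default = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)
depths_default = [tree.get_depth() for tree in rf_default.estimators_]

# Capped: no tree may exceed depth 5
rf_capped = RandomForestRegressor(
    n_estimators=10, max_depth=5, random_state=0
).fit(X, y)
depths_capped = [tree.get_depth() for tree in rf_capped.estimators_]

print("Depths with max_depth=None:", depths_default)
print("Depths with max_depth=5:   ", depths_capped)
```

Looking at the depths reached without a limit gives a rough upper bound for the range of max_depth values worth trying.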

In practice, values between 3 and 10 are common starting points. The best value depends on the size and complexity of the dataset and is usually chosen by validating several candidates, including None.
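
Rather than fixing a value by hand, max_depth can be selected with cross-validation. The sketch below uses GridSearchCV. The candidate grid, the 3-fold setting and n_estimators=50 are assumptions chosen to keep the example quick, not tuned recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=300, n_features=10, noise=0.1, random_state=42)

# Candidate depths to compare (None means no limit)
param_grid = {"max_depth": [3, 5, 10, None]}

search = GridSearchCV(
    RandomForestRegressor(n_estimators=50, random_state=42),
    param_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

print("Best max_depth:", search.best_params_["max_depth"])
print(f"Best cross-validated MSE: {-search.best_score_:.3f}")
```

The scoring string is negated MSE because scikit-learn's search utilities maximize the score, so the sign is flipped back when printing.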

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different max_depth values
max_depth_values = [None, 3, 5, 10]
mse_scores = []

for depth in max_depth_values:
    rf = RandomForestRegressor(max_depth=depth, random_state=42)
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"max_depth={depth}, MSE: {mse:.3f}")

Running the example gives an output like:

max_depth=None, MSE: 2621.793
max_depth=3, MSE: 6906.733
max_depth=5, MSE: 4349.279
max_depth=10, MSE: 2692.969

In this run, the unrestricted forest (max_depth=None) has the lowest test error and max_depth=10 is close behind, while the shallower forests (max_depth of 3 and 5) underfit. Exact values may vary with the scikit-learn version.

The key steps in this example are:

  1. Generate a synthetic regression dataset with informative features and noise
  2. Split the data into train and test sets
  3. Train RandomForestRegressor models with different max_depth values
  4. Evaluate the mean squared error of each model on the test set

Some tips and heuristics for setting max_depth:

Issues to consider:


