
Configure RandomForestClassifier "max_depth" Parameter

The max_depth parameter in scikit-learn’s RandomForestClassifier controls the maximum depth of the decision trees in the ensemble.

Random Forest is an ensemble learning method that combines the predictions of many decision trees to improve generalization performance. The max_depth parameter sets the same upper limit on the depth of every tree in the forest.
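Both ideas in the paragraph above can be checked on a fitted model. The following minimal sketch assumes the same synthetic data as the main example and uses n_estimators=50 only to keep it fast. It fits a forest with max_depth=5 and confirms that no tree is deeper than 5 levels. It also confirms that the forest's predict_proba equals the mean of its trees' predict_proba, which matches how scikit-learn documents RandomForestClassifier combining its trees.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Same synthetic setup as the main example
X, y = make_classification(n_samples=1000, n_classes=3, n_features=20,
                           n_informative=10, n_redundant=5, random_state=42)

# n_estimators=50 is an arbitrary choice to keep the sketch fast
rf = RandomForestClassifier(n_estimators=50, max_depth=5, random_state=42)
rf.fit(X, y)

# get_depth() reports how deep each fitted tree actually grew
tree_depths = [tree.get_depth() for tree in rf.estimators_]
print("Deepest tree in the forest:", max(tree_depths))

# The forest's class probabilities should equal the average over its trees
per_tree_proba = np.stack([tree.predict_proba(X) for tree in rf.estimators_])
forest_proba = rf.predict_proba(X)
print("Forest proba equals mean of trees:",
      np.allclose(forest_proba, per_tree_proba.mean(axis=0)))

# predict() picks the class with the highest averaged probability
predicted = rf.predict(X)
```

Because every tree is capped independently, max_depth changes what each tree can learn, while the averaging step stays the same.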

Limiting the depth of the trees can help prevent overfitting by reducing the complexity of the model. Shallower trees tend to have higher bias but lower variance.
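One practical way to see this bias/variance trade-off is to compare training and test accuracy as max_depth grows. A large gap between the two suggests overfitting, and low accuracy on both suggests underfitting. The sketch below reuses the data and split from the main example. The depth values and n_estimators=50 are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_classes=3, n_features=20,
                           n_informative=10, n_redundant=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Illustrative depths; None means the trees are not depth-limited
depths = [2, 5, 10, None]
scores = {}
for depth in depths:
    rf = RandomForestClassifier(n_estimators=50, max_depth=depth, random_state=42)
    rf.fit(X_train, y_train)
    train_acc = rf.score(X_train, y_train)  # accuracy on data the forest was fit on
    test_acc = rf.score(X_test, y_test)     # accuracy on held-out data
    scores[depth] = (train_acc, test_acc)
    print(f"max_depth={depth}: train={train_acc:.3f}, test={test_acc:.3f}, "
          f"gap={train_acc - test_acc:.3f}")
```

Very shallow trees typically score lower on both sets (higher bias). Unlimited trees typically fit the training set almost perfectly, so any drop on the test set shows up as a wide gap (higher variance).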

The default value for max_depth is None, which allows the trees to grow until all leaves are pure or contain fewer than min_samples_split samples.
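A way to observe the default behaviour is to fit a forest with max_depth=None and inspect the fitted trees. This sketch assumes the same synthetic data as the main example and uses n_estimators=20 only for speed. It reads each tree's depth with get_depth() and uses the low-level tree_ arrays (children_left, impurity) to find the largest Gini impurity at any leaf. With continuous features and the default min_samples_split=2, that value is expected to be zero, meaning every leaf is pure.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_classes=3, n_features=20,
                           n_informative=10, n_redundant=5, random_state=42)

# n_estimators=20 is an arbitrary choice to keep the sketch fast
rf = RandomForestClassifier(n_estimators=20, max_depth=None, random_state=42)
rf.fit(X, y)

depths = [tree.get_depth() for tree in rf.estimators_]
print(f"Unconstrained tree depths range from {min(depths)} to {max(depths)}")

# In the low-level Tree object, children_left == -1 marks a leaf node
max_leaf_impurity = 0.0
for tree in rf.estimators_:
    t = tree.tree_
    is_leaf = t.children_left == -1
    max_leaf_impurity = max(max_leaf_impurity, float(t.impurity[is_leaf].max()))
print("Largest Gini impurity at any leaf:", max_leaf_impurity)
```

The depths reported here depend on the data. Larger or noisier datasets usually produce deeper unconstrained trees, which is one reason to set max_depth explicitly.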

In practice, common values for max_depth range from 5 to 20, depending on the size and complexity of the dataset.
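Rather than picking a value from that range by hand, max_depth can be tuned with cross-validation. The sketch below uses scikit-learn's GridSearchCV on the training split from the main example. The candidate depths, n_estimators=50, and cv=5 are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=1000, n_classes=3, n_features=20,
                           n_informative=10, n_redundant=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Candidate depths spanning the 5-20 range plus a shallow value and None
param_grid = {"max_depth": [3, 5, 10, 15, 20, None]}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_depth = search.best_params_["max_depth"]
test_acc = search.score(X_test, y_test)  # uses the model refit on all training data
print("Best max_depth:", best_depth)
print(f"Mean CV accuracy: {search.best_score_:.3f}")
print(f"Test accuracy: {test_acc:.3f}")
```

Keeping the test set out of the search means the final test accuracy is not biased by the choice of max_depth.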

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=3, n_features=20,
                           n_informative=10, n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different max_depth values
max_depth_values = [None, 5, 10, 20]
accuracies = []

for depth in max_depth_values:
    rf = RandomForestClassifier(max_depth=depth, random_state=42)
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"max_depth={depth}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

max_depth=None, Accuracy: 0.830
max_depth=5, Accuracy: 0.785
max_depth=10, Accuracy: 0.835
max_depth=20, Accuracy: 0.830
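In this output, max_depth=None, 10, and 20 are within half a percentage point of each other, which may be no larger than the variation caused by the forest's own randomness. One rough check is to repeat each setting with several random_state seeds and compare the mean and spread of test accuracy. The sketch below does this with five seeds, an arbitrary choice, on the same split as the main example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_classes=3, n_features=20,
                           n_informative=10, n_redundant=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

seeds = range(5)  # number of repeats is an arbitrary choice
results = {}
for depth in [None, 5, 10, 20]:
    accs = []
    for seed in seeds:
        # Only the forest's random_state changes; the data split stays fixed
        rf = RandomForestClassifier(max_depth=depth, random_state=seed)
        rf.fit(X_train, y_train)
        accs.append(rf.score(X_test, y_test))
    results[depth] = (float(np.mean(accs)), float(np.std(accs)))
    print(f"max_depth={depth}: mean={results[depth][0]:.3f}, std={results[depth][1]:.3f}")
```

If the differences between settings are smaller than the standard deviations, the single-run ranking should not be taken as meaningful.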

The key steps in this example are:

  1. Generate a synthetic multiclass classification dataset with informative, redundant, and noise features
  2. Split the data into train and test sets
  3. Train RandomForestClassifier models with different max_depth values
  4. Evaluate the accuracy of each model on the test set

Some tips and heuristics for setting max_depth:

Issues to consider:



See Also