
Configure RandomForestClassifier "max_features" Parameter

The max_features parameter in scikit-learn’s RandomForestClassifier controls the number of features considered when looking for the best split at each node in the decision trees.

Random Forest builds an ensemble of decision trees, and at each split in a tree, it considers a random subset of features. The max_features parameter determines the size of this subset.

Adjusting max_features affects the bias-variance tradeoff of the model. Using a smaller value introduces more randomness and reduces variance, but may increase bias if set too low.

The default value for max_features is "sqrt", which considers a number of features equal to the square root of the total number of features. Another common value is "log2", which uses the logarithm base 2 of the total features.
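To make these settings concrete, the subset sizes can be computed directly. For example, with 100 features (the feature count used in the example below), "sqrt" and "log2" work out as follows:

```python
import math

n_features = 100

# "sqrt": square root of the total feature count, truncated to an integer
print(int(math.sqrt(n_features)))  # 10 features per split

# "log2": base-2 logarithm of the total feature count, truncated to an integer
print(int(math.log2(n_features)))  # 6 features per split
```

So with 100 features, "sqrt" considers 10 features at each split and "log2" considers only 6, which makes "log2" the more aggressive form of feature subsampling here.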

In practice, the optimal value depends on the specific dataset and problem. When given as a float, max_features is interpreted as a fraction of the total number of features, and it is common to tune it over values between 0.1 and 1.0.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset with many features
X, y = make_classification(n_samples=1000, n_features=100, n_informative=20,
                           n_redundant=0, n_classes=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different max_features values
max_features_values = [0.1, "sqrt", "log2", 0.5, 1.0]
accuracies = []

for mf in max_features_values:
    rf = RandomForestClassifier(max_features=mf, random_state=42)
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"max_features={mf}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

max_features=0.1, Accuracy: 0.535
max_features=sqrt, Accuracy: 0.535
max_features=log2, Accuracy: 0.495
max_features=0.5, Accuracy: 0.510
max_features=1.0, Accuracy: 0.510

The key steps in this example are:

  1. Generate a synthetic multiclass classification dataset with 100 features
  2. Split the data into train and test sets
  3. Train RandomForestClassifier models with different max_features values
  4. Evaluate the accuracy of each model on the test set

Some tips and heuristics for setting max_features:

  - The default "sqrt" is a reasonable starting point for classification tasks.
  - Smaller values increase the randomness and decorrelation of the trees, which can help when many features are correlated.
  - Larger values (up to 1.0, i.e. all features) can help when only a few features carry signal and the trees need a good chance of seeing them.
  - Tune max_features with cross-validation rather than relying on a fixed rule.

Issues to consider:

  - Very small values may produce trees that rarely see the informative features, increasing bias.
  - max_features interacts with n_estimators: more heavily randomized trees may need a larger ensemble to average out their variance.
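One lightweight way to compare max_features settings without a separate validation set is the out-of-bag (OOB) score, which scores each tree on the bootstrap samples it never saw during training. A sketch on the same kind of synthetic dataset as above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=100, n_informative=20,
                           n_redundant=0, n_classes=5, random_state=42)

for mf in ["sqrt", "log2", 0.5]:
    # oob_score=True requires bootstrap=True (the default)
    rf = RandomForestClassifier(max_features=mf, oob_score=True,
                                bootstrap=True, random_state=42)
    rf.fit(X, y)
    print(f"max_features={mf}, OOB score: {rf.oob_score_:.3f}")
```

The OOB score is a nearly free generalization estimate, though for final model selection cross-validation remains the more reliable choice.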
