
Configure RandomForestClassifier "class_weight" Parameter

The class_weight parameter in scikit-learn's RandomForestClassifier is used to handle imbalanced datasets, where one class has significantly fewer samples than the others.

Random Forest is an ensemble learning method that combines multiple decision trees to improve classification performance. However, when trained on imbalanced data, it can be biased towards the majority class.

The class_weight parameter allows assigning higher weights to the minority class during training, effectively increasing its importance. This can help the model better learn the patterns of the underrepresented class.
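Under the hood, a class_weight dictionary is equivalent to passing per-sample weights that are looked up by class label. The sketch below (using scikit-learn's compute_sample_weight utility) illustrates this: with the same random_state, the two models should make identical predictions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_sample_weight

X, y = make_classification(n_samples=500, n_classes=2, weights=[0.9, 0.1],
                           random_state=0)

# Model A: let the forest apply the class weights internally
a = RandomForestClassifier(n_estimators=50, class_weight={0: 1, 1: 10},
                           random_state=0).fit(X, y)

# Model B: pass the equivalent per-sample weights explicitly
sw = compute_sample_weight({0: 1, 1: 10}, y)
b = RandomForestClassifier(n_estimators=50,
                           random_state=0).fit(X, y, sample_weight=sw)

# With the same random_state, the two models should predict identically
print(np.array_equal(a.predict(X), b.predict(X)))
```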

By default, class_weight is set to None, meaning all classes have equal weight. Common alternatives are 'balanced', which automatically adjusts weights inversely proportional to class frequencies; 'balanced_subsample', which recomputes those weights on each tree's bootstrap sample; or a dictionary mapping each class label to a weight.
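To see what 'balanced' actually computes, you can call scikit-learn's compute_class_weight utility directly. Each class receives a weight of n_samples / (n_classes * class_count):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# A 90/10 imbalanced label vector
y = np.array([0] * 90 + [1] * 10)

# 'balanced' weight per class = n_samples / (n_classes * class_count)
weights = compute_class_weight('balanced', classes=np.array([0, 1]), y=y)
print(dict(zip([0, 1], weights)))  # class 0: 100/180 ≈ 0.556, class 1: 100/20 = 5.0
```

The minority class ends up weighted nine times more heavily, exactly offsetting its nine-fold underrepresentation.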

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

# Generate imbalanced synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
                           n_features=10, n_informative=5, n_redundant=0,
                           random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define class weight options
class_weights = [None, 'balanced', {0: 1, 1: 10}]

for weight in class_weights:
    rf = RandomForestClassifier(n_estimators=100, class_weight=weight, random_state=42)
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    print(f"class_weight={weight}")
    print(f"Accuracy: {accuracy:.3f}, F1 Score: {f1:.3f}\n")

The output will be similar to:

class_weight=None
Accuracy: 0.960, F1 Score: 0.800

class_weight=balanced
Accuracy: 0.935, F1 Score: 0.629

class_weight={0: 1, 1: 10}
Accuracy: 0.935, F1 Score: 0.629

The key steps in this example are:

  1. Generate an imbalanced binary classification dataset
  2. Split the data into train and test sets
  3. Train RandomForestClassifier models with different class_weight values
  4. Evaluate the accuracy and F1 score of each model on the test set
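Accuracy alone can hide poor minority-class performance, so it is worth inspecting per-class metrics as well. A minimal sketch, reusing the same synthetic dataset as above with class_weight='balanced':

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Same imbalanced setup as in the example above
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
                           n_features=10, n_informative=5, n_redundant=0,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

rf = RandomForestClassifier(n_estimators=100, class_weight='balanced',
                            random_state=42).fit(X_train, y_train)
y_pred = rf.predict(X_test)

# Per-class precision and recall reveal what a single accuracy number hides
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred, digits=3))
```

The confusion matrix shows directly how many minority-class samples are missed, which is usually the quantity you care about when tuning class_weight.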

Some tips and heuristics for setting class_weight:

  - Start with 'balanced' as a baseline before hand-tuning a dictionary of weights.
  - Try 'balanced_subsample', which recomputes weights on each tree's bootstrap sample and is specific to forest-based estimators.
  - When specifying a dictionary, the inverse of the class frequency ratio is a reasonable starting point (e.g. roughly {0: 1, 1: 9} for a 90/10 split).
  - Treat the weights as a hyperparameter and tune them with cross-validation against a metric suited to imbalance, such as F1.

Issues to consider:

  - Accuracy is misleading on imbalanced data: a model predicting only the majority class in a 90/10 split already scores 0.9. Prefer F1, precision/recall, or ROC AUC.
  - Overweighting the minority class can trade false negatives for false positives; check the confusion matrix, not just summary scores.
  - class_weight is an alternative to resampling techniques such as oversampling or undersampling; combining both can over-correct.

See Also