The `class_weight` parameter in scikit-learn's `ExtraTreesClassifier` adjusts the importance of each class during training, which is particularly useful for imbalanced datasets.
`ExtraTreesClassifier` is an ensemble method that builds multiple decision trees using random subsets of features and randomized split thresholds. It often generalizes better than a single decision tree.
The `class_weight` parameter assigns a weight to each class, influencing the model's decision boundary: higher weights make misclassifications of the corresponding class more costly, which can improve performance on imbalanced datasets.
By default, `class_weight` is `None`, which treats all classes equally. Common options include `'balanced'` (weights inversely proportional to class frequencies) and a dictionary mapping class labels to custom weights.
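To see concretely what `'balanced'` does, the same per-class weights can be computed with scikit-learn's `compute_class_weight` utility. This is a minimal sketch assuming a 90/10 class split like the dataset used below; the formula is `n_samples / (n_classes * bincount(y))`:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Imbalanced labels: 900 samples of class 0, 100 of class 1
y = np.array([0] * 900 + [1] * 100)

# 'balanced' uses n_samples / (n_classes * bincount(y))
weights = compute_class_weight('balanced', classes=np.array([0, 1]), y=y)

for cls, w in zip([0, 1], weights):
    print(f"class {cls}: weight {w:.3f}")
```

The minority class receives a proportionally larger weight, which is exactly what the classifier applies internally when `class_weight='balanced'` is passed.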
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import f1_score

# Generate imbalanced synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
                           n_features=20, n_informative=3, n_redundant=2,
                           n_clusters_per_class=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Train with different class_weight values
class_weights = [None, 'balanced', {0: 1, 1: 10}]
f1_scores = []

for weight in class_weights:
    clf = ExtraTreesClassifier(n_estimators=100, class_weight=weight,
                               random_state=42)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    f1 = f1_score(y_test, y_pred)
    f1_scores.append(f1)
    print(f"class_weight={weight}, F1-score: {f1:.3f}")
```
Running the example gives an output like:

```
class_weight=None, F1-score: 0.769
class_weight=balanced, F1-score: 0.769
class_weight={0: 1, 1: 10}, F1-score: 0.769
```
The key steps in this example are:
- Generate an imbalanced synthetic binary classification dataset
- Split the data into train and test sets
- Train `ExtraTreesClassifier` models with different `class_weight` values
- Evaluate the F1-score of each model on the test set
Tips and heuristics for setting `class_weight`:
- Use `'balanced'` for a quick, automatic adjustment on imbalanced datasets
- For severe imbalances, consider custom weights with higher values for minority classes
- Monitor overall performance metrics, not just accuracy, when adjusting weights
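To act on the last tip, per-class precision and recall can be inspected with `classification_report`. This sketch rebuilds a dataset and model similar to the example above (the `'balanced'` setting here is just one illustrative choice):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import classification_report

# Imbalanced synthetic dataset, similar in shape to the main example
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

clf = ExtraTreesClassifier(n_estimators=100, class_weight='balanced',
                           random_state=42)
clf.fit(X_train, y_train)

# Per-class precision, recall, and F1 -- more informative than accuracy alone
report = classification_report(y_test, clf.predict(X_test))
print(report)
```

On a 90/10 split, accuracy alone can look high even when the minority class is poorly served; the per-class rows of the report make that visible.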
Issues to consider:
- Overweighting minority classes can lead to overfitting
- The optimal weights depend on the specific dataset and problem
- Class weights affect prediction probabilities, which may need recalibration
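If calibrated probabilities matter downstream, the weighted model can be wrapped in `CalibratedClassifierCV` to recalibrate its `predict_proba` output. This is a sketch; the `'sigmoid'` method and `cv=3` are illustrative assumptions, not choices taken from the example above:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.calibration import CalibratedClassifierCV

# Imbalanced synthetic dataset, similar to the main example
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# A heavily weighted model whose raw probabilities may be distorted
base = ExtraTreesClassifier(n_estimators=100, class_weight={0: 1, 1: 10},
                            random_state=42)

# Refit with calibration on cross-validation folds
calibrated = CalibratedClassifierCV(base, method='sigmoid', cv=3)
calibrated.fit(X_train, y_train)

proba = calibrated.predict_proba(X_test)
print(proba[:3])
```

The calibrated probabilities are better suited for thresholding or cost-sensitive decisions than the raw scores of an aggressively weighted ensemble.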