
Configure SGDClassifier "class_weight" Parameter

The class_weight parameter in scikit-learn’s SGDClassifier adjusts the importance of classes during training, which is particularly useful for imbalanced datasets.

SGDClassifier (Stochastic Gradient Descent Classifier) is a linear classifier that uses stochastic gradient descent for optimization. It’s efficient for large-scale learning and supports different loss functions.

The class_weight parameter scales each sample’s contribution to the loss, and therefore the size of its gradient update, by the weight assigned to its class. Giving the minority class a larger weight makes its errors costlier, which helps prevent the classifier from being biased towards the majority class.
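One way to see what the weighting does is to compare it with per-sample weights passed to fit. The sketch below assumes the documented behavior that sample_weight is multiplied with class_weight during training, so a class weight of 9 for class 1 should act like a sample weight of 9 on every class-1 sample. The dataset settings and the weight of 9 are illustrative choices only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Illustrative imbalanced dataset (roughly 90% class 0, 10% class 1)
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
                           n_features=20, n_informative=3, n_redundant=0,
                           random_state=42)

# Option A: weight the minority class through the constructor
clf_cw = SGDClassifier(class_weight={0: 1, 1: 9}, random_state=42)
clf_cw.fit(X, y)

# Option B: no class_weight, but a per-sample weight of 9 on class-1 samples
sample_weight = np.where(y == 1, 9.0, 1.0)
clf_sw = SGDClassifier(random_state=42)
clf_sw.fit(X, y, sample_weight=sample_weight)

# With the same random_state, both models are expected to learn the same coefficients
same = np.allclose(clf_cw.coef_, clf_sw.coef_)
print("Coefficients match:", same)
```

If the two models match, class_weight can be read as a shorthand for per-class sample weights, which is convenient when the weights depend only on the label.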

By default, class_weight is None, which gives every class a weight of 1. The other options are ‘balanced’, which sets each class weight to n_samples / (n_classes * np.bincount(y)) so that weights are inversely proportional to class frequencies, and a custom dictionary mapping each class label to its weight.
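To make the ‘balanced’ option concrete, the following sketch computes the weights by hand and compares them with scikit-learn’s compute_class_weight helper. The dataset parameters are illustrative, and the variable names manual and balanced are chosen for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.utils.class_weight import compute_class_weight

# Illustrative imbalanced dataset (roughly 90% class 0, 10% class 1)
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
                           n_features=20, n_informative=3, n_redundant=0,
                           random_state=42)

classes = np.unique(y)

# Manual formula: n_samples / (n_classes * count of each class)
manual = len(y) / (len(classes) * np.bincount(y))

# The same weights as computed by scikit-learn's helper
balanced = compute_class_weight(class_weight="balanced", classes=classes, y=y)

print("Class counts:", np.bincount(y).tolist())
print("Balanced weights:", dict(zip(classes.tolist(), balanced.round(3).tolist())))
```

The rarer class receives the larger weight, so each class contributes about the same total weight to training.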

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import f1_score

# Generate imbalanced dataset
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
                           n_features=20, n_informative=3, n_redundant=0,
                           random_state=42)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different class_weight settings
class_weights = [None, 'balanced', {0:1, 1:9}]
for weight in class_weights:
    clf = SGDClassifier(class_weight=weight, random_state=42)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    f1 = f1_score(y_test, y_pred)
    print(f"class_weight={weight}, F1-score: {f1:.3f}")

Running the example gives an output like:

class_weight=None, F1-score: 0.703
class_weight=balanced, F1-score: 0.516
class_weight={0: 1, 1: 9}, F1-score: 0.510

Note that in this run both weighted settings score a lower F1 than the default. Weighting the minority class typically raises recall at the cost of precision, so the F1-score is not guaranteed to improve, and the exact numbers may vary with the scikit-learn version.

The key steps in this example are:

  1. Generate a synthetic imbalanced binary classification dataset
  2. Split the data into train and test sets
  3. Train SGDClassifier models with different class_weight settings
  4. Evaluate each model’s performance using F1-score
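Because F1 combines precision and recall into one number, it can hide how class_weight shifts the balance between them. The sketch below repeats the steps above but reports precision and recall separately for the minority class. It is a minimal variation on the example, not part of the original experiment, and the results dictionary is a name introduced here for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import precision_score, recall_score

# Same synthetic imbalanced dataset and split as in the main example
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1],
                           n_features=20, n_informative=3, n_redundant=0,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Collect (precision, recall) on the positive (minority) class per setting
results = {}
for weight in [None, 'balanced']:
    clf = SGDClassifier(class_weight=weight, random_state=42)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    precision = precision_score(y_test, y_pred, zero_division=0)
    recall = recall_score(y_test, y_pred, zero_division=0)
    results[str(weight)] = (precision, recall)
    print(f"class_weight={weight}, precision: {precision:.3f}, recall: {recall:.3f}")
```

Looking at both metrics makes it easier to decide whether a given weighting suits the application, for example when missed positives are more costly than false alarms.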

Tips and heuristics for setting class_weight:

Issues to consider:



See Also