Configure SGDClassifier "random_state" Parameter

The random_state parameter in scikit-learn’s SGDClassifier controls the randomness of the stochastic gradient descent algorithm during training.

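Stochastic Gradient Descent (SGD) is an optimization algorithm that updates model parameters incrementally, one training sample at a time. In SGDClassifier, random_state seeds the shuffling of the training data after each epoch (when shuffle=True, the default) and the validation split used when early_stopping=True. It does not affect weight initialization: the weights start at zero unless coef_init is passed to fit.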
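To check where the seed actually enters training, the following minimal sketch fits the same model with two different seeds, once with shuffling disabled and once with it enabled. The dataset size and the helper name fit_coef are illustrative choices, not part of the original example, and the behavior shown assumes the default settings (early_stopping=False).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Small synthetic dataset (sizes chosen only for illustration)
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

def fit_coef(seed, shuffle):
    """Fit a fresh SGDClassifier and return a copy of its learned weights."""
    clf = SGDClassifier(random_state=seed, shuffle=shuffle, max_iter=1000, tol=1e-3)
    clf.fit(X, y)
    return clf.coef_.copy()

# With shuffling disabled, the sample order is fixed, so the seed has no effect
same_without_shuffle = np.array_equal(fit_coef(0, shuffle=False),
                                      fit_coef(1, shuffle=False))

# With shuffling enabled (the default), each seed visits the samples in a different order
same_with_shuffle = np.array_equal(fit_coef(0, shuffle=True),
                                   fit_coef(1, shuffle=True))

print("shuffle=False, seeds 0 and 1 give identical weights:", same_without_shuffle)
print("shuffle=True,  seeds 0 and 1 give identical weights:", same_with_shuffle)
```

If the weights match with shuffle=False but differ with shuffle=True, the seed is influencing only the order in which samples are visited.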

Setting random_state to a fixed integer makes training reproducible: given the same data, hyperparameters, and library version, every run produces the same model. This is crucial for debugging, for comparing models fairly, and for obtaining consistent predictions.
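A quick way to confirm this reproducibility is to fit two independent models with the same seed and compare them. This is a minimal sketch; the seed value 42 and the dataset size are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Small synthetic dataset (sizes chosen only for illustration)
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# Two separate estimators that share the same fixed seed
first = SGDClassifier(random_state=42, max_iter=1000, tol=1e-3).fit(X, y)
second = SGDClassifier(random_state=42, max_iter=1000, tol=1e-3).fit(X, y)

# Compare the learned weights and the resulting predictions
weights_match = np.array_equal(first.coef_, second.coef_)
predictions_match = np.array_equal(first.predict(X), second.predict(X))

print("Identical weights:", weights_match)
print("Identical predictions:", predictions_match)
```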

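The default value for random_state is None, in which case scikit-learn draws from NumPy's global random number generator (np.random), so results can vary from run to run. For reproducibility, pass a fixed integer; the particular value (e.g., 0, 42, 1000) is arbitrary.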
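Because random_state=None falls back to NumPy's global generator, seeding that generator before fitting should also make the run repeatable. The sketch below illustrates this; the helper name fit_with_global_seed is hypothetical, and passing an integer to random_state is generally the cleaner option since it does not depend on global state.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Small synthetic dataset (sizes chosen only for illustration)
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

def fit_with_global_seed(global_seed):
    """Seed NumPy's global generator, then fit with random_state=None."""
    np.random.seed(global_seed)
    clf = SGDClassifier(random_state=None, max_iter=1000, tol=1e-3)
    return clf.fit(X, y).coef_.copy()

# Same global seed before each fit, so the shuffling order is the same
repeatable = np.array_equal(fit_with_global_seed(0), fit_with_global_seed(0))
print("random_state=None is repeatable after np.random.seed:", repeatable)
```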

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different random_state values
random_state_values = [None, 0, 42, 100]
accuracies = []

for rs in random_state_values:
    sgd = SGDClassifier(random_state=rs, max_iter=1000, tol=1e-3)
    sgd.fit(X_train, y_train)
    y_pred = sgd.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"random_state={rs}, Accuracy: {accuracy:.3f}")

# Train multiple times with random_state=None
print("\nMultiple runs with random_state=None:")
for _ in range(3):
    sgd = SGDClassifier(random_state=None, max_iter=1000, tol=1e-3)
    sgd.fit(X_train, y_train)
    y_pred = sgd.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy: {accuracy:.3f}")
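
Running the example gives output similar to the following (the random_state=None results vary between runs):

random_state=None, Accuracy: 0.705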
random_state=0, Accuracy: 0.800
random_state=42, Accuracy: 0.770
random_state=100, Accuracy: 0.770

Multiple runs with random_state=None:
Accuracy: 0.740
Accuracy: 0.755
Accuracy: 0.775

The key steps in this example are:

  1. Generate a synthetic binary classification dataset
  2. Split the data into train and test sets
  3. Train SGDClassifier models with different random_state values
  4. Evaluate the accuracy of each model on the test set
  5. Demonstrate the variability of results when random_state is None
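The variability seen in these steps can be summarized more systematically by sweeping over a range of seeds and reporting the spread of test accuracy. This is a sketch under the same dataset and split settings as the example; the choice of ten seeds (0 through 9) is an arbitrary assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Same synthetic dataset and split as in the main example
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Arbitrary set of seeds to sweep over
seeds = range(10)

# Test accuracy of one model per seed
scores = np.array([
    SGDClassifier(random_state=seed, max_iter=1000, tol=1e-3)
    .fit(X_train, y_train)
    .score(X_test, y_test)
    for seed in seeds
])

print(f"Mean accuracy: {scores.mean():.3f}")
print(f"Std deviation: {scores.std():.3f}")
print(f"Min / max:     {scores.min():.3f} / {scores.max():.3f}")
```

A large standard deviation relative to the differences between candidate models suggests that a comparison based on a single seed may not be reliable.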

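Tips for setting random_state:

  - Use a fixed integer when results need to be reproducible, for example when debugging or sharing experiments.
  - Train with several different integer seeds to see how sensitive the model is to the order of the training data.
  - Keep the seed fixed while tuning other hyperparameters so that differences in accuracy are not caused by the shuffling.

Issues to consider:

  - Different random_state values can produce noticeably different accuracies, as the output above shows.
  - Choosing the seed that happens to score best on the test set overstates performance; the seed should not be tuned like a hyperparameter.
  - The random_state of SGDClassifier is separate from the random_state passed to make_classification and train_test_split; each must be set for a fully reproducible pipeline.
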