
Configure LogisticRegression "random_state" Parameter

The random_state parameter in scikit-learn's LogisticRegression seeds the random number generator used to shuffle the data when the solver is 'sag', 'saga', or 'liblinear'. It has no effect with the other solvers, including the default 'lbfgs', which are deterministic.

Logistic Regression is a linear model for classification that estimates the probability of a binary response based on one or more predictor variables. The model's weights are not randomly initialized, so the only randomness that random_state controls is the data shuffling done by the stochastic solvers.
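As a rough illustration of the "estimates the probability" part, the sketch below checks that, for a binary problem with default settings, the probability returned by predict_proba is the logistic (sigmoid) function applied to the linear score from decision_function. The dataset sizes and variable names are chosen only for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small synthetic binary problem (sizes are illustrative assumptions)
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Linear score w.x + b for each sample
scores = model.decision_function(X)

# The logistic (sigmoid) function maps scores to probabilities in (0, 1)
manual_proba = 1.0 / (1.0 + np.exp(-scores))

# predict_proba returns one column per class; column 1 is the positive class
sklearn_proba = model.predict_proba(X)[:, 1]

print("manual sigmoid matches predict_proba:",
      np.allclose(manual_proba, sklearn_proba))
```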

The default value for random_state is None, in which case the stochastic solvers ('sag', 'saga', 'liblinear') draw from NumPy's global random state and may produce slightly different results on different runs. Passing a fixed integer such as 0 or 42 makes those runs reproducible.
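To see the seed actually matter, one option is to pick a stochastic solver and stop it before it converges. The sketch below assumes solver='saga' with a deliberately small max_iter; the helper fit_coef is a hypothetical name introduced for this example. Refitting with the same seed should reproduce the coefficients, while a different seed will typically give slightly different ones.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)


def fit_coef(seed):
    """Fit an under-trained 'saga' model and return a copy of its coefficients.

    Hypothetical helper for this example; max_iter=5 is intentionally too
    small so that the effect of the shuffling order remains visible.
    """
    model = LogisticRegression(solver="saga", max_iter=5, random_state=seed)
    with warnings.catch_warnings():
        # The solver is expected to stop early, so silence that warning
        warnings.simplefilter("ignore", category=ConvergenceWarning)
        model.fit(X, y)
    return model.coef_.copy()


coef_a = fit_coef(0)
coef_b = fit_coef(0)  # same seed as coef_a
coef_c = fit_coef(1)  # different seed

print("same seed reproduces coefficients:", np.allclose(coef_a, coef_b))
print("largest gap between seed 0 and seed 1:", np.abs(coef_a - coef_c).max())
```

Once the solver is allowed to run to convergence, the gap between seeds usually shrinks to within the solver's tolerance, which is why random_state rarely changes test accuracy in practice.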

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different random_state values (the default solver here is 'lbfgs')
random_state_values = [None, 0, 42]
accuracies = []

for rs in random_state_values:
    lr = LogisticRegression(random_state=rs, max_iter=10000)
    lr.fit(X_train, y_train)
    y_pred = lr.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"random_state={rs}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

random_state=None, Accuracy: 0.770
random_state=0, Accuracy: 0.770
random_state=42, Accuracy: 0.770
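The three accuracies are identical because the example uses the default 'lbfgs' solver, which is deterministic and does not use random_state. A minimal check, reusing the same synthetic dataset and comparing the fitted coefficients directly (the variable names are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)

# Fit the same 'lbfgs' model under three different random_state settings
coefs = [
    LogisticRegression(solver="lbfgs", random_state=rs, max_iter=10000)
    .fit(X, y)
    .coef_
    for rs in (None, 0, 42)
]

# With a deterministic solver, every fit should land on the same coefficients
lbfgs_identical = all(np.allclose(coefs[0], c) for c in coefs[1:])
print("lbfgs coefficients identical across seeds:", lbfgs_identical)
```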

The key steps in this example are:

  1. Generate a synthetic binary classification dataset with informative and noise features.
  2. Split the data into train and test sets.
  3. Train LogisticRegression models with different random_state values.
  4. Evaluate the accuracy of each model on the test set.

Some tips and heuristics for setting random_state:

Issues to consider:


