Configure LogisticRegression "warm_start" Parameter

The warm_start parameter in LogisticRegression allows fitting a model where the previous solution is used as initialization.

LogisticRegression is a linear model for classification (binary or multiclass) that predicts the probability of an instance belonging to each class.

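The warm_start parameter lets a later call to fit() start from the coefficients of the previous fit. This is useful when refitting on an enlarged dataset or when stepping through nearby hyperparameter values, because the solver starts close to the new solution. It is not incremental learning: each call to fit() still optimizes over all the data passed to it.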

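The default value for warm_start is False, which discards the previous solution on every call to fit(). When set to True, the solver is initialized with the previous coefficients, which can reduce the number of iterations needed to converge. It has no effect with the liblinear solver.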
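To make the effect on convergence concrete, here is a minimal sketch that fits the same model twice on the same data and compares the solver's iteration counts through the n_iter_ attribute. The dataset, the lbfgs solver, and the max_iter value are assumptions chosen for illustration, and the exact counts will vary with the data and the scikit-learn version.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data, chosen only for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# First fit: no previous solution exists yet, so this is a cold start
model = LogisticRegression(solver="lbfgs", max_iter=1000, warm_start=True)
model.fit(X, y)
cold_iterations = int(model.n_iter_[0])

# Second fit: the solver starts from the coefficients found above
model.fit(X, y)
warm_iterations = int(model.n_iter_[0])

print(f"Iterations on first fit:  {cold_iterations}")
print(f"Iterations on second fit: {warm_iterations}")
```

Because the second fit starts at an already converged solution, it should need no more iterations than the first, and usually far fewer.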

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import numpy as np

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)


# Split into initial train set and additional batch
X_train, X_new, y_train, y_new = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with warm_start=False
lr = LogisticRegression(warm_start=False, random_state=42)
lr.fit(X_train, y_train)
y_pred_false = lr.predict(X_new)
accuracy_false = accuracy_score(y_new, y_pred_false)
print(f"Accuracy with warm_start=False: {accuracy_false:.3f}")

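# Refit with warm_start=True on the combined data; the solver starts from
# the coefficients learned above instead of reinitializing them
X_combined = np.concatenate((X_train, X_new))
y_combined = np.concatenate((y_train, y_new))

lr.set_params(warm_start=True)
lr.fit(X_combined, y_combined)
# Note: X_new is now part of the training data, so this accuracy is not a
# held-out estimate and is not directly comparable to the one above
y_pred_true = lr.predict(X_new)
accuracy_true = accuracy_score(y_new, y_pred_true)
print(f"Accuracy with warm_start=True: {accuracy_true:.3f}")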

Running the example gives an output like:

Accuracy with warm_start=False: 0.825
Accuracy with warm_start=True: 0.830

The key steps in this example are:

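Generate a synthetic binary classification dataset with informative and redundant features
Split the data into an initial training set and an additional batch
Fit LogisticRegression with warm_start=False on the initial training set
Switch to warm_start=True and refit on the combined data, starting from the previous coefficients
Evaluate the accuracy of each fit on the additional batch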

Some tips and heuristics for setting warm_start:

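Use warm_start=True when refitting on an enlarged dataset so the solver starts from the previous coefficients rather than from scratch
It helps in hyperparameter tuning, for example when fitting a sequence of nearby C values, because each fit starts from the previous solution and typically needs fewer iterations
Memory usage is not a concern: the model keeps only its coefficients and intercept between fits, not the training data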
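The hyperparameter tip above can be sketched as a simple regularization path: one estimator is refit for a sequence of C values, each fit starting from the previous solution. The grid of C values, the dataset, and the solver settings are hypothetical choices for illustration, not recommended defaults.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data, chosen only for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Hypothetical grid of regularization strengths, from strong to weak
C_values = np.logspace(-2, 2, 5)

model = LogisticRegression(solver="lbfgs", max_iter=1000, warm_start=True)
path = []
for C in C_values:
    model.set_params(C=C)
    model.fit(X, y)  # starts from the coefficients of the previous C
    path.append((C, int(model.n_iter_[0]), model.score(X, y)))

for C, n_iter, train_acc in path:
    print(f"C={C:<8.2f} iterations={n_iter:<4d} train accuracy={train_acc:.3f}")

# Only the fitted parameters persist between calls, not the training data
print(model.coef_.shape, model.intercept_.shape)
```

The accuracy printed here is measured on the training data and only serves to show that each fit completed; a real search would score each C on held-out data.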

Issues to consider:

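It may not noticeably reduce training time on small datasets, where fitting is already fast
It changes the starting point, not the objective, so with a convex penalty the converged model is essentially the same as a cold-start fit; do not expect accuracy gains
The stored coefficients must match the new data (same number of features and the same classes), otherwise the previous solution cannot be reused
It does not protect against overfitting or data leakage; keep evaluation data out of every call to fit()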
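As a rough check of the point about performance, the sketch below compares a warm-started refit with a cold-start fit on the same full dataset. It assumes the default L2 penalty, for which the objective has a single optimum, and uses a tight tolerance (tol and max_iter are illustrative values) so that both fits converge closely.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data, chosen only for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_first, X_extra, y_first, y_extra = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Illustrative solver settings: tight tolerance so both fits converge closely
params = dict(solver="lbfgs", tol=1e-8, max_iter=5000)

# Warm path: fit on the first part, then refit on the full data
warm = LogisticRegression(warm_start=True, **params)
warm.fit(X_first, y_first)
warm.fit(X, y)

# Cold path: fit once on the full data
cold = LogisticRegression(warm_start=False, **params).fit(X, y)

max_diff = np.max(np.abs(warm.coef_ - cold.coef_))
print(f"Largest coefficient difference: {max_diff:.2e}")
```

If the difference is small, as expected here, the two models make essentially the same predictions, and any benefit of warm_start is in training time rather than accuracy.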
