The warm_start parameter in LogisticRegression allows fitting a model where the previous solution is used as initialization.
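LogisticRegression is a linear classification model. It estimates the probability that an instance belongs to a class by passing a weighted sum of the instance's features through the logistic (sigmoid) function. It is most often used for binary classification but also supports multiclass targets.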
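The probability estimate can be checked directly: for a binary problem, the predicted probability of the positive class is the sigmoid of the linear score `coef_ · x + intercept_`. The sketch below is illustrative only. The dataset sizes are arbitrary choices, and it relies on scikit-learn using the plain logistic link for binary targets.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small synthetic binary problem (sizes chosen only for illustration)
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

clf = LogisticRegression().fit(X, y)

# Linear score w . x + b for every sample
scores = X @ clf.coef_.ravel() + clf.intercept_[0]

# The logistic (sigmoid) function maps each score to P(y = 1 | x)
manual_proba = 1.0 / (1.0 + np.exp(-scores))

# predict_proba returns one column per class, ordered as clf.classes_ ([0, 1])
proba = clf.predict_proba(X)
print(np.allclose(proba[:, 1], manual_proba))
```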
The warm_start parameter can be used to continue training an existing model. When fit is called again, for example on a larger dataset or with a different value of C, the solver starts from the model's current coefficients instead of starting from scratch.
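The default value for warm_start is False, in which case every call to fit discards the previous solution. When it is set to True, fit reuses the coef_ and intercept_ from the previous call as the solver's starting point, which can reduce the number of iterations needed to converge. warm_start has no effect with the liblinear solver.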
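A quick way to see the reuse is to fit the same data twice and compare the solver's iteration count, which LogisticRegression reports in n_iter_. This is a minimal sketch. The dataset, `max_iter=1000`, and the use of the default lbfgs solver are assumptions chosen so that both fits converge cleanly. Because the second fit starts at a solution that is already (near) optimal, it should need far fewer iterations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=0, random_state=42)

# First fit: no previous solution exists, so the solver starts from zeros
clf = LogisticRegression(warm_start=True, max_iter=1000)
clf.fit(X, y)
cold_iters = int(clf.n_iter_[0])
cold_coef = clf.coef_.copy()

# Second fit on the same data: warm_start=True makes the solver start
# from the coef_ / intercept_ found above
clf.fit(X, y)
warm_iters = int(clf.n_iter_[0])

print(f"iterations from scratch: {cold_iters}, warm-started: {warm_iters}")
```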
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import numpy as np

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)

# Split into an initial training set and an additional batch that arrives later
X_train, X_new, y_train, y_new = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with warm_start=False: the solver starts from zero coefficients
lr = LogisticRegression(warm_start=False, random_state=42)
lr.fit(X_train, y_train)
y_pred_false = lr.predict(X_new)
accuracy_false = accuracy_score(y_new, y_pred_false)
print(f"Accuracy with warm_start=False: {accuracy_false:.3f}")

# Retrain with warm_start=True: the next fit starts from the coefficients above
X_combined = np.concatenate((X_train, X_new))
y_combined = np.concatenate((y_train, y_new))
lr.set_params(warm_start=True)
lr.fit(X_combined, y_combined)

# Note: X_new is now part of the training data, so this second score is
# measured on samples the model has already seen (not a held-out estimate)
y_pred_true = lr.predict(X_new)
accuracy_true = accuracy_score(y_new, y_pred_true)
print(f"Accuracy with warm_start=True: {accuracy_true:.3f}")
Running the example gives an output like:
Accuracy with warm_start=False: 0.825
Accuracy with warm_start=True: 0.830
The key steps in this example are:
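- Generate a synthetic binary classification dataset with 15 informative and 5 redundant features
- Split the data into an initial training set and a batch of additional data (X_new)
- Train a LogisticRegression model on the initial set with warm_start=False
- Switch the same model to warm_start=True and refit it on the combined data, starting from the first fit's coefficients
- Evaluate the accuracy on X_new after each fit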
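Because X_new is part of the second fit's training data in the example above, the two accuracies are not directly comparable. A minimal variant keeps a separate test set that neither fit sees and also fits a cold reference model on the same combined data. The 600/200/200 split and `max_iter=1000` are illustrative assumptions. Since the warm-started and cold refits optimize the same objective, their test accuracies should be essentially the same. The difference lies in how many iterations each one needed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)

# Hold out a test set that neither fit ever sees
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Split the remainder into the initial training data and a later batch
X_train, X_new, y_train, y_new = train_test_split(X_rest, y_rest, test_size=0.25,
                                                  random_state=42)

lr = LogisticRegression(warm_start=True, max_iter=1000)
lr.fit(X_train, y_train)
acc_initial = accuracy_score(y_test, lr.predict(X_test))

# Refit on all training data, starting from the previous coefficients
X_combined = np.concatenate((X_train, X_new))
y_combined = np.concatenate((y_train, y_new))
lr.fit(X_combined, y_combined)
acc_warm = accuracy_score(y_test, lr.predict(X_test))
warm_iters = int(lr.n_iter_[0])

# Reference: the same refit from scratch
cold = LogisticRegression(max_iter=1000).fit(X_combined, y_combined)
acc_cold = accuracy_score(y_test, cold.predict(X_test))
cold_iters = int(cold.n_iter_[0])

print(f"initial model:      {acc_initial:.3f}")
print(f"warm-started refit: {acc_warm:.3f} ({warm_iters} iterations)")
print(f"cold refit:         {acc_cold:.3f} ({cold_iters} iterations)")
```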
Some tips and heuristics for setting warm_start:
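- Use warm_start=True when refitting an existing model on more data, so the solver starts from the current coefficients instead of from scratch
- It can speed up hyperparameter tuning, such as a sweep over C, because each fit starts from the solution for the previous setting, which reduces computation time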
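- warm_start itself does not store training data: only coef_ and intercept_ carry over between fits. Memory grows only if you keep accumulating the training data yourself, as the example does with np.concatenate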
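For the hyperparameter-tuning case, one common pattern is to sweep C from strong to weak regularization and let each fit start from the previous solution. The sketch below makes several assumptions: the helper `sweep` is hypothetical, and the C grid, the validation split, and `max_iter=1000` are illustrative choices. It runs the same sweep with and without warm starting and reports the total solver iterations. The validation scores should match closely because each C has a single optimum, and the iteration totals show how much work the warm start saved.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=0, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Illustrative grid, ordered from strong to weak regularization
Cs = [0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0]


def sweep(warm):
    """Hypothetical helper: fit one model per C and record validation accuracy.

    With warm=True, each fit starts from the solution for the previous C.
    """
    clf = LogisticRegression(warm_start=warm, max_iter=1000)
    scores, total_iters = [], 0
    for C in Cs:
        clf.set_params(C=C)
        clf.fit(X_train, y_train)
        total_iters += int(clf.n_iter_[0])
        scores.append(clf.score(X_val, y_val))  # mean accuracy
    return scores, total_iters


warm_scores, warm_total = sweep(warm=True)
cold_scores, cold_total = sweep(warm=False)

best_C = Cs[max(range(len(Cs)), key=lambda i: warm_scores[i])]
print(f"total iterations: warm={warm_total}, cold={cold_total}")
print(f"best C on validation: {best_C}")
```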
Issues to consider:
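- May not noticeably reduce training time for small datasets, which converge quickly from scratch anyway. With the default L2 penalty the objective has a single optimum, so once the solver converges a warm start changes how fast that solution is reached, not the final model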
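- The stored coefficients must fit the new problem: the data must have the same number of features (and the same classes). A starting point far from the new optimum, for example after a large change in C or in the data, may save little time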
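- warm_start is not incremental learning: each call to fit optimizes only on the data passed to that call, so refitting on a new batch alone moves the model toward that batch and does not remember earlier data. Refit on the combined data, as the example does, or use an estimator that supports partial_fit, such as SGDClassifier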
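To illustrate that warm_start does not accumulate data across calls, the sketch below fits batch A and then warm-starts a fit on batch B only. It then compares that model with one trained on batch B from scratch. The batch sizes, feature counts, and `max_iter=1000` are illustrative assumptions. Because each fit converges to the optimum for the data it was given, the two models should make essentially the same predictions. The influence of batch A is lost.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=10, n_informative=6,
                           n_redundant=0, random_state=0)
X_a, y_a = X[:1000], y[:1000]   # first batch
X_b, y_b = X[1000:], y[1000:]   # later batch

# Fit on batch A, then warm-start a second fit on batch B only
warm = LogisticRegression(warm_start=True, max_iter=1000)
warm.fit(X_a, y_a)
warm.fit(X_b, y_b)

# Fit on batch B only, from scratch
cold_b = LogisticRegression(max_iter=1000).fit(X_b, y_b)

# Fraction of all samples on which the two models predict the same class
agreement = np.mean(warm.predict(X) == cold_b.predict(X))
print(f"prediction agreement with a batch-B-only model: {agreement:.3f}")
```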