
Configure Lasso "warm_start" Parameter

The warm_start parameter in scikit-learn's Lasso class allows reusing the solution from the previous call to fit() as the initialization for the next call to fit(). This can speed up convergence when the same estimator is refit repeatedly, for example on a growing dataset or across a sequence of regularization strengths.

Lasso (Least Absolute Shrinkage and Selection Operator) is a linear regression algorithm that performs L1 regularization, which adds a penalty term to the loss function to encourage sparse solutions (i.e., many coefficients set to zero). This makes Lasso useful for feature selection and creating interpretable models.
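
For context, the sketch below shows the sparsity that the L1 penalty produces. It fits a Lasso on a small make_regression dataset in which only 10 of 100 features are informative; the dataset sizes and the alpha value are arbitrary choices for illustration.

from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
import numpy as np

# Small illustrative dataset: only 10 of the 100 features carry signal
X, y = make_regression(n_samples=500, n_features=100, n_informative=10,
                       noise=0.1, random_state=42)

# Fit Lasso with a moderate penalty (alpha chosen for illustration)
lasso = Lasso(alpha=1.0)
lasso.fit(X, y)

# Count how many coefficients were shrunk to exactly zero
n_zero = np.sum(lasso.coef_ == 0)
print(f"{n_zero} of {lasso.coef_.shape[0]} coefficients are exactly zero")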

The warm_start parameter is a boolean value that defaults to False. When set to True, the estimator keeps its coefficients between calls to fit(), so the next fit() continues from the previous solution instead of starting from scratch. This is particularly useful when a model must be refit many times on similar data, such as a training set that grows over time.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
import numpy as np

# Generate large synthetic dataset
X, y = make_regression(n_samples=100000, n_features=1000, noise=0.1, random_state=42)


# Split into initial train set and additional batch
X_train, X_new, y_train, y_new = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with warm_start=False
lr = Lasso(warm_start=False, random_state=42)
lr.fit(X_train, y_train)
y_pred_false = lr.predict(X_new)
r2_false = r2_score(y_new, y_pred_false)
print(f"R2 with warm_start=False: {r2_false:.3f}")

# Train with warm_start=True
X_combined = np.concatenate((X_train, X_new))
y_combined = np.concatenate((y_train, y_new))

lr.set_params(warm_start=True)
lr.fit(X_combined, y_combined)
y_pred_true = lr.predict(X_new)
r2_true = r2_score(y_new, y_pred_true)
print(f"R2 with warm_start=True: {r2_true:.3f}")

Running the example gives an output like:

R2 with warm_start=False: 1.000
R2 with warm_start=True: 1.000

The code above:

  1. Generates a large synthetic regression dataset with 100,000 samples and 1,000 features
  2. Splits the data into an initial training set and an additional batch of new data
  3. Fits a Lasso model with warm_start=False on the initial training set and evaluates it on the new batch
  4. Combines the old and new data, sets warm_start=True, and refits the same estimator so that the second fit starts from the previous solution (a timing sketch of this refit pattern follows the list)

Note that the second R2 is computed on data that was included in the second fit, so the two scores are not directly comparable.

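The R2 scores do not show the convergence benefit by themselves. Below is a minimal sketch of how the refit could be timed with and without a warm start; the dataset sizes are reduced and the names and values are chosen purely for illustration, and actual speed-ups depend on the data, the value of alpha, and the solver tolerance.

from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
import numpy as np
import time

# Smaller dataset so the sketch runs quickly (sizes chosen for illustration)
X, y = make_regression(n_samples=20000, n_features=200, noise=0.1, random_state=42)
X_old, X_extra = X[:15000], X[15000:]
y_old, y_extra = y[:15000], y[15000:]

X_all = np.concatenate((X_old, X_extra))
y_all = np.concatenate((y_old, y_extra))

# Warm refit: fit on the old data first, then refit on the combined data,
# reusing the previous coefficients as the starting point
warm = Lasso(warm_start=True)
warm.fit(X_old, y_old)
start = time.perf_counter()
warm.fit(X_all, y_all)
print(f"Warm refit: {time.perf_counter() - start:.3f}s, {warm.n_iter_} iterations")

# Cold refit: a fresh estimator fit on the combined data from scratch
cold = Lasso()
start = time.perf_counter()
cold.fit(X_all, y_all)
print(f"Cold refit: {time.perf_counter() - start:.3f}s, {cold.n_iter_} iterations")

Comparing the n_iter_ attribute after each fit is another way to see how much work the warm start saves.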

Some tips and heuristics for using warm_start:

  - warm_start only matters when fit() is called more than once on the same estimator object; a single fit() behaves identically with or without it.
  - It helps most when successive fits are similar, for example when new data is appended to an existing training set or when alpha changes only gradually between fits (a minimal sketch of the regularization-path pattern follows this list).
  - Use set_params() to change hyperparameters between fits instead of creating a new estimator, since a fresh object has no previous solution to start from.

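As a concrete illustration of the second tip, here is a minimal sketch of sweeping alpha from large to small with a single warm-started estimator, so each fit starts from the coefficients found for the previous alpha; the alphas and dataset sizes are arbitrary choices for illustration.

from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
import numpy as np

X, y = make_regression(n_samples=5000, n_features=100, noise=0.1, random_state=42)

# One estimator reused across the whole path of regularization strengths
lasso = Lasso(warm_start=True, max_iter=10000)

for alpha in [10.0, 1.0, 0.1, 0.01]:
    lasso.set_params(alpha=alpha)
    lasso.fit(X, y)  # starts from the coefficients of the previous alpha
    n_nonzero = np.sum(lasso.coef_ != 0)
    print(f"alpha={alpha}: {n_nonzero} non-zero coefficients, {lasso.n_iter_} iterations")

For a full path like this, lasso_path and LassoCV apply the same idea internally and are usually more convenient.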

Issues to consider:

  - warm_start is not incremental or out-of-core learning: every call to fit() still requires the full training set in memory, unlike estimators that offer partial_fit().
  - The result of fit() depends on the estimator's previous state, so reusing a warm-started object across unrelated experiments can hurt reproducibility; create a fresh estimator when independent fits are needed.
  - If the data or the hyperparameters change drastically between fits, the previous solution may be a poor starting point and the speed-up can disappear.