Configure Lasso "random_state" Parameter

The random_state parameter in scikit-learn’s Lasso class seeds the random number generator that chooses which coefficient to update when selection="random". Fixing the seed makes those fits reproducible.

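Lasso regression uses the coordinate descent algorithm as its solver. With the default selection="cyclic", the solver loops over the features in a fixed order, so the fit is deterministic and random_state has no effect. With selection="random", a randomly chosen coefficient is updated at each iteration. In that mode the default random_state=None means the results may vary slightly each time the model is fit, even with the same data and parameters.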

When selection="random" is used, setting random_state to an integer ensures that the results are identical across multiple runs with the same data and parameters. This is useful for sharing research code, validation, and collaboration.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=100, n_features=10, n_informative=5,
                       n_targets=1, noise=0.5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit Lasso with different random_state values (default selection="cyclic")
lasso_none = Lasso(alpha=0.1, random_state=None)
lasso_42 = Lasso(alpha=0.1, random_state=42)

lasso_none.fit(X_train, y_train)
lasso_42.fit(X_train, y_train)

# Evaluate models
y_pred_none = lasso_none.predict(X_test)
y_pred_42 = lasso_42.predict(X_test)

mse_none = mean_squared_error(y_test, y_pred_none)
mse_42 = mean_squared_error(y_test, y_pred_42)

print(f"Lasso with random_state=None, MSE: {mse_none:.3f}")
print(f"Lasso with random_state=42, MSE: {mse_42:.3f}")
print(f"Coefficients with random_state=None: {lasso_none.coef_}")
print(f"Coefficients with random_state=42: {lasso_42.coef_}")

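The output will look similar to the following. Both models produce the same result because the default selection="cyclic" does not use random_state: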

Lasso with random_state=None, MSE: 0.492
Lasso with random_state=42, MSE: 0.492
Coefficients with random_state=None: [16.63936001  0.          0.         63.48456757  0.         70.48936138
 -0.         10.30161917  3.07260898 -0.        ]
Coefficients with random_state=42: [16.63936001  0.          0.         63.48456757  0.         70.48936138
 -0.         10.30161917  3.07260898 -0.        ]

The key steps in this example are:

Generate a synthetic regression dataset with informative features and noise
Split the data into train and test sets
Fit Lasso models with different random_state values (None and 42)
Evaluate the mean squared error of each model on the test set and compare coefficients
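
The example above keeps the default selection="cyclic", which is why both models match. The following is a minimal sketch of the case where random_state is actually consulted, assuming selection="random" behaves as described in the scikit-learn documentation. The dataset sizes, alpha, and seed values are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Small synthetic problem (sizes chosen only for illustration)
X, y = make_regression(n_samples=100, n_features=10, n_informative=5,
                       noise=0.5, random_state=42)

# Random coordinate selection with the same seed: the update order is repeated
coef_a = Lasso(alpha=0.1, selection="random", random_state=0).fit(X, y).coef_
coef_b = Lasso(alpha=0.1, selection="random", random_state=0).fit(X, y).coef_
same_seed_equal = np.array_equal(coef_a, coef_b)

# Default cyclic selection: random_state is not used, so the seed is irrelevant
coef_c = Lasso(alpha=0.1, random_state=1).fit(X, y).coef_
coef_d = Lasso(alpha=0.1, random_state=2).fit(X, y).coef_
cyclic_equal = np.array_equal(coef_c, coef_d)

print(f"selection='random', same seed, identical coefficients: {same_seed_equal}")
print(f"selection='cyclic', different seeds, identical coefficients: {cyclic_equal}")
```

Both checks should report identical coefficients: the first because the seed fixes the random update order, the second because the cyclic solver never draws random numbers.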

Some tips and heuristics for setting random_state:

Set random_state to an integer for reproducibility whenever selection="random" is used; with the default selection="cyclic" the value is ignored
Use different integer values for independent, repeatable runs
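
As a rough sketch of the second tip, the snippet below fits the same model under several seeds and then repeats one of them. The helper name fit_with_seed and the seed list are hypothetical, introduced only for this illustration, and selection="random" is assumed so that the seed has an effect.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=100, n_features=10, n_informative=5,
                       noise=0.5, random_state=42)

def fit_with_seed(seed):
    """Fit Lasso with random coordinate selection and return its coefficients."""
    model = Lasso(alpha=0.1, selection="random", random_state=seed)
    return model.fit(X, y).coef_

seeds = [0, 1, 2]  # arbitrary illustrative seeds
coefs = np.array([fit_with_seed(s) for s in seeds])

# Largest per-coefficient gap between the seeded runs
spread = (coefs.max(axis=0) - coefs.min(axis=0)).max()
print(f"Max coefficient spread across seeds: {spread:.2e}")

# Re-running a seed reproduces that run exactly
repeatable = np.array_equal(fit_with_seed(1), coefs[1])
print(f"Seed 1 is repeatable: {repeatable}")
```

Any spread across seeds is expected to be small, since every run converges toward the same optimum up to the solver tolerance. The point of recording the seed is that each individual run can be reproduced exactly.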

Issues to consider:

The random_state parameter only matters when the fitting procedure draws random numbers; for Lasso that means selection="random", since the default cyclic coordinate descent is deterministic
Not all scikit-learn estimators utilize the random_state parameter
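
One way to check whether a given estimator accepts random_state is to inspect its constructor parameters with get_params(). This is a small sketch; the helper name accepts_random_state is hypothetical and not part of scikit-learn.

```python
from sklearn.linear_model import Lasso, LinearRegression

def accepts_random_state(estimator):
    """Return True if the estimator exposes a random_state constructor parameter."""
    return "random_state" in estimator.get_params()

print(f"Lasso accepts random_state: {accepts_random_state(Lasso())}")
print(f"LinearRegression accepts random_state: {accepts_random_state(LinearRegression())}")
```

Accepting the parameter does not mean it is always used. As with Lasso, it may only take effect under specific settings.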
