The `tol` parameter in scikit-learn's `Lasso` class controls the tolerance for the optimization algorithm.
Lasso, or Least Absolute Shrinkage and Selection Operator, is a linear regression technique that performs both variable selection and regularization. It adds an L1 penalty term to the loss function, encouraging sparse solutions in which many coefficients are exactly zero.
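As a quick illustration of the sparsity this penalty induces, the following sketch (using an assumed `alpha=1.0` on a synthetic dataset, not taken from the example below) counts how many coefficients Lasso drives exactly to zero:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data where only 5 of 20 features are informative
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=0.1, random_state=42)

# The L1 penalty drives the coefficients of uninformative
# features exactly to zero (alpha=1.0 is an assumed setting)
lasso = Lasso(alpha=1.0, random_state=42).fit(X, y)
n_zero = np.sum(lasso.coef_ == 0)
print(f"{n_zero} of {lasso.coef_.size} coefficients are exactly zero")
```

An ordinary least squares fit on the same data would leave all 20 coefficients nonzero; Lasso's zeroed coefficients are what make it a variable selection method.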
The `tol` parameter sets the convergence tolerance for the coordinate descent solver: when the coefficient updates between iterations are smaller than `tol`, the solver checks the dual gap for optimality and continues until the gap is also smaller than `tol`.
The default value for `tol` is 0.0001. In practice, values between 0.0001 and 0.01 are commonly used, depending on the desired precision and the computational resources available.
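One way to see the tolerance at work is to compare how many coordinate descent iterations (exposed as the fitted model's `n_iter_` attribute) the solver needs under a loose versus a strict `tol` — a small sketch on an assumed synthetic dataset:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=500, n_features=10, noise=0.1,
                       random_state=42)

# A looser tolerance lets the solver stop earlier; a stricter one
# demands more iterations before the convergence check passes
for tol in [0.01, 0.0001]:
    lasso = Lasso(tol=tol, max_iter=10000, random_state=42).fit(X, y)
    print(f"tol={tol}: stopped after {lasso.n_iter_} iterations")
```

The exact iteration counts depend on the data, but the stricter tolerance never stops sooner than the looser one.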
```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       n_targets=1, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Train with different tol values
tol_values = [0.1, 0.01, 0.001, 0.0001]
r2_scores = []

for tol in tol_values:
    lasso = Lasso(tol=tol, random_state=42)
    lasso.fit(X_train, y_train)
    y_pred = lasso.predict(X_test)
    r2 = r2_score(y_test, y_pred)
    r2_scores.append(r2)
    print(f"tol={tol}, R-squared: {r2:.3f}")
```
Running the example gives an output like:

```
tol=0.1, R-squared: 0.999
tol=0.01, R-squared: 0.999
tol=0.001, R-squared: 0.999
tol=0.0001, R-squared: 0.999
```
The key steps in this example are:

- Generate a synthetic regression dataset with informative and noise features
- Split the data into train and test sets
- Train `Lasso` models with different `tol` values
- Evaluate the R-squared score of each model on the test set
Some tips and heuristics for setting `tol`:

- The default value of 0.0001 works well in most cases
- Smaller values lead to more precise solutions but longer training times
- Larger values can speed up training but may result in less optimal solutions
Issues to consider:

- Setting `tol` too small can lead to longer training times with diminishing returns in performance
- Setting `tol` too large may result in suboptimal solutions
- The impact of `tol` depends on the scale of the data and the specific problem
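To illustrate the last point, the sketch below artificially inflates one feature by a factor of 1000 to mimic badly scaled data, then compares solver iterations before and after standardization. The exact numbers will vary by dataset, but feature scaling changes how quickly coordinate descent reaches the `tol` threshold, which is why standardizing inputs before fitting is a common practice:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=10, noise=0.1,
                       random_state=42)

# Exaggerate the scale of one feature to mimic badly scaled data
X_bad = X.copy()
X_bad[:, 0] *= 1000

for name, data in [("unscaled", X_bad),
                   ("standardized", StandardScaler().fit_transform(X_bad))]:
    lasso = Lasso(tol=0.0001, max_iter=100000).fit(data, y)
    print(f"{name}: {lasso.n_iter_} iterations to converge")
```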