Lars (Least Angle Regression) is a linear regression algorithm that is well suited to high-dimensional data, where the number of features is large relative to the number of samples. In scikit-learn, the Lars class provides an implementation of this algorithm. Key hyperparameters include n_nonzero_coefs (the target number of non-zero coefficients, which controls how far the algorithm advances along its solution path) and fit_intercept (whether to calculate the intercept for this model).
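For example, a minimal sketch of configuring these hyperparameters (the specific values below are arbitrary and chosen only for illustration):
from sklearn.linear_model import Lars
# Stop the path after at most 10 non-zero coefficients and fit an intercept
# (10 is an arbitrary value used only to illustrate the hyperparameter)
model = Lars(n_nonzero_coefs=10, fit_intercept=True)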
LarsCV extends Lars by using cross-validation to automatically select the best point along the solution path, i.e., how much regularization to apply. Its key hyperparameters include cv (the number of cross-validation folds) and max_iter (the maximum number of iterations).
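Similarly, a minimal sketch of configuring LarsCV (again, the values are illustrative rather than recommendations):
from sklearn.linear_model import LarsCV
# Use 5-fold cross-validation and cap the path at 500 iterations
# (500 is also the library default for max_iter)
model = LarsCV(cv=5, max_iter=500)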
The main difference is that Lars requires manual tuning of its hyperparameters, while LarsCV automates that choice using cross-validation. This automation comes at a computational cost, since LarsCV fits one model per cross-validation fold (see the timing sketch at the end of this post). Lars is ideal for quick regression prototyping, whereas LarsCV is preferred when you want model selection handled automatically in pursuit of better generalization.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lars, LarsCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Generate synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit and evaluate Lars with default hyperparameters
lars = Lars()
lars.fit(X_train, y_train)
y_pred_lars = lars.predict(X_test)
print(f"Lars MSE: {mean_squared_error(y_test, y_pred_lars):.3f}")
# Fit and evaluate LarsCV with cross-validation
larscv = LarsCV(cv=5)
larscv.fit(X_train, y_train)
y_pred_larscv = larscv.predict(X_test)
print(f"LarsCV MSE: {mean_squared_error(y_test, y_pred_larscv):.3f}")
print(f"Best hyperparameters: {larscv.get_params()}")
Running the example gives an output like:
Lars MSE: 0.011
LarsCV MSE: 0.011
LarsCV hyperparameters: {'copy_X': True, 'cv': 5, 'eps': 2.220446049250313e-16, 'fit_intercept': True, 'max_iter': 500, 'max_n_alphas': 1000, 'n_jobs': None, 'precompute': 'auto', 'verbose': False}
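Note that get_params() only reports the constructor hyperparameters. The value actually selected by cross-validation is stored on the fitted estimator; here is a small follow-up sketch that reuses the larscv object fitted above:
# The regularization level chosen by cross-validation and the resulting
# sparsity are exposed as fitted attributes of LarsCV
print(f"CV-selected alpha: {larscv.alpha_}")
print(f"Non-zero coefficients: {(larscv.coef_ != 0).sum()}")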
The steps are as follows:
- Generate a synthetic regression dataset using make_regression.
- Split the data into training and test sets using train_test_split.
- Instantiate Lars with default hyperparameters, fit it on the training data, and evaluate its performance on the test set.
- Instantiate LarsCV with 5-fold cross-validation, fit it on the training data, and evaluate its performance on the test set.
- Compare the test-set performance (mean squared error) of both models and print the hyperparameters used by LarsCV.
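To make the computational-cost trade-off concrete, the following rough timing sketch fits both models on the same synthetic data (absolute times will vary by machine; the point is the relative cost of cross-validation):
import time
from sklearn.datasets import make_regression
from sklearn.linear_model import Lars, LarsCV
# Same synthetic data as in the example above
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
# Time a single Lars fit
start = time.perf_counter()
Lars().fit(X, y)
print(f"Lars fit time:   {time.perf_counter() - start:.4f} s")
# Time a LarsCV fit, which fits one model per fold plus a final refit
start = time.perf_counter()
LarsCV(cv=5).fit(X, y)
print(f"LarsCV fit time: {time.perf_counter() - start:.4f} s")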