Lars (Least Angle Regression) is a linear regression algorithm that is well suited to high-dimensional data, where the number of features is large relative to the number of samples. In scikit-learn, the Lars class provides an implementation of this algorithm. Key hyperparameters include n_nonzero_coefs (the target number of non-zero coefficients, which controls how far the algorithm advances along its solution path) and fit_intercept (whether to calculate the intercept for this model).
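For example, a minimal sketch of configuring these hyperparameters (the specific values below are arbitrary and chosen only for illustration):
from sklearn.linear_model import Lars
# Stop the path after at most 10 non-zero coefficients and fit an intercept
# (10 is an arbitrary value used only to illustrate the hyperparameter)
model = Lars(n_nonzero_coefs=10, fit_intercept=True)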
LarsCV extends Lars by using cross-validation to automatically select the best point along the solution path, i.e., how much regularization to apply. Its key hyperparameters include cv (the number of cross-validation folds) and max_iter (the maximum number of iterations).
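Similarly, a minimal sketch of configuring LarsCV (again, the values are illustrative rather than recommendations):
from sklearn.linear_model import LarsCV
# Use 5-fold cross-validation and cap the path at 500 iterations
# (500 is also the library default for max_iter)
model = LarsCV(cv=5, max_iter=500)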
The main difference is that Lars requires manual tuning of its hyperparameters, while LarsCV automates that choice using cross-validation. This automation comes at a computational cost, since LarsCV fits one model per cross-validation fold (see the timing sketch at the end of this post). Lars is ideal for quick regression prototyping, whereas LarsCV is preferred when you want model selection handled automatically in pursuit of better generalization.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lars, LarsCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Generate synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit and evaluate Lars with default hyperparameters
lars = Lars()
lars.fit(X_train, y_train)
y_pred_lars = lars.predict(X_test)
print(f"Lars MSE: {mean_squared_error(y_test, y_pred_lars):.3f}")
# Fit and evaluate LarsCV with cross-validation
larscv = LarsCV(cv=5)
larscv.fit(X_train, y_train)
y_pred_larscv = larscv.predict(X_test)
print(f"LarsCV MSE: {mean_squared_error(y_test, y_pred_larscv):.3f}")
print(f"Best hyperparameters: {larscv.get_params()}")
Running the example gives an output like:
Lars MSE: 0.011
LarsCV MSE: 0.011
LarsCV hyperparameters: {'copy_X': True, 'cv': 5, 'eps': 2.220446049250313e-16, 'fit_intercept': True, 'max_iter': 500, 'max_n_alphas': 1000, 'n_jobs': None, 'precompute': 'auto', 'verbose': False}
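Note that get_params() only reports the constructor hyperparameters. The value actually selected by cross-validation is stored on the fitted estimator; here is a small follow-up sketch that reuses the larscv object fitted above:
# The regularization level chosen by cross-validation and the resulting
# sparsity are exposed as fitted attributes of LarsCV
print(f"CV-selected alpha: {larscv.alpha_}")
print(f"Non-zero coefficients: {(larscv.coef_ != 0).sum()}")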
The steps are as follows:
- Generate a synthetic regression dataset using make_regression.
- Split the data into training and test sets using train_test_split.
- Instantiate Lars with default hyperparameters, fit it on the training data, and evaluate its performance on the test set.
- Instantiate LarsCV with 5-fold cross-validation, fit it on the training data, and evaluate its performance on the test set.
- Compare the test-set performance (mean squared error) of both models and print the hyperparameters used by LarsCV.
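To make the computational-cost trade-off concrete, the following rough timing sketch fits both models on the same synthetic data (absolute times will vary by machine; the point is the relative cost of cross-validation):
import time
from sklearn.datasets import make_regression
from sklearn.linear_model import Lars, LarsCV
# Same synthetic data as in the example above
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
# Time a single Lars fit
start = time.perf_counter()
Lars().fit(X, y)
print(f"Lars fit time:   {time.perf_counter() - start:.4f} s")
# Time a LarsCV fit, which fits one model per fold plus a final refit
start = time.perf_counter()
LarsCV(cv=5).fit(X, y)
print(f"LarsCV fit time: {time.perf_counter() - start:.4f} s")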