The fit_intercept parameter in scikit-learn's Lasso class determines whether to fit an intercept term in the linear model or to force the intercept to be zero.
Lasso (Least Absolute Shrinkage and Selection Operator) is a linear regression technique that performs both feature selection and regularization to enhance the prediction accuracy and interpretability of the model. In terms of the linear equation y = ax + b, fit_intercept controls whether the intercept term b is estimated from the data or fixed at zero.
By default, fit_intercept is set to True, which means the intercept is calculated from the training data. If fit_intercept is set to False, the model is forced to pass through the origin (0, 0).
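A minimal sketch (using made-up toy numbers, separate from the main example below) shows the difference through the fitted intercept_ attribute:
import numpy as np
from sklearn.linear_model import Lasso
# Toy data generated from y = 2x + 1, so the true intercept is 1
X_toy = np.array([[1.0], [2.0], [3.0], [4.0]])
y_toy = np.array([3.0, 5.0, 7.0, 9.0])
lasso_default = Lasso(alpha=0.01).fit(X_toy, y_toy)  # fit_intercept=True by default
lasso_origin = Lasso(alpha=0.01, fit_intercept=False).fit(X_toy, y_toy)
print(lasso_default.intercept_)  # estimated from the data, close to 1
print(lasso_origin.intercept_)   # exactly 0.0 -- the line is forced through the origin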
In practice, setting fit_intercept to False is only recommended if you know that the data is centered around the origin. For most datasets, fitting the intercept term is beneficial and leads to better model performance. The complete example below compares both settings on a synthetic dataset.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score, mean_absolute_error
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=1, noise=20, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit models with different fit_intercept values
lasso_with_intercept = Lasso(alpha=0.1, fit_intercept=True)
lasso_no_intercept = Lasso(alpha=0.1, fit_intercept=False)
lasso_with_intercept.fit(X_train, y_train)
lasso_no_intercept.fit(X_train, y_train)
# Evaluate performance
y_pred_with_intercept = lasso_with_intercept.predict(X_test)
y_pred_no_intercept = lasso_no_intercept.predict(X_test)
print("With intercept:")
print(f"R-squared: {r2_score(y_test, y_pred_with_intercept):.3f}")
print(f"MAE: {mean_absolute_error(y_test, y_pred_with_intercept):.3f}")
print("\nWithout intercept:")
print(f"R-squared: {r2_score(y_test, y_pred_no_intercept):.3f}")
print(f"MAE: {mean_absolute_error(y_test, y_pred_no_intercept):.3f}")
Running the example gives an output like:
With intercept:
R-squared: 0.376
MAE: 16.341
Without intercept:
R-squared: 0.377
MAE: 16.330
The key steps in this example are:
- Generate a synthetic regression dataset (make_regression uses a bias of 0.0 by default, so the true intercept here is zero, which is why the two settings score almost identically)
- Split the data into train and test sets
- Fit Lasso models with fit_intercept set to True and False (their fitted parameters are inspected in the snippet after this list)
- Evaluate the performance of each model using R-squared and Mean Absolute Error
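To see what each model actually learned, you can inspect the fitted coef_ and intercept_ attributes (this snippet assumes the variables from the example above are still in scope):
print("Coefficient with intercept:   ", lasso_with_intercept.coef_)
print("Intercept with intercept:     ", lasso_with_intercept.intercept_)
print("Coefficient without intercept:", lasso_no_intercept.coef_)
print("Intercept without intercept:  ", lasso_no_intercept.intercept_)  # always 0.0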
Some tips and heuristics for setting fit_intercept:
- Set fit_intercept=True unless you are certain that the data is centered around the origin
- Check whether the input features and target have been centered (or standardized) before setting fit_intercept=False; the sketch below shows one way to do this
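As a rough check of the second tip, the sketch below (reusing the variables and imports from the example above) centers the features and target with the training-set means and refits with fit_intercept=False; for Lasso, the resulting coefficients should closely match those of the fit_intercept=True model trained on the raw data.
# Center X and y using training-set means, then fit without an intercept
X_mean, y_mean = X_train.mean(axis=0), y_train.mean()
lasso_centered = Lasso(alpha=0.1, fit_intercept=False)
lasso_centered.fit(X_train - X_mean, y_train - y_mean)
print("Coefficients, raw data with intercept:        ", lasso_with_intercept.coef_)
print("Coefficients, centered data without intercept:", lasso_centered.coef_)
# Predictions on new data must undo the centering by adding y_mean back
y_pred_centered = lasso_centered.predict(X_test - X_mean) + y_mean
If the two coefficient vectors agree, the centering has done the job the intercept would otherwise do, and fit_intercept=False is safe for this data.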
Issues to consider:
- Not fitting the intercept term can significantly reduce model performance if the data has a non-zero mean (the sketch after this list demonstrates this)
- Forcing the model through the origin (0, 0) can limit its expressiveness and lead to a poorer fit
- If the input features or the target are not centered (i.e., they have non-zero means), the intercept term absorbs that offset and helps improve model fit
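To make the first issue concrete, the sketch below (again reusing the example's variables and imports) adds an artificial constant offset to the target and refits both settings; without an intercept, the model cannot absorb the offset and its R-squared typically collapses (often going negative).
# Shift the target so it has a large non-zero mean, then refit both settings
y_train_shifted = y_train + 500
y_test_shifted = y_test + 500
for fit_intercept in (True, False):
    model = Lasso(alpha=0.1, fit_intercept=fit_intercept)
    model.fit(X_train, y_train_shifted)
    score = r2_score(y_test_shifted, model.predict(X_test))
    print(f"fit_intercept={fit_intercept}: R-squared = {score:.3f}")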