Configure LinearRegression "fit_intercept" Parameter

The fit_intercept parameter in scikit-learn’s LinearRegression determines whether to calculate the intercept for the linear model.

When fit_intercept is set to True (the default), the model estimates an intercept term along with the coefficients, so the fitted line is free to cross the y-axis at any point. If set to False, no intercept is estimated (intercept_ is set to 0.0) and the line is forced to pass through the origin, which is appropriate only in specific scenarios.
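
A minimal sketch of the difference, using a small hypothetical dataset that follows y = 3x + 5 exactly. With fit_intercept=True the model recovers both the slope and the intercept. With fit_intercept=False it reports intercept_ = 0.0, and the least-squares slope through the origin, sum(x*y) / sum(x**2) = 72/14, has to absorb the missing offset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data that follows y = 3x + 5 exactly (no noise)
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 3.0 * X.ravel() + 5.0

# fit_intercept=True: estimates both the slope and the intercept
with_intercept = LinearRegression(fit_intercept=True).fit(X, y)
# fit_intercept=False: no intercept is estimated; the line must pass through (0, 0)
through_origin = LinearRegression(fit_intercept=False).fit(X, y)

print(with_intercept.coef_, with_intercept.intercept_)  # slope close to 3.0, intercept close to 5.0
print(through_origin.coef_, through_origin.intercept_)  # slope 72/14 (about 5.14), intercept 0.0

# Without an intercept, predictions reduce to X @ coef_
print(np.allclose(through_origin.predict(X), X @ through_origin.coef_))
```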

The default value for fit_intercept is True because in most data the response is not zero when all features are zero, so the model needs an intercept term. Setting it to False can be appropriate when the data has already been centered (both the features and the target have mean zero) or when domain knowledge says the relationship must pass through the origin.
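
Centering is what makes fit_intercept=False safe in that case. The sketch below uses hypothetical random data with a true intercept of 10. It subtracts the training means from X and y and fits without an intercept. The slopes should match a fit_intercept=True model on the raw data, and the intercept can be recovered from the means. The variable names are illustrative only. New data would need to be centered with the same training means before calling predict.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Hypothetical data with a clearly non-zero intercept: y = 2*x1 - x2 + 10 + noise
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + 10.0 + rng.normal(scale=0.5, size=200)

# Fit with an intercept on the raw data
raw = LinearRegression(fit_intercept=True).fit(X, y)

# Center features and target using their means, then fit without an intercept
X_mean, y_mean = X.mean(axis=0), y.mean()
centered = LinearRegression(fit_intercept=False).fit(X - X_mean, y - y_mean)

# The slope coefficients agree; the intercept is recoverable from the means
recovered_intercept = y_mean - X_mean @ centered.coef_
print(raw.coef_, centered.coef_)
print(raw.intercept_, recovered_intercept)
```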

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=100, n_features=1, noise=20, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with fit_intercept=True
lr_true = LinearRegression(fit_intercept=True)
lr_true.fit(X_train, y_train)
y_pred_true = lr_true.predict(X_test)
mse_true = mean_squared_error(y_test, y_pred_true)
print(f"fit_intercept=True, Coefficients: {lr_true.coef_}, Intercept: {lr_true.intercept_}, MSE: {mse_true:.2f}")

# Train with fit_intercept=False
lr_false = LinearRegression(fit_intercept=False)
lr_false.fit(X_train, y_train)
y_pred_false = lr_false.predict(X_test)
mse_false = mean_squared_error(y_test, y_pred_false)
print(f"fit_intercept=False, Coefficients: {lr_false.coef_}, MSE: {mse_false:.2f}")

The output of running this example would look like:

fit_intercept=True, Coefficients: [46.747264], Intercept: 0.19844442845175525, MSE: 416.81
fit_intercept=False, Coefficients: [46.71666433], MSE: 421.03
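
The two models score almost the same here because make_regression generates data with bias=0.0 by default. The true intercept is therefore zero, and forcing the line through the origin costs little. As a sketch, rerunning the same setup with an assumed bias=50.0 should make the difference visible. The fit_intercept=True model should estimate an intercept near 50, while the model without one should have a much larger test MSE. Exact numbers are not shown because they depend on the generated data.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Same setup as above, but with an assumed true intercept of 50 via the bias argument
X, y = make_regression(n_samples=100, n_features=1, noise=20, bias=50.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

results = {}
for fit_intercept in (True, False):
    model = LinearRegression(fit_intercept=fit_intercept).fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    results[fit_intercept] = (model, mse)
    print(f"fit_intercept={fit_intercept}, Intercept: {model.intercept_:.2f}, MSE: {mse:.2f}")
```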

The key steps in this example are:

Generate a synthetic regression dataset with a single feature
Split the data into train and test sets
Train LinearRegression models with fit_intercept set to True and False
Evaluate the mean squared error of each model on the test set

Tips and heuristics for setting fit_intercept:

Include the intercept term unless there is a specific reason not to
Excluding the intercept can be appropriate when the data is already centered around the origin
Models without an intercept term may have worse performance if the true relationship has a non-zero intercept
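
When it is unclear whether to include the intercept, one option is to treat fit_intercept as a hyperparameter and let cross-validation decide. The sketch below assumes a synthetic dataset with a non-zero bias of 30. It uses GridSearchCV with 5-fold cross-validated mean squared error, and on such data the search should select fit_intercept=True.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

# Assumption: synthetic data whose true intercept is 30
X, y = make_regression(n_samples=200, n_features=3, noise=10, bias=30.0, random_state=0)

# Compare both settings by 5-fold cross-validated MSE
search = GridSearchCV(
    LinearRegression(),
    param_grid={"fit_intercept": [True, False]},
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X, y)

print("Best:", search.best_params_)
for params, score in zip(search.cv_results_["params"], search.cv_results_["mean_test_score"]):
    # Scores are negated MSE, so flip the sign for display
    print(params, f"mean CV MSE: {-score:.2f}")
```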

Issues to consider:

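Setting fit_intercept to False forces the fitted line (or hyperplane) through the origin, which changes both the predictions and the meaning of the coefficients
Excluding the intercept can bias the coefficient estimates if the true relationship has a non-zero intercept, because the slopes are pulled toward compensating for the missing offset
The intercept term captures the overall level of the response variable; with an intercept, the least-squares residuals on the training data average to zero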
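
To illustrate how the intercept captures the overall level of the response, the hypothetical example below fits y = 3x + 50 on x values spread symmetrically around zero. With an intercept, the training residuals average to zero. Without one, the slope still comes out near 3 for this symmetric data, but the constant 50 cannot be represented, so it ends up in every residual.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: y = 3x + 50 on x values spread symmetrically around 0
X = np.linspace(-1.0, 1.0, 21).reshape(-1, 1)
y = 3.0 * X.ravel() + 50.0

mean_residuals = {}
for fit_intercept in (True, False):
    model = LinearRegression(fit_intercept=fit_intercept).fit(X, y)
    residuals = y - model.predict(X)
    # With an intercept, training residuals average to (numerically) zero;
    # without one, the unexplained level of the response stays in the residuals
    mean_residuals[fit_intercept] = residuals.mean()
    print(f"fit_intercept={fit_intercept}: slope={model.coef_[0]:.3f}, "
          f"mean residual={residuals.mean():.3f}")
```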

See Also