The fit_intercept parameter in scikit-learn’s LogisticRegression controls whether a constant (bias or intercept) term is added to the decision function.
LogisticRegression is a linear model for classification. In the binary case, it predicts the probability of the positive class by passing a linear score through the logistic function. The fit_intercept parameter determines whether that linear score includes a constant term.
Including the intercept (setting fit_intercept=True) lets the decision boundary shift away from the origin by adding a learned constant term, while excluding it (setting fit_intercept=False) forces the decision boundary to pass through the origin.
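As a minimal sketch of what this means in practice, the snippet below fits both variants on a small synthetic dataset and inspects the fitted attributes. The dataset sizes and the variable names (with_b, without_b, manual_scores) are illustrative assumptions, not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small synthetic binary problem (sizes chosen only for illustration)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

with_b = LogisticRegression(fit_intercept=True).fit(X, y)
without_b = LogisticRegression(fit_intercept=False).fit(X, y)

# The decision function is the linear score X @ w + b
manual_scores = X @ with_b.coef_.ravel() + with_b.intercept_[0]
scores_match = np.allclose(manual_scores, with_b.decision_function(X))
print("manual score matches decision_function:", scores_match)

# Without an intercept, intercept_ is zero, so the origin scores 0
# and sits exactly on the decision boundary (probability 0.5)
origin = np.zeros((1, X.shape[1]))
p_origin = without_b.predict_proba(origin)[0, 1]
print("intercept_ without intercept:", without_b.intercept_)
print("P(y=1) at the origin:", p_origin)
```

With fit_intercept=True the learned constant is stored in intercept_, whereas with fit_intercept=False that attribute is simply zero and the coefficients in coef_ have to carry the whole fit.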
The default value for fit_intercept is True. Because the parameter is a boolean, the only valid values are True and False.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with fit_intercept=True
model_with_intercept = LogisticRegression(fit_intercept=True, random_state=42)
model_with_intercept.fit(X_train, y_train)
y_pred_with_intercept = model_with_intercept.predict(X_test)
accuracy_with_intercept = accuracy_score(y_test, y_pred_with_intercept)
print(f"fit_intercept=True, Accuracy: {accuracy_with_intercept:.3f}")
# Train with fit_intercept=False
model_without_intercept = LogisticRegression(fit_intercept=False, random_state=42)
model_without_intercept.fit(X_train, y_train)
y_pred_without_intercept = model_without_intercept.predict(X_test)
accuracy_without_intercept = accuracy_score(y_test, y_pred_without_intercept)
print(f"fit_intercept=False, Accuracy: {accuracy_without_intercept:.3f}")
Running the example gives an output like:
fit_intercept=True, Accuracy: 0.770
fit_intercept=False, Accuracy: 0.750
The key steps in this example are:
- Generate a synthetic binary classification dataset.
- Split the data into train and test sets.
- Train LogisticRegression models with fit_intercept=True and fit_intercept=False.
- Evaluate the accuracy of each model on the test set.
Some tips and heuristics for setting fit_intercept:
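- Use fit_intercept=True if your data does not already include an intercept term, such as a constant column of ones.
- Use fit_intercept=False if your features already include such a constant column, or if you deliberately want the decision boundary to pass through the origin. Centering the features around zero does not by itself make the intercept redundant, because the intercept also absorbs the class balance.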
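To illustrate the case where the data already carries its own intercept term, here is a hedged sketch that appends a constant column of ones and fits with fit_intercept=False. The names X_const, builtin, and manual are hypothetical, and the large C value is an assumption made only to weaken regularization so the two fits are comparable.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Imbalanced synthetic data, so that a non-zero intercept is actually useful
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=0, weights=[0.8, 0.2], random_state=0)

# Append a constant column of ones: the data now "includes the intercept term"
X_const = np.hstack([X, np.ones((X.shape[0], 1))])

# Large C = weak regularization (assumption made for a fair comparison)
builtin = LogisticRegression(fit_intercept=True, C=1e6, max_iter=1000).fit(X, y)
manual = LogisticRegression(fit_intercept=False, C=1e6, max_iter=1000).fit(X_const, y)

# The coefficient on the constant column plays the role of the intercept
print("built-in intercept:         ", builtin.intercept_[0])
print("constant-column coefficient:", manual.coef_[0, -1])

agreement = (builtin.predict(X) == manual.predict(X_const)).mean()
print("fraction of matching predictions:", agreement)
```

One design difference worth keeping in mind: the coefficient on a constant column is penalized like any other coefficient, while with the default lbfgs solver the built-in intercept is not, so the two approaches can drift apart under stronger regularization (smaller C).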
Issues to consider:
- Omitting the intercept when the data needs one can bias the model: the decision boundary is forced through the origin, so predicted probabilities can be systematically off.
- Always verify that the data preprocessing steps align with the fit_intercept setting.
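The first issue can be made concrete with a small sketch, assuming an imbalanced synthetic dataset with standardized features. It compares the average predicted probability of the positive class against the observed positive rate for both settings. The 90/10 class weights and the variable names are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Roughly 10% positives; features are standardized (zero mean, unit variance)
X, y = make_classification(n_samples=2000, n_features=5, n_informative=3,
                           n_redundant=0, weights=[0.9, 0.1], random_state=0)
X = StandardScaler().fit_transform(X)

with_b = LogisticRegression(fit_intercept=True, max_iter=1000).fit(X, y)
without_b = LogisticRegression(fit_intercept=False, max_iter=1000).fit(X, y)

base_rate = y.mean()
mean_with = with_b.predict_proba(X)[:, 1].mean()
mean_without = without_b.predict_proba(X)[:, 1].mean()

print(f"observed positive rate:        {base_rate:.3f}")
print(f"mean P(y=1), with intercept:    {mean_with:.3f}")
print(f"mean P(y=1), without intercept: {mean_without:.3f}")
```

With the intercept, the average predicted probability on the training data should track the observed positive rate closely. Without it, the model has no free term to absorb the class balance, even though the features are centered, so the average prediction tends to sit further from the base rate.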