The intercept_scaling parameter in scikit-learn’s LogisticRegression controls the scaling of the intercept term.
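Logistic regression is a linear model used for binary classification that predicts the probability of a binary outcome. The intercept_scaling parameter has an effect only when fit_intercept is set to True and the solver is 'liblinear'. In that case a constant feature equal to intercept_scaling is appended to every sample, and the intercept becomes intercept_scaling times the weight learned for that feature. Because that weight is regularized like any other coefficient, increasing intercept_scaling lessens the effect of regularization on the intercept.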
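The role of the constant feature can be checked directly. The following is a minimal sketch, assuming the liblinear solver treats the intercept as described in the scikit-learn documentation: it compares a model fitted with intercept_scaling against a model fitted without an intercept on data where the constant column was appended by hand. The dataset, the value of `scale`, and the variable names are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small synthetic dataset (illustrative choice)
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
scale = 5.0

# Model A: liblinear handles the intercept using intercept_scaling=scale
model_a = LogisticRegression(solver="liblinear", fit_intercept=True,
                             intercept_scaling=scale)
model_a.fit(X, y)

# Model B: no intercept, but a constant column equal to `scale` is appended by hand
X_aug = np.hstack([X, np.full((X.shape[0], 1), scale)])
model_b = LogisticRegression(solver="liblinear", fit_intercept=False)
model_b.fit(X_aug, y)

# The weight learned for the hand-made constant column
synthetic_weight = model_b.coef_[0, -1]

print("intercept from model A:        ", model_a.intercept_[0])
print("scale * synthetic weight (B):  ", scale * synthetic_weight)
```

If the two printed values agree, the intercept of model A is the scaled weight of the synthetic feature, which is why that intercept is subject to the same penalty as the other coefficients.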
The default value for intercept_scaling is 1.0. Common values range from 0.1 to 10, depending on the specific data characteristics and needs.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, n_redundant=0, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different intercept_scaling values
intercept_scaling_values = [0.1, 1.0, 5.0, 10.0]
accuracies = []
for scale in intercept_scaling_values:
    lr = LogisticRegression(fit_intercept=True, solver='liblinear', intercept_scaling=scale, random_state=42)
    lr.fit(X_train, y_train)
    y_pred = lr.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"intercept_scaling={scale}, Accuracy: {accuracy:.3f}")
Running the example gives an output like:
intercept_scaling=0.1, Accuracy: 0.765
intercept_scaling=1.0, Accuracy: 0.770
intercept_scaling=5.0, Accuracy: 0.770
intercept_scaling=10.0, Accuracy: 0.770
The key steps in this example are:
- Generate a synthetic binary classification dataset.
- Split the data into train and test sets.
- Train LogisticRegression models with different intercept_scaling values.
- Evaluate the accuracy of each model on the test set.
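Accuracy on a balanced dataset may barely react to intercept_scaling, so it can help to inspect the fitted intercept itself. The sketch below is an illustration under assumed settings that are not part of the example above: imbalanced classes (weights=[0.9, 0.1]) so that a non-zero intercept is needed, and strong regularization (C=0.01) so that the penalty on the intercept is noticeable.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Imbalanced classes, so the model needs a clearly non-zero intercept
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, weights=[0.9, 0.1], random_state=42)

intercepts = {}
for scale in [0.01, 1.0, 100.0]:
    # Small C means strong regularization, which makes the effect visible
    lr = LogisticRegression(solver="liblinear", C=0.01, intercept_scaling=scale)
    lr.fit(X, y)
    intercepts[scale] = lr.intercept_[0]
    print(f"intercept_scaling={scale}, intercept={intercepts[scale]:.3f}")
```

With a small intercept_scaling the intercept is shrunk towards zero; with a large one it is penalized less and can move further from zero.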
Some tips and heuristics for setting intercept_scaling:
- Start with the default value of 1.0 and adjust based on model performance.
- If the features are not centered or are on a large scale, the model may need a large intercept; increase intercept_scaling so that regularization shrinks it less.
- Be mindful of the solver being used (liblinear is required for intercept_scaling).
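One way to follow these tips is to treat intercept_scaling as a hyperparameter and select it by cross-validation. The following is a minimal sketch, assuming a StandardScaler step followed by LogisticRegression; the candidate values, the 5-fold setup, and the dataset are illustrative choices rather than recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)

# Scale the features, then fit a liblinear logistic regression
pipeline = make_pipeline(StandardScaler(), LogisticRegression(solver="liblinear"))

# make_pipeline names each step after its lowercased class name
param_grid = {"logisticregression__intercept_scaling": [0.1, 1.0, 5.0, 10.0]}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best_scale = search.best_params_["logisticregression__intercept_scaling"]
print(f"best intercept_scaling: {best_scale}")
print(f"mean cross-validated accuracy: {search.best_score_:.3f}")
```

If the scores for all candidates are nearly identical, the default of 1.0 is a reasonable choice.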
Issues to consider:
- The optimal value for intercept_scaling depends on the dataset and the presence of feature scaling.
- Improper scaling can affect the performance of the model.
- Ensure fit_intercept is set to True and the solver is 'liblinear' to utilize intercept_scaling.
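The solver requirement can be verified with a quick comparison. This sketch assumes that solvers other than liblinear (here lbfgs) ignore intercept_scaling; the helper `fit_params`, the imbalanced dataset, and C=0.01 are hypothetical choices made only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Imbalanced classes so that the intercept matters
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, weights=[0.9, 0.1], random_state=42)

def fit_params(solver, scale):
    """Fit a model and return its coefficients and intercept as one vector."""
    model = LogisticRegression(solver=solver, intercept_scaling=scale,
                               C=0.01, max_iter=1000)
    model.fit(X, y)
    return np.append(model.coef_[0], model.intercept_[0])

lbfgs_small = fit_params("lbfgs", 0.1)
lbfgs_large = fit_params("lbfgs", 10.0)
lib_small = fit_params("liblinear", 0.1)
lib_large = fit_params("liblinear", 10.0)

print("lbfgs parameters unchanged:    ", np.allclose(lbfgs_small, lbfgs_large))
print("liblinear parameters unchanged:", np.allclose(lib_small, lib_large))
```

If lbfgs reports unchanged parameters while liblinear does not, tuning intercept_scaling with any other solver has no effect.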