The l1_ratio
parameter in scikit-learn’s LogisticRegression
controls the balance between L1 and L2 regularization.
LogisticRegression
is a linear classification model that can apply L1 (lasso) regularization, L2 (ridge) regularization, or both. Regularization helps prevent overfitting by penalizing large coefficients.
The l1_ratio
parameter determines the mix of L1 and L2 regularization. A value of 0 corresponds to L2 regularization, a value of 1 corresponds to L1 regularization, and values between 0 and 1 indicate a mix of both.
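The mix can be written down directly: per scikit-learn's documentation, the elastic-net penalty (up to the 1/C scaling) is l1_ratio * ||w||_1 + (1 - l1_ratio) / 2 * ||w||_2^2. A minimal sketch computing it for a toy coefficient vector:

```python
import numpy as np

def elastic_net_penalty(w, l1_ratio):
    """Elastic-net penalty as documented for scikit-learn's
    LogisticRegression (ignoring the 1/C scaling):
    l1_ratio * ||w||_1 + (1 - l1_ratio) / 2 * ||w||_2^2."""
    w = np.asarray(w, dtype=float)
    return l1_ratio * np.abs(w).sum() + (1 - l1_ratio) / 2 * np.dot(w, w)

w = [0.5, -1.0, 0.0, 2.0]
print(elastic_net_penalty(w, 0.0))  # pure L2: 0.5 * 5.25 = 2.625
print(elastic_net_penalty(w, 1.0))  # pure L1: 3.5
print(elastic_net_penalty(w, 0.5))  # mix: 0.5 * 3.5 + 0.25 * 5.25 = 3.0625
```

At l1_ratio=0 only the squared-norm term survives; at l1_ratio=1 only the absolute-value term does, which is what makes the two endpoints equivalent to plain L2 and L1 regularization.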
The l1_ratio
parameter has no usable default: it defaults to None, and when using the 'elasticnet' penalty you must set it explicitly (it has no effect with other penalties).
In practice, values between 0.1 and 0.9 are commonly used depending on the need for sparsity vs. regularization strength.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different l1_ratio values
l1_ratio_values = [0.1, 0.5, 0.7, 0.9]
accuracies = []
for l1_ratio in l1_ratio_values:
    # Elastic net requires the saga solver; raise max_iter so saga converges
    lr = LogisticRegression(penalty='elasticnet', solver='saga', l1_ratio=l1_ratio,
                            max_iter=1000, random_state=42)
    lr.fit(X_train, y_train)
    y_pred = lr.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"l1_ratio={l1_ratio}, Accuracy: {accuracy:.3f}")
Running the example gives an output like:
l1_ratio=0.1, Accuracy: 0.770
l1_ratio=0.5, Accuracy: 0.770
l1_ratio=0.7, Accuracy: 0.770
l1_ratio=0.9, Accuracy: 0.770
The key steps in this example are:
- Generate a synthetic binary classification dataset with informative and noise features
- Split the data into train and test sets
- Train LogisticRegression models with different l1_ratio values
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting l1_ratio:
- Start with a middle value such as 0.5 and adjust based on performance
- Higher l1_ratio values lead to more sparsity in the model, which can be useful for feature selection
- Lower l1_ratio values give more weight to L2 regularization, which can be beneficial for reducing overfitting
- Experiment with different values to find the optimal balance for your specific dataset
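To see the sparsity effect from the tips above, the following sketch (same synthetic data as the main example) counts how many coefficients each fit drives exactly to zero. With an L1-heavy mix you would typically expect at least as many zeroed coefficients, though the exact counts depend on the data and on C:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Same synthetic data as the main example
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)

zero_counts = {}
for l1_ratio in [0.1, 0.9]:
    lr = LogisticRegression(penalty='elasticnet', solver='saga',
                            l1_ratio=l1_ratio, max_iter=5000, random_state=42)
    lr.fit(X, y)
    # Coefficients driven exactly to zero by the L1 component
    zero_counts[l1_ratio] = int(np.sum(lr.coef_ == 0))
    print(f"l1_ratio={l1_ratio}: {zero_counts[l1_ratio]} of {lr.coef_.size} "
          "coefficients are exactly 0")
```

Only the L1 component produces exact zeros; pure L2 shrinks coefficients toward zero without eliminating them, which is why L1-heavy mixes double as a feature-selection mechanism.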
Issues to consider:
- The optimal l1_ratio depends on the dataset’s size, features, and the degree of noise
- Extreme values (0 or 1) give purely L2 or purely L1 regularization, which may not always be ideal
- Mixed regularization (values between 0 and 1) often provides a good balance, but the exact ratio should be tuned with cross-validation
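For the cross-validation tuning mentioned above, scikit-learn provides LogisticRegressionCV, which searches over candidate l1_ratio and C values for you. A minimal sketch on the same synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV

# Same synthetic data as the main example
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)

# Cross-validated search over several l1_ratio candidates (and 5 C values)
lr_cv = LogisticRegressionCV(penalty='elasticnet', solver='saga',
                             l1_ratios=[0.1, 0.5, 0.9], Cs=5,
                             cv=5, max_iter=5000, random_state=42)
lr_cv.fit(X, y)
print("Best l1_ratio:", lr_cv.l1_ratio_[0])
print("Best C:", lr_cv.C_[0])
```

Note that the parameter is plural (`l1_ratios`, a list of candidates) and the selected values are exposed per class via the `l1_ratio_` and `C_` attributes after fitting.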