
Configure LogisticRegression "dual" Parameter

The dual parameter in scikit-learn’s LogisticRegression selects whether the model is fit by solving the primal or the dual formulation of the optimization problem.

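Setting dual to True solves the dual formulation, which has one variable per sample rather than one per feature and can therefore be faster when the number of features is much larger than the number of samples. The scikit-learn documentation recommends dual=False when the number of samples exceeds the number of features.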
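For reference, the two formulations can be written out. The following is a sketch of the L2-regularized problems in the form used by the LIBLINEAR library, assuming labels y_i in {-1, +1}, n training samples x_i, inverse regularization strength C, and no intercept term. The symbols are notation introduced here for illustration and do not appear in the example code.

```latex
% Primal: one variable per feature (w has n_features entries)
\min_{w} \; \frac{1}{2} w^\top w
  + C \sum_{i=1}^{n} \log\left(1 + e^{-y_i w^\top x_i}\right)

% Dual: one variable per sample (\alpha has n entries)
\min_{\alpha} \; \frac{1}{2} \alpha^\top Q \alpha
  + \sum_{i:\, \alpha_i > 0} \alpha_i \log \alpha_i
  + \sum_{i:\, \alpha_i < C} (C - \alpha_i) \log (C - \alpha_i)
\quad \text{subject to } 0 \le \alpha_i \le C,
\qquad Q_{ij} = y_i y_j \, x_i^\top x_j
```

The primal optimizes over n_features weights while the dual optimizes over n_samples variables, which is why the dual can be attractive when samples are few and features are many.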

By default, dual is set to False and the primal optimization problem is solved. The dual formulation is only implemented for the l2 penalty with the 'liblinear' solver.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import time

# Generate synthetic dataset with a small number of samples and large number of features
X, y = make_classification(n_samples=100, n_features=1000, n_informative=50,
                           n_redundant=0, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with dual=False
start_time = time.perf_counter()  # monotonic, high-resolution timer
lr_primal = LogisticRegression(dual=False, solver='liblinear', random_state=42)
lr_primal.fit(X_train, y_train)
primal_time = time.perf_counter() - start_time
y_pred_primal = lr_primal.predict(X_test)
primal_accuracy = accuracy_score(y_test, y_pred_primal)

# Train with dual=True
start_time = time.perf_counter()  # monotonic, high-resolution timer
lr_dual = LogisticRegression(dual=True, solver='liblinear', random_state=42)
lr_dual.fit(X_train, y_train)
dual_time = time.perf_counter() - start_time
y_pred_dual = lr_dual.predict(X_test)
dual_accuracy = accuracy_score(y_test, y_pred_dual)

print(f"Primal form training time: {primal_time:.3f} seconds, Accuracy: {primal_accuracy:.3f}")
print(f"Dual form training time: {dual_time:.3f} seconds, Accuracy: {dual_accuracy:.3f}")

The output will look similar to the following, although exact timings vary by machine and from run to run:

Primal form training time: 0.006 seconds, Accuracy: 0.750
Dual form training time: 0.003 seconds, Accuracy: 0.750

The key steps in this example are:

  1. Generate a synthetic binary classification dataset with a small number of samples and a large number of features
  2. Split the data into train and test sets
  3. Train LogisticRegression models with dual set to False and True
  4. Compare the training time and accuracy of the models
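Because each fit in this example takes only a few milliseconds, a single measurement is noisy. One way to make the comparison more robust is to repeat each fit and report the median. The following is a minimal sketch under that assumption; the helper name median_fit_time and the repeat count are hypothetical choices made for illustration.

```python
import statistics
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Same shape of data as the main example: few samples, many features
X, y = make_classification(n_samples=100, n_features=1000, n_informative=50,
                           n_redundant=0, random_state=42)

def median_fit_time(dual, repeats=10):
    """Hypothetical helper: median wall-clock time over `repeats` fits."""
    timings = []
    for _ in range(repeats):
        model = LogisticRegression(dual=dual, solver='liblinear', random_state=42)
        start = time.perf_counter()
        model.fit(X, y)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

primal_median = median_fit_time(dual=False)
dual_median = median_fit_time(dual=True)

print(f"Primal median fit time: {primal_median:.4f} seconds")
print(f"Dual median fit time: {dual_median:.4f} seconds")
```

The median is less sensitive than the mean to an occasional slow run, such as the first fit after import.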

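Tips and heuristics for setting dual:

  1. Keep the default dual=False when the number of samples exceeds the number of features, as the scikit-learn documentation recommends
  2. Try dual=True when features heavily outnumber samples, as in the example above, and time both settings on your own data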
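The shape-based rule of thumb can be sketched as a small function. The helper suggest_dual below is hypothetical and not part of scikit-learn. It encodes the documented guidance to prefer dual=False when samples outnumber features, and it assumes, as a starting point only, that dual=True is worth trying in the opposite case.

```python
def suggest_dual(n_samples, n_features, solver='liblinear', penalty='l2'):
    """Hypothetical helper: suggest a starting value for the dual parameter.

    Returns False whenever the dual formulation is unavailable (anything
    other than the l2 penalty with liblinear) or when samples outnumber
    features. Returns True only as a suggestion to benchmark.
    """
    if solver != 'liblinear' or penalty != 'l2':
        return False
    return n_features > n_samples

print(suggest_dual(n_samples=100, n_features=1000))
print(suggest_dual(n_samples=10000, n_features=20))
print(suggest_dual(n_samples=100, n_features=1000, solver='lbfgs'))
```

The result is a suggestion to measure, not a guarantee that one formulation is faster on a given dataset.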

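Issues to consider:

  1. dual=True is only implemented for the l2 penalty with the 'liblinear' solver; other combinations raise an error when the model is fit
  2. Timings in the millisecond range, like those in the example above, are noisy, so a single run is weak evidence that one formulation is faster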
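The restriction on solver and penalty can be checked directly. The sketch below fits three configurations on a small synthetic dataset and records whether each one succeeds. The helper try_fit and the dataset sizes are hypothetical choices for illustration, and the exact error messages may differ between scikit-learn versions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small synthetic problem; the sizes are arbitrary choices for illustration
X, y = make_classification(n_samples=50, n_features=200, random_state=0)

def try_fit(**params):
    """Hypothetical helper: fit a model and report whether fitting succeeded."""
    try:
        LogisticRegression(**params).fit(X, y)
        return "ok"
    except ValueError as exc:
        return f"ValueError: {exc}"

results = {
    # Supported: default l2 penalty with liblinear
    "liblinear, l2, dual=True": try_fit(solver='liblinear', dual=True),
    # Unsupported: any solver other than liblinear
    "lbfgs, l2, dual=True": try_fit(solver='lbfgs', dual=True),
    # Unsupported: l1 penalty, even with liblinear
    "liblinear, l1, dual=True": try_fit(solver='liblinear', penalty='l1', dual=True),
}

for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

Only the first configuration is expected to fit; the other two should report a ValueError.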


