
Configure LogisticRegression "tol" Parameter

The tol parameter in scikit-learn’s LogisticRegression sets the tolerance of the stopping criterion for the optimization algorithm used to fit the model.

Logistic Regression is a linear classification algorithm that learns feature weights to separate the classes, mapping a weighted sum of the features to a class probability through the logistic function.

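The optimization algorithm stops when its convergence measure falls below tol, or when max_iter iterations have been run, whichever comes first. The quantity compared against tol depends on the solver: with the default lbfgs solver it is the size of the gradient rather than the change in loss between iterations. Smaller values lead to tighter convergence and a potentially better fit, but longer training times.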
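As a rough illustration of how tol and the iteration cap interact, the sketch below fits the same model with a loose and a tight tolerance and reads the iteration count from the fitted n_iter_ attribute. The dataset size, the seeds, and the max_iter values are arbitrary choices made for this illustration, not recommendations.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

# Small synthetic problem; sizes and seed are illustrative assumptions
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

loose_tol, tight_tol = 1e-1, 1e-6
n_iters = {}
for tol in (loose_tol, tight_tol):
    # max_iter is raised so that tol, not the iteration cap, ends the run
    model = LogisticRegression(tol=tol, max_iter=1000).fit(X, y)
    n_iters[tol] = int(model.n_iter_[0])
    print(f"tol={tol:.0e}: stopped after {n_iters[tol]} iterations")

# With a very small iteration cap the solver stops on max_iter instead,
# which scikit-learn reports through a ConvergenceWarning
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    LogisticRegression(tol=tight_tol, max_iter=2).fit(X, y)
hit_cap = any(issubclass(w.category, ConvergenceWarning) for w in caught)
print("stopped by max_iter before reaching tol:", hit_cap)
```

If a ConvergenceWarning appears during normal training, lowering tol further is unlikely to change anything until max_iter is increased as well.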

The default value for tol is 1e-4.
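A quick way to confirm the default, and to check which value a given estimator will actually use, is to inspect the estimator's parameters. This is a minimal sketch; the variable names are made up for the illustration.

```python
from sklearn.linear_model import LogisticRegression

# No tol given: the estimator falls back to the default
default_model = LogisticRegression()
print(default_model.tol)  # 0.0001

# An explicit tol overrides the default and is visible through get_params()
custom_model = LogisticRegression(tol=1e-3)
print(custom_model.get_params()["tol"])  # 0.001
```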

In practice, values between 1e-4 and 1e-1 are commonly used depending on the desired balance between model performance and training time.
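One way to choose among such values is to treat tol like any other hyperparameter and compare cross-validated accuracy against fit time. The sketch below uses GridSearchCV for this; the grid, the dataset, cv=3, and max_iter=1000 are assumptions picked for the illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Illustrative data; substitute your own training set
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Candidate tolerances spanning the commonly used range
param_grid = {"tol": [1e-4, 1e-3, 1e-2, 1e-1]}
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

# cv_results_ holds one entry per candidate: mean score and mean fit time
results = search.cv_results_
for tol, score, fit_time in zip(
    results["param_tol"], results["mean_test_score"], results["mean_fit_time"]
):
    print(f"tol={tol:.0e}: accuracy={score:.3f}, mean fit time={fit_time:.4f}s")

print("best tol by accuracy:", search.best_params_["tol"])
```

When several candidates score about the same, the largest of them is usually the cheaper choice, since it lets the solver stop earlier.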

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import time

# Generate synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different tol values
tol_values = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
accuracies = []
times = []

for tol in tol_values:
    # Time only the fit; perf_counter is a monotonic, high-resolution clock
    start_time = time.perf_counter()
    lr = LogisticRegression(tol=tol, random_state=42)
    lr.fit(X_train, y_train)
    elapsed = time.perf_counter() - start_time

    # Score the fitted model on the held-out test set
    y_pred = lr.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    times.append(elapsed)
    print(f"tol={tol:.1e}, Accuracy: {accuracy:.3f}, Time: {elapsed:.3f}s")

The output will look similar to the following (exact timings vary by machine and run):

tol=1.0e-05, Accuracy: 0.825, Time: 0.027s
tol=1.0e-04, Accuracy: 0.825, Time: 0.010s
tol=1.0e-03, Accuracy: 0.825, Time: 0.007s
tol=1.0e-02, Accuracy: 0.826, Time: 0.005s
tol=1.0e-01, Accuracy: 0.823, Time: 0.005s
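The accuracies above differ only in the third decimal place. To see how much the fitted models themselves differ, the learned coefficients and test-set predictions of a tight and a loose fit can be compared directly. The sketch below is one way to do that under the same data setup; the names tight, loose, coef_gap, and agreement are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Same synthetic data and split as in the example above
X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit once with the tightest and once with the loosest tolerance
tight = LogisticRegression(tol=1e-5).fit(X_train, y_train)
loose = LogisticRegression(tol=1e-1).fit(X_train, y_train)

# Largest absolute difference between corresponding coefficients
coef_gap = float(np.max(np.abs(tight.coef_ - loose.coef_)))
# Fraction of test samples on which both models predict the same class
agreement = float(np.mean(tight.predict(X_test) == loose.predict(X_test)))

print(f"max coefficient difference: {coef_gap:.4f}")
print(f"prediction agreement on test set: {agreement:.3f}")
```

A small coefficient gap together with high agreement would suggest that, on this dataset, the looser tolerance stops close to the same solution.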

The key steps in this example are:

  1. Generate a synthetic binary classification dataset with informative and redundant features
  2. Split the data into train and test sets
  3. Train LogisticRegression models with different tol values
  4. Record each model's training time and evaluate its accuracy on the test set

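Some tips and heuristics for setting tol:

  1. Start with the default of 1e-4 and change it only if training time or convergence becomes a problem
  2. Decrease tol for tighter convergence, at the cost of longer training
  3. Increase tol for faster training, accepting a potentially less precise fit

Issues to consider:

  1. Training also stops when max_iter is reached, so a very small tol may have no effect unless max_iter is raised as well
  2. A large tol can stop the solver before the coefficients have settled; in the example output above, accuracy dips slightly at tol=1e-1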


