Configure LinearDiscriminantAnalysis "tol" Parameter

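The tol parameter in scikit-learn’s LinearDiscriminantAnalysis (LDA) sets the absolute threshold above which a singular value of the data matrix is considered significant. It is used to estimate the rank of the data and applies only when solver is 'svd', which is the default.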

LDA is a dimensionality reduction and classification technique that projects features onto a lower-dimensional space while maximizing class separability. It’s particularly useful when the classes are assumed to have multivariate Gaussian distributions with equal covariance matrices.

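LDA is not fitted by an iterative optimizer, so tol is not a convergence tolerance. With the 'svd' solver, singular values at or below tol are treated as non-significant and the corresponding dimensions are discarded. A smaller value keeps more low-variance directions, while a larger value discards more of them, which can help when features are collinear or nearly so.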

The default value for tol is 1e-4 (0.0001). In practice, values from roughly 1e-6 to 1e-3 are reasonable to try. For well-conditioned data the choice usually has no effect on the fitted model.
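
The thresholding idea can be illustrated with plain NumPy. This is a minimal sketch of rank estimation by a singular-value threshold, not scikit-learn's internal code. The helper estimated_rank, the toy matrix, and the noise scale of 1e-5 are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200

# Two independent columns plus a third that is a near copy of the first
a = rng.normal(size=n_samples)
b = rng.normal(size=n_samples)
c = a + 1e-5 * rng.normal(size=n_samples)

# Center the columns and rescale so singular values are on the scale
# of per-feature standard deviations
X = np.column_stack([a, b, c])
X = (X - X.mean(axis=0)) / np.sqrt(n_samples)

singular_values = np.linalg.svd(X, compute_uv=False)


def estimated_rank(singular_values, tol):
    """Count the singular values strictly greater than tol."""
    return int(np.sum(singular_values > tol))


for tol in [1e-6, 1e-4]:
    print(f"tol={tol:.0e}, estimated rank: {estimated_rank(singular_values, tol)}")
```

The smallest singular value reflects the tiny difference between the first and third columns, so whether that direction counts toward the rank depends on where the threshold sits.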

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=3,
                           n_informative=10, n_redundant=5,
                           n_clusters_per_class=1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different tol values (tol only applies to the default solver='svd')
tol_values = [1e-6, 1e-5, 1e-4, 1e-3]
accuracies = []

for tol in tol_values:
    lda = LinearDiscriminantAnalysis(tol=tol)
    lda.fit(X_train, y_train)
    y_pred = lda.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"tol={tol:.1e}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

tol=1.0e-06, Accuracy: 0.910
tol=1.0e-05, Accuracy: 0.910
tol=1.0e-04, Accuracy: 0.910
tol=1.0e-03, Accuracy: 0.910

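Identical accuracies are expected when no singular value of the data falls between the tested thresholds, because the estimated rank, and therefore the fitted model, does not change.

The key steps in this example are: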

Generate a synthetic multi-class classification dataset
Split the data into train and test sets
Train LinearDiscriminantAnalysis models with different tol values
Evaluate the accuracy of each model on the test set
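
To see a case where tol does change the fitted model, the following sketch builds a small hypothetical dataset in which one feature is a near copy of another. The data construction, the noise scale of 1e-6, and the tol value of 1e-8 are assumptions chosen so that one singular value falls between the two thresholds.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_samples = 300

# Binary labels; x0 is informative, x1 is noise, x2 is a near copy of x0
y = rng.integers(0, 2, size=n_samples)
x0 = rng.normal(size=n_samples) + 2.0 * y
x1 = rng.normal(size=n_samples)
x2 = x0 + 1e-6 * rng.normal(size=n_samples)
X = np.column_stack([x0, x1, x2])

tol_small = 1e-8    # illustrative value below the near-duplicate direction
tol_default = 1e-4  # scikit-learn's default

# Fit one model per threshold and keep the learned coefficients
lda_small = LinearDiscriminantAnalysis(solver="svd", tol=tol_small).fit(X, y)
lda_default = LinearDiscriminantAnalysis(solver="svd", tol=tol_default).fit(X, y)

norm_small = np.linalg.norm(lda_small.coef_)
norm_default = np.linalg.norm(lda_default.coef_)

print(f"tol={tol_small:.0e}, coefficient norm: {norm_small:.3e}")
print(f"tol={tol_default:.0e}, coefficient norm: {norm_default:.3e}")
```

With the smaller threshold the near-duplicate direction is kept, and the coefficients on the two nearly identical features tend to become very large with opposite signs. With the default threshold that direction is discarded and the coefficients stay moderate.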

Some tips and heuristics for setting tol:

Start with the default value of 1e-4 and change it only for a specific reason, such as collinear or nearly collinear features
Decrease tol to keep low-variance directions that the default would discard, but be aware that very small singular values can make the solution numerically unstable
Increase tol to discard nearly collinear directions; note that tol is ignored by the 'lsqr' and 'eigen' solvers
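
As a quick check that tol matters only for the 'svd' solver, the sketch below fits the 'lsqr' solver with two very different tol values and compares the coefficients. The dataset parameters and the two tol values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Small synthetic dataset without redundant features
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           n_redundant=0, n_classes=3,
                           n_clusters_per_class=1, random_state=0)

# Same solver, very different tol values
coef_tight = LinearDiscriminantAnalysis(solver="lsqr", tol=1e-8).fit(X, y).coef_
coef_loose = LinearDiscriminantAnalysis(solver="lsqr", tol=1e-1).fit(X, y).coef_

same_coefficients = np.allclose(coef_tight, coef_loose)
print(f"Coefficients identical across tol values: {same_coefficients}")
```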

Issues to consider:

The right tol value depends on the dataset’s characteristics, in particular how collinear the features are
Very small tol values may keep directions that are mostly numerical noise, which can lead to numerical instability
Large tol values might discard directions that carry discriminative information, potentially missing important patterns in the data
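
If it is unclear whether tol matters for a given dataset, one option is to treat it as a hyperparameter and compare candidates with cross-validation. The following is a minimal sketch using GridSearchCV. The candidate grid, the number of folds, and the synthetic dataset are assumptions, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV

# Synthetic dataset with the same settings as the main example
X, y = make_classification(n_samples=1000, n_features=20, n_classes=3,
                           n_informative=10, n_redundant=5,
                           n_clusters_per_class=1, random_state=42)

# Candidate thresholds to compare
tol_grid = [1e-6, 1e-5, 1e-4, 1e-3]

# 5-fold cross-validated search over tol for the 'svd' solver
search = GridSearchCV(LinearDiscriminantAnalysis(solver="svd"),
                      param_grid={"tol": tol_grid},
                      cv=5, scoring="accuracy")
search.fit(X, y)

best_tol = search.best_params_["tol"]
print(f"Best tol: {best_tol:.0e}, mean CV accuracy: {search.best_score_:.3f}")

# Per-candidate mean scores show whether tol makes any difference at all
for tol, score in zip(tol_grid, search.cv_results_["mean_test_score"]):
    print(f"tol={tol:.0e}, mean CV accuracy: {score:.3f}")
```

If all candidates score the same, the default is a sensible choice and there is nothing to tune.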
