The `tol` parameter in scikit-learn's `QuadraticDiscriminantAnalysis` controls the threshold for singular values in each class's covariance matrix.
Quadratic Discriminant Analysis (QDA) is a classification method that assumes each class has its own covariance matrix. It’s particularly useful when the assumption of a shared covariance matrix (as in Linear Discriminant Analysis) doesn’t hold.
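To see why the per-class covariance matters, here is a small illustrative sketch (synthetic data of my own construction, not from the example below) comparing LDA and QDA on two classes that share a mean but have very different covariance shapes:

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)

rng = np.random.RandomState(0)

# Two classes centered at the origin but with very different
# covariance shapes, violating LDA's shared-covariance assumption.
X0 = rng.randn(500, 2) * [1.0, 1.0]
X1 = rng.randn(500, 2) * [4.0, 0.25]
X = np.vstack([X0, X1])
y = np.array([0] * 500 + [1] * 500)

accs = {}
for model in (LinearDiscriminantAnalysis(), QuadraticDiscriminantAnalysis()):
    name = type(model).__name__
    accs[name] = model.fit(X, y).score(X, y)
    print(f"{name}: {accs[name]:.3f}")
```

Because the class means coincide, LDA's linear boundary is close to a coin flip, while QDA's quadratic boundary exploits the covariance difference.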
The `tol` parameter determines the tolerance threshold for singular values: singular values below this threshold are treated as zero, which affects how the algorithm handles multicollinearity in the input features.
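Conceptually, the thresholding acts as a rank cutoff on the singular values. Below is a minimal NumPy sketch of the idea; the singular values are made up for illustration, and scikit-learn's internal computation compares scaled quantities and differs in detail:

```python
import numpy as np

# Hypothetical singular values from the SVD of a class's centered data;
# the two smallest suggest near-collinear feature directions.
S = np.array([3.2, 1.1, 4.7e-5, 2.0e-9])

tol = 1e-4  # same default as QuadraticDiscriminantAnalysis

# Singular values at or below tol are treated as zero, so the
# effective rank drops from 4 to 2.
effective_rank = int(np.sum(S > tol))
print(effective_rank)  # prints 2
```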
The default value for `tol` is 1e-4 (0.0001). In practice, values between 1e-6 and 1e-2 are commonly used, depending on the dataset's characteristics and the desired sensitivity to multicollinearity.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_classes=3,
                           n_informative=8, n_redundant=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different tol values
tol_values = [1e-6, 1e-4, 1e-2]
accuracies = []
for tol in tol_values:
    qda = QuadraticDiscriminantAnalysis(tol=tol)
    qda.fit(X_train, y_train)
    y_pred = qda.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"tol={tol:.0e}, Accuracy: {accuracy:.3f}")
```
Running the example gives an output like:
```
tol=1e-06, Accuracy: 0.795
tol=1e-04, Accuracy: 0.795
tol=1e-02, Accuracy: 0.795
```
The key steps in this example are:
- Generate a synthetic multi-class classification dataset with informative and redundant features
- Split the data into train and test sets
- Train `QuadraticDiscriminantAnalysis` models with different `tol` values
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting `tol`:
- Start with the default value of 1e-4 and adjust based on model performance
- Lower `tol` values make the model more sensitive to small differences in covariance
- Higher `tol` values can help when dealing with multicollinearity or numerical instability
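Rather than adjusting `tol` by hand, the first tip can be automated by letting cross-validation choose it. A sketch using `GridSearchCV` on the same kind of synthetic data, with a logarithmic grid spanning the commonly used 1e-6 to 1e-2 range:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=10, n_classes=3,
                           n_informative=8, n_redundant=2, random_state=42)

# Cross-validate over a logarithmic grid of candidate tol values.
grid = GridSearchCV(
    QuadraticDiscriminantAnalysis(),
    param_grid={"tol": np.logspace(-6, -2, 5)},
    cv=5,
)
grid.fit(X, y)
print(f"best tol: {grid.best_params_['tol']:.0e}")
print(f"mean CV accuracy: {grid.best_score_:.3f}")
```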
Issues to consider:
- The optimal `tol` value depends on the dataset's characteristics and feature scaling
- Very low `tol` values may lead to overfitting, while very high values may cause underfitting
- Adjusting `tol` can affect the model's sensitivity to differences between classes
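Since `tol` is compared against singular values whose magnitudes depend on feature scale, standardizing features first makes a single threshold behave more consistently. A sketch using a `StandardScaler` pipeline (an assumption about preprocessing, not part of the example above):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, n_classes=3,
                           n_informative=8, n_redundant=2, random_state=42)

# Scaling is fit on each training fold inside cross-validation,
# so the default tol sees features on a comparable scale.
pipe = make_pipeline(StandardScaler(), QuadraticDiscriminantAnalysis(tol=1e-4))
scores = cross_val_score(pipe, X, y, cv=5)
print(f"mean CV accuracy with scaling: {scores.mean():.3f}")
```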