Configure QuadraticDiscriminantAnalysis "reg_param" Parameter

The reg_param parameter in scikit-learn’s QuadraticDiscriminantAnalysis controls the regularization of the per-class covariance estimates.

Quadratic Discriminant Analysis (QDA) is a classification method that models each class as a Gaussian distribution with its own covariance matrix. This produces quadratic decision boundaries, which makes it useful when the classes cannot be separated well by a linear boundary.

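The reg_param parameter determines how strongly each class covariance estimate is shrunk toward the identity matrix. scikit-learn documents the transformation as S2 = (1 - reg_param) * S2 + reg_param * np.eye(n_features), where S2 corresponds to the scaling_ attribute of a given class. Higher values increase regularization, which can help prevent overfitting, especially when the number of features is large compared to the number of samples in each class.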
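
As a rough illustration of this shrinkage, the sketch below blends a small covariance matrix with the identity using NumPy, following the form (1 - reg_param) * S + reg_param * I. The helper shrink_covariance and the example matrix are hypothetical names and values chosen for this illustration; this is not scikit-learn's internal code.

```python
import numpy as np


def shrink_covariance(cov, reg_param):
    """Blend a covariance matrix with the identity (illustrative helper only)."""
    n_features = cov.shape[0]
    return (1.0 - reg_param) * cov + reg_param * np.eye(n_features)


# A nearly singular covariance matrix: two strongly correlated features
cov = np.array([[1.0, 0.999],
                [0.999, 1.0]])

regs = [0.0, 0.1, 0.5, 1.0]
# The condition number measures how close the matrix is to being singular
conds = [np.linalg.cond(shrink_covariance(cov, r)) for r in regs]

for r, c in zip(regs, conds):
    print(f"reg_param={r}: condition number = {c:.2f}")
```

The condition number falls as reg_param grows, which is why regularization stabilizes the estimate. At reg_param=1.0 the matrix is exactly the identity, so all class-specific covariance information is discarded.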

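The default value for reg_param is 0.0, which means no regularization is applied. Values range from 0.0 to 1.0. At 1.0 each class covariance estimate is replaced entirely by the identity matrix, so the best setting usually lies somewhere in between and depends on the dataset’s characteristics and the model’s performance.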

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_classes=3,
                           n_informative=5, n_redundant=0, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different reg_param values
reg_param_values = [0.0, 0.1, 0.5, 1.0]
accuracies = []

for reg in reg_param_values:
    qda = QuadraticDiscriminantAnalysis(reg_param=reg)
    qda.fit(X_train, y_train)
    y_pred = qda.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"reg_param={reg}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

reg_param=0.0, Accuracy: 0.850
reg_param=0.1, Accuracy: 0.860
reg_param=0.5, Accuracy: 0.845
reg_param=1.0, Accuracy: 0.660

The key steps in this example are:

Generate a synthetic multi-class classification dataset
Split the data into train and test sets
Train QuadraticDiscriminantAnalysis models with different reg_param values
Evaluate the accuracy of each model on the test set

Some tips and heuristics for setting reg_param:

Start with the default value of 0.0 and gradually increase if overfitting occurs
Use cross-validation to find the optimal reg_param for your specific dataset
Higher values of reg_param are more useful when the number of features is large relative to the number of samples
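
The cross-validation tip above can be sketched with GridSearchCV. This is a minimal example, assuming the same kind of synthetic data used earlier; the grid of 11 evenly spaced values and the choice of 5 folds are arbitrary assumptions, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV

# Synthetic dataset (assumed here for illustration)
X, y = make_classification(n_samples=1000, n_features=10, n_classes=3,
                           n_informative=5, n_redundant=0, random_state=42)

# Candidate reg_param values spanning the full 0.0 to 1.0 range
param_grid = {"reg_param": np.linspace(0.0, 1.0, 11)}

# 5-fold cross-validated search scored by accuracy
search = GridSearchCV(QuadraticDiscriminantAnalysis(), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X, y)

print(f"Best reg_param: {search.best_params_['reg_param']:.1f}")
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")
```

Because the search refits the best model on all the data by default, search.best_estimator_ can be used directly for prediction afterwards.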

Issues to consider:

The optimal reg_param depends on the dataset’s characteristics and the problem at hand
Too little regularization (low reg_param) can lead to overfitting, especially with high-dimensional data
Too much regularization (high reg_param) can lead to underfitting, oversimplifying the decision boundary
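
One way to spot the overfitting and underfitting described above is to compare training and test accuracy for each setting. The sketch below assumes a smaller, higher-dimensional synthetic dataset (300 samples, 20 features) chosen only for illustration. A large gap between the two scores suggests overfitting, while low scores on both suggest underfitting.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# Fewer samples and more features than the earlier example (illustrative assumption)
X, y = make_classification(n_samples=300, n_features=20, n_classes=3,
                           n_informative=5, n_redundant=0, random_state=0)

# Stratify so every class keeps more training samples than features
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

results = []
for reg in [0.0, 0.1, 0.5, 1.0]:
    qda = QuadraticDiscriminantAnalysis(reg_param=reg)
    qda.fit(X_train, y_train)
    # score() returns mean accuracy on the given data
    train_acc = qda.score(X_train, y_train)
    test_acc = qda.score(X_test, y_test)
    results.append((reg, train_acc, test_acc))
    print(f"reg_param={reg}: train={train_acc:.3f}, test={test_acc:.3f}")
```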
