Configure LinearDiscriminantAnalysis "covariance_estimator" Parameter

The covariance_estimator parameter in scikit-learn’s LinearDiscriminantAnalysis allows you to specify a custom method for estimating class covariance matrices.

Linear Discriminant Analysis (LDA) is a classification algorithm that projects data onto a lower-dimensional space to maximize class separability. It assumes classes have identical covariance matrices.

The covariance_estimator parameter determines how these class covariance matrices are estimated. By default, LDA uses the empirical covariance, but custom estimators can improve performance, especially with high-dimensional data or small sample sizes.
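
To make the small-sample concern concrete, the following sketch compares the conditioning of an empirical covariance estimate with a Ledoit-Wolf estimate. The random data, its shape (30 samples, 50 features), and the variable names are assumptions chosen for illustration only; they do not come from the example in this article.

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance, LedoitWolf

# Hypothetical setup: more features (50) than samples (30)
rng = np.random.default_rng(0)
X_small = rng.standard_normal((30, 50))

# Fit both estimators on the same data
emp = EmpiricalCovariance().fit(X_small)
lw = LedoitWolf().fit(X_small)

# A very large condition number means the matrix is close to singular
emp_cond = np.linalg.cond(emp.covariance_)
lw_cond = np.linalg.cond(lw.covariance_)

print(f"Empirical condition number:   {emp_cond:.3e}")
print(f"Ledoit-Wolf condition number: {lw_cond:.3e}")
print(f"Ledoit-Wolf shrinkage coefficient: {lw.shrinkage_:.3f}")
```

With fewer samples than features the empirical estimate is rank-deficient, so its condition number should come out far larger than the shrunk estimate's.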

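The default value for covariance_estimator is None, in which case the covariance is controlled by the shrinkage parameter (empirical covariance when shrinkage is also None). The parameter is supported only by the 'lsqr' and 'eigen' solvers, not by the default 'svd' solver, and it cannot be combined with shrinkage. Common alternatives include fixed-shrinkage estimators such as ShrunkCovariance and data-driven shrinkage estimators such as LedoitWolf.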

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.covariance import ShrunkCovariance, LedoitWolf
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=3,
                           n_informative=10, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train LDA models with different covariance estimators
estimators = [
    ('Default', None),
    ('Shrinkage', ShrunkCovariance(shrinkage=0.5)),
    ('Ledoit-Wolf', LedoitWolf())
]

for name, estimator in estimators:
    lda = LinearDiscriminantAnalysis(solver='lsqr', covariance_estimator=estimator)
    lda.fit(X_train, y_train)
    y_pred = lda.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"{name} estimator accuracy: {accuracy:.3f}")

Running the example gives an output like:

Default estimator accuracy: 0.740
Shrinkage estimator accuracy: 0.735
Ledoit-Wolf estimator accuracy: 0.740
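
The example above passes solver='lsqr' because covariance_estimator is not supported by the default 'svd' solver. The sketch below illustrates the two configuration errors you are most likely to hit. The small dataset and the errors dictionary are assumptions made for this illustration.

```python
from sklearn.covariance import LedoitWolf
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Small illustrative dataset (assumed, not the one used above)
X_demo, y_demo = make_classification(n_samples=200, n_features=10, n_classes=3,
                                     n_informative=5, random_state=0)

errors = {}

# Case 1: the default 'svd' solver rejects a covariance estimator
try:
    LinearDiscriminantAnalysis(covariance_estimator=LedoitWolf()).fit(X_demo, y_demo)
except ValueError as exc:
    errors["svd"] = str(exc)

# Case 2: shrinkage and covariance_estimator cannot both be set
try:
    LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto",
                               covariance_estimator=LedoitWolf()).fit(X_demo, y_demo)
except ValueError as exc:
    errors["shrinkage"] = str(exc)

# The 'eigen' solver accepts a covariance estimator, like 'lsqr'
lda_eigen = LinearDiscriminantAnalysis(solver="eigen",
                                       covariance_estimator=LedoitWolf())
lda_eigen.fit(X_demo, y_demo)

for case, message in errors.items():
    print(f"{case}: {message}")
```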

The key steps in this example are:

Generate a synthetic multi-class dataset suitable for LDA
Split the data into train and test sets
Create LDA models with different covariance estimators
Train the models and evaluate their accuracy on the test set

Some tips for using covariance_estimator:

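Prefer shrinkage estimators when the number of features is large relative to the number of samples
The Ledoit-Wolf estimator computes its shrinkage coefficient from the data, so no manual tuning is required
Custom estimators must implement a fit() method and expose a covariance_ attribute after fitting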
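
As a rough illustration of the last tip, here is a minimal sketch of a custom estimator. The class name DiagonalLoadedCovariance and its alpha parameter are hypothetical and are not part of scikit-learn. It simply adds a constant to the diagonal of the empirical covariance.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


class DiagonalLoadedCovariance(BaseEstimator):
    """Hypothetical estimator: empirical covariance plus alpha * identity."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Maximum-likelihood covariance (normalizes by n rather than n - 1)
        cov = np.cov(X, rowvar=False, bias=True)
        # Diagonal loading keeps the matrix well away from singularity
        self.covariance_ = cov + self.alpha * np.eye(X.shape[1])
        return self


# Illustrative data (assumed for this sketch)
X_demo, y_demo = make_classification(n_samples=300, n_features=20, n_classes=3,
                                     n_informative=10, random_state=0)

lda = LinearDiscriminantAnalysis(
    solver="lsqr", covariance_estimator=DiagonalLoadedCovariance(alpha=0.5)
)
lda.fit(X_demo, y_demo)

print(f"Covariance shape: {lda.covariance_.shape}")
print(f"Training accuracy: {lda.score(X_demo, y_demo):.3f}")
```

Inheriting from BaseEstimator is a design choice rather than a strict requirement. It supplies get_params() and set_params(), which lets the estimator be cloned cleanly during cross-validation.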

Issues to consider:

The optimal estimator depends on your data’s characteristics
Empirical covariance may perform poorly with small sample sizes
Regularized estimators can improve stability and generalization
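
Because the best choice depends on the data, one reasonable approach is to compare candidates with cross-validation rather than rely on a single train/test split. The sketch below assumes a deliberately small, high-dimensional synthetic dataset (120 samples, 50 features). The dataset, the candidates dictionary, and the 5-fold setting are illustrative choices.

```python
from sklearn.covariance import LedoitWolf, ShrunkCovariance
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Assumed setup: few samples relative to the number of features
X_hd, y_hd = make_classification(n_samples=120, n_features=50, n_classes=3,
                                 n_informative=10, random_state=0)

candidates = {
    "Empirical": None,
    "ShrunkCovariance(0.5)": ShrunkCovariance(shrinkage=0.5),
    "LedoitWolf": LedoitWolf(),
}

cv_results = {}
for name, estimator in candidates.items():
    lda = LinearDiscriminantAnalysis(solver="lsqr", covariance_estimator=estimator)
    # 5-fold cross-validated accuracy for each covariance estimator
    scores = cross_val_score(lda, X_hd, y_hd, cv=5)
    cv_results[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

The ranking will vary from dataset to dataset, so treat the output as a comparison for this particular data and not as a general verdict.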

See Also