The covariance_estimator parameter in scikit-learn’s LinearDiscriminantAnalysis allows you to specify a custom method for estimating class covariance matrices.
Linear Discriminant Analysis (LDA) is a classification algorithm that projects data onto a lower-dimensional space to maximize class separability. It assumes classes have identical covariance matrices.
The covariance_estimator parameter determines how these class covariance matrices are estimated. By default, LDA uses the empirical covariance, but custom estimators can improve performance, especially with high-dimensional data or small sample sizes.
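To make the role of the estimator concrete, the sketch below checks numerically how the fitted covariance_ relates to per-class estimates. It assumes, based on my understanding of current scikit-learn behaviour, that the estimator is fitted on each class separately and the results are averaged with the class priors as weights. The dataset sizes are arbitrary illustration values, and the check is a sanity test rather than a statement about library internals.

```python
import numpy as np
from sklearn.covariance import LedoitWolf
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Small synthetic problem; the sizes are arbitrary illustration values
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

lda = LinearDiscriminantAnalysis(solver="lsqr", covariance_estimator=LedoitWolf())
lda.fit(X, y)

# Fit a fresh estimator on each class, then average weighted by class priors
pooled = np.zeros((X.shape[1], X.shape[1]))
for prior, label in zip(lda.priors_, lda.classes_):
    pooled += prior * LedoitWolf().fit(X[y == label]).covariance_

pooled_matches = np.allclose(pooled, lda.covariance_)
print("Pooled covariance matches lda.covariance_:", pooled_matches)
```

If the check prints False on your installed version, the pooling assumption stated above does not hold there and should not be relied on.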
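The default value for covariance_estimator is None, which uses the empirical covariance. Common alternatives include shrinkage estimators and the Ledoit-Wolf estimator. The parameter works only with the 'lsqr' and 'eigen' solvers, and the shrinkage parameter should be left at None when it is set.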
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.covariance import ShrunkCovariance, LedoitWolf
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=3,
                           n_informative=10, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train LDA models with different covariance estimators
estimators = [
    ('Default', None),
    ('Shrinkage', ShrunkCovariance(shrinkage=0.5)),
    ('Ledoit-Wolf', LedoitWolf())
]

for name, estimator in estimators:
    lda = LinearDiscriminantAnalysis(solver='lsqr', covariance_estimator=estimator)
    lda.fit(X_train, y_train)
    y_pred = lda.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"{name} estimator accuracy: {accuracy:.3f}")
Running the example gives an output like:
Default estimator accuracy: 0.740
Shrinkage estimator accuracy: 0.735
Ledoit-Wolf estimator accuracy: 0.740
The key steps in this example are:
- Generate a synthetic multi-class dataset suitable for LDA
- Split the data into train and test sets
- Create LDA models with different covariance estimators
- Train the models and evaluate their accuracy on the test set
Some tips for using covariance_estimator:
- Use shrinkage estimators when dealing with high-dimensional data
- The Ledoit-Wolf estimator chooses its shrinkage coefficient from the data, so no manual tuning is needed
- Custom estimators should have a fit() method and expose a covariance_ attribute after fitting
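As an illustration of the last tip, here is a minimal sketch of a custom estimator. The class name DiagonalCovariance is hypothetical and not part of scikit-learn; it keeps only the per-feature variances and discards all off-diagonal terms, which is a deliberately crude choice made to keep the example short. The dataset parameters are likewise arbitrary.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


class DiagonalCovariance(BaseEstimator):
    """Hypothetical estimator that keeps only per-feature variances."""

    def fit(self, X, y=None):
        # LDA reads covariance_ after calling fit, so it must be set here
        self.covariance_ = np.diag(np.var(X, axis=0, ddof=1))
        return self


X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

lda = LinearDiscriminantAnalysis(solver="lsqr",
                                 covariance_estimator=DiagonalCovariance())
lda.fit(X, y)

train_accuracy = lda.score(X, y)
print(f"Training accuracy with diagonal covariance: {train_accuracy:.3f}")
```

Subclassing BaseEstimator is not strictly required by the tip above, but it gives the object get_params and set_params, which helps when the LDA model is cloned during cross-validation or grid search.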
Issues to consider:
- The optimal estimator depends on your data’s characteristics
- Empirical covariance may perform poorly with small sample sizes
- Regularized estimators can improve stability and generalization
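One way to check these points on your own data is to compare estimators with cross-validation in a setting where samples are scarce relative to features. The sketch below uses an arbitrary synthetic dataset of 60 samples and 40 features; the sizes, the fold count, and the choice of Ledoit-Wolf as the regularized option are assumptions made for illustration, and the scores will vary with the data.

```python
from sklearn.covariance import LedoitWolf
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Few samples relative to features: 60 samples, 40 features (arbitrary choice)
X, y = make_classification(n_samples=60, n_features=40, n_informative=10,
                           n_classes=2, random_state=0)

results = {}
for name, estimator in [("Empirical", None), ("Ledoit-Wolf", LedoitWolf())]:
    lda = LinearDiscriminantAnalysis(solver="lsqr", covariance_estimator=estimator)
    scores = cross_val_score(lda, X, y, cv=5)
    results[name] = scores.mean()
    print(f"{name}: mean CV accuracy {results[name]:.3f}")

# Shrinkage coefficient that Ledoit-Wolf selects on the full feature matrix
lw_shrinkage = LedoitWolf().fit(X).shrinkage_
print(f"Ledoit-Wolf shrinkage on X: {lw_shrinkage:.3f}")
```

A shrinkage value close to 1 suggests the empirical covariance is poorly determined by the available samples, while a value close to 0 suggests the empirical estimate is already reasonable.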