The shrinkage parameter in scikit-learn’s LinearDiscriminantAnalysis controls the regularization of the covariance matrix estimate.
Linear Discriminant Analysis (LDA) is a dimensionality reduction and classification technique that projects data onto a lower-dimensional space while maximizing class separability. The shrinkage parameter helps address issues with small sample sizes or high-dimensional data.
Shrinkage reduces the variance of the covariance matrix estimate by shrinking it towards a scaled identity matrix. This can improve the stability and generalization of the LDA model, especially when the number of features is large compared to the number of samples.
The default value for shrinkage is None, which means no shrinkage is applied. When set to ‘auto’, scikit-learn automatically determines the shrinkage intensity using the Ledoit-Wolf method. In practice, values between 0 and 1 can be used, with 0 meaning no shrinkage (the empirical covariance) and 1 meaning full shrinkage to the scaled identity matrix. Note that shrinkage is only supported by the ‘lsqr’ and ‘eigen’ solvers, not the default ‘svd’ solver.
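To make the interpolation concrete, here is a minimal sketch of what a fixed shrinkage value does to the empirical covariance, using the shrunk_covariance and ledoit_wolf helpers from sklearn.covariance (the toy data here is arbitrary, chosen only for illustration):

```python
import numpy as np
from sklearn.covariance import ledoit_wolf, shrunk_covariance

# Empirical covariance of some toy data
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 5))
emp_cov = np.cov(X, rowvar=False, bias=True)

# A fixed shrinkage a blends the empirical covariance with a scaled identity:
# shrunk = (1 - a) * emp_cov + a * mu * I, where mu = trace(emp_cov) / n_features
a = 0.5
mu = np.trace(emp_cov) / emp_cov.shape[0]
manual = (1 - a) * emp_cov + a * mu * np.eye(emp_cov.shape[0])
print(np.allclose(manual, shrunk_covariance(emp_cov, shrinkage=a)))  # True

# 'auto' instead estimates the optimal intensity with the Ledoit-Wolf method
lw_cov, lw_shrinkage = ledoit_wolf(X)
print(0.0 <= lw_shrinkage <= 1.0)  # True
```

At shrinkage=0 the blend is the raw empirical covariance; at shrinkage=1 it is the scaled identity alone, which is why large values flatten out the feature correlations.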
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score
# Generate synthetic dataset
X, y = make_classification(n_samples=100, n_features=20, n_informative=2,
                           n_redundant=2, n_classes=2, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train with different shrinkage values
shrinkage_values = [None, 'auto', 0.1, 0.5, 0.9]
accuracies = []
for shrinkage in shrinkage_values:
    lda = LinearDiscriminantAnalysis(solver='lsqr', shrinkage=shrinkage)
    lda.fit(X_train, y_train)
    y_pred = lda.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"shrinkage={shrinkage}, Accuracy: {accuracy:.3f}")
Running the example gives an output like:
shrinkage=None, Accuracy: 0.967
shrinkage=auto, Accuracy: 0.967
shrinkage=0.1, Accuracy: 0.967
shrinkage=0.5, Accuracy: 0.967
shrinkage=0.9, Accuracy: 0.933
The key steps in this example are:
- Generate a synthetic binary classification dataset with informative and redundant features
- Split the data into train and test sets
- Train LinearDiscriminantAnalysis models with different shrinkage values
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting shrinkage:
- Use ‘auto’ for automatic shrinkage estimation when unsure about the optimal value
- Try values between 0 and 1 to find the best performance for your specific dataset
- Consider using shrinkage when dealing with high-dimensional data or small sample sizes
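To see the last tip in action, here is a sketch comparing no shrinkage against ‘auto’ on a dataset where the number of features approaches the number of samples (the dataset dimensions are chosen only for illustration; actual scores will vary):

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Deliberately few samples relative to the number of features, so the
# empirical covariance estimate is poorly conditioned
X, y = make_classification(n_samples=60, n_features=50, n_informative=10,
                           n_classes=2, random_state=42)

for shrinkage in [None, 'auto']:
    lda = LinearDiscriminantAnalysis(solver='lsqr', shrinkage=shrinkage)
    scores = cross_val_score(lda, X, y, cv=5)
    print(f"shrinkage={shrinkage}, CV accuracy: {scores.mean():.3f}")
```

In regimes like this, the Ledoit-Wolf estimate typically picks a nonzero intensity, stabilizing the covariance estimate that the unregularized model struggles with.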
Issues to consider:
- The optimal shrinkage value depends on the dataset characteristics and sample size
- Too little shrinkage may not sufficiently regularize the covariance estimation
- Too much shrinkage can oversimplify the model and lead to underfitting
- The effect of shrinkage may be more pronounced in high-dimensional or noisy datasets
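Since the optimal value is dataset-dependent, one practical approach is to cross-validate over a grid of fixed shrinkage intensities. Here is a sketch using GridSearchCV on the same kind of synthetic dataset as above (the grid of 11 values is an arbitrary choice for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=100, n_features=20, n_informative=2,
                           n_redundant=2, n_classes=2, random_state=42)

# Search a grid of fixed shrinkage intensities between 0 and 1
param_grid = {'shrinkage': np.linspace(0.0, 1.0, 11)}
search = GridSearchCV(LinearDiscriminantAnalysis(solver='lsqr'),
                      param_grid, cv=5)
search.fit(X, y)
print(f"best shrinkage: {search.best_params_['shrinkage']:.1f}, "
      f"CV accuracy: {search.best_score_:.3f}")
```

Comparing the cross-validated winner against the ‘auto’ setting is a quick sanity check: if they disagree substantially, the dataset is probably small or noisy enough that the choice matters.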