Configure LinearDiscriminantAnalysis "shrinkage" Parameter

The shrinkage parameter in scikit-learn’s LinearDiscriminantAnalysis controls the regularization of the covariance matrix estimation.

Linear Discriminant Analysis (LDA) is a dimensionality reduction and classification technique that projects data onto a lower-dimensional space while maximizing class separability. The shrinkage parameter helps address issues with small sample sizes or high-dimensional data.

Shrinkage reduces the variance of the covariance matrix estimate by shrinking it towards a diagonal matrix. This can improve the stability and generalization of the LDA model, especially when the number of features is large compared to the number of samples.
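As a rough illustration of what shrinking towards a diagonal target means, the sketch below blends an empirical covariance matrix with a scaled identity matrix and compares the result with sklearn.covariance.shrunk_covariance. The random data, the value alpha = 0.5, and the variable names are assumptions made for this example. It is not meant as a full description of how LinearDiscriminantAnalysis builds its estimate internally.

```python
import numpy as np
from sklearn.covariance import empirical_covariance, shrunk_covariance

# Few samples relative to features, so the empirical estimate is noisy
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(15, 10))

emp_cov = empirical_covariance(X_demo)
n_features = emp_cov.shape[0]

alpha = 0.5  # illustrative shrinkage intensity (assumption)
mu = np.trace(emp_cov) / n_features  # average variance across features

# Convex combination of the empirical estimate and a scaled identity
manual_cov = (1 - alpha) * emp_cov + alpha * mu * np.eye(n_features)
library_cov = shrunk_covariance(emp_cov, shrinkage=alpha)

print("matches shrunk_covariance:", np.allclose(manual_cov, library_cov))
print("condition number before:", np.linalg.cond(emp_cov))
print("condition number after: ", np.linalg.cond(library_cov))
```

A lower condition number after shrinkage is one way to see why the regularized estimate tends to be more stable to invert.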

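The default value for shrinkage is None, which means no shrinkage is applied. When set to 'auto', scikit-learn estimates the shrinkage intensity automatically using the Ledoit-Wolf lemma. Shrinkage is only supported by the 'lsqr' and 'eigen' solvers, not by the default 'svd' solver, which is why the example below sets solver='lsqr'.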
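To get a feel for the 'auto' setting, the following sketch calls sklearn.covariance.ledoit_wolf directly and then fits LDA with shrinkage='auto' under different solvers. The dataset and the names X_demo, y_demo, and svd_rejected are assumptions for illustration. Calling ledoit_wolf on the pooled data only shows the kind of intensity the estimator returns, not the exact per-class value LDA ends up using.

```python
from sklearn.covariance import ledoit_wolf
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Small illustrative dataset (assumption, not the article's data)
X_demo, y_demo = make_classification(n_samples=60, n_features=20, n_informative=2,
                                     n_redundant=2, n_classes=2, random_state=0)

# ledoit_wolf returns the shrunk covariance and the intensity it chose
_, lw_intensity = ledoit_wolf(X_demo)
print(f"Ledoit-Wolf shrinkage intensity: {lw_intensity:.3f}")

# shrinkage='auto' is accepted by the 'lsqr' and 'eigen' solvers
for solver in ("lsqr", "eigen"):
    lda = LinearDiscriminantAnalysis(solver=solver, shrinkage="auto")
    lda.fit(X_demo, y_demo)
    print(f"solver={solver}, training accuracy: {lda.score(X_demo, y_demo):.3f}")

# The default 'svd' solver does not support shrinkage and raises on fit
svd_rejected = False
try:
    LinearDiscriminantAnalysis(solver="svd", shrinkage="auto").fit(X_demo, y_demo)
except (NotImplementedError, ValueError) as exc:
    svd_rejected = True
    print("svd solver rejected shrinkage:", exc)
```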

In practice, values between 0 and 1 can be used, with 0 meaning no shrinkage and 1 meaning full shrinkage to a diagonal matrix.
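The two endpoints can be checked directly. The sketch below, which uses an assumed synthetic dataset and hypothetical variable names, fits LDA with shrinkage set to None, 0.0, and 1.0. It then compares the coefficients of the first two fits and inspects the covariance_ attribute of the fully shrunk model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Illustrative dataset without redundant features (assumption)
X_demo, y_demo = make_classification(n_samples=100, n_features=10, n_informative=3,
                                     n_redundant=0, n_classes=2, random_state=0)

# Fit one model per endpoint setting
fits = {
    s: LinearDiscriminantAnalysis(solver="lsqr", shrinkage=s).fit(X_demo, y_demo)
    for s in (None, 0.0, 1.0)
}

# shrinkage=0.0 should reproduce the unshrunk (None) solution
same_as_none = np.allclose(fits[None].coef_, fits[0.0].coef_)
print("shrinkage=0.0 matches shrinkage=None:", same_as_none)

# shrinkage=1.0 should leave only diagonal entries in the covariance estimate
cov_full = fits[1.0].covariance_
off_diag = cov_full - np.diag(np.diag(cov_full))
print("largest off-diagonal entry at shrinkage=1.0:", np.abs(off_diag).max())
```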

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=100, n_features=20, n_informative=2,
                           n_redundant=2, n_classes=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train with different shrinkage values
shrinkage_values = [None, 'auto', 0.1, 0.5, 0.9]
accuracies = []

for shrinkage in shrinkage_values:
    lda = LinearDiscriminantAnalysis(solver='lsqr', shrinkage=shrinkage)
    lda.fit(X_train, y_train)
    y_pred = lda.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"shrinkage={shrinkage}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

shrinkage=None, Accuracy: 0.967
shrinkage=auto, Accuracy: 0.967
shrinkage=0.1, Accuracy: 0.967
shrinkage=0.5, Accuracy: 0.967
shrinkage=0.9, Accuracy: 0.933

The key steps in this example are:

  1. Generate a synthetic binary classification dataset with informative and redundant features
  2. Split the data into train and test sets
  3. Train LinearDiscriminantAnalysis models with different shrinkage values
  4. Evaluate the accuracy of each model on the test set
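The steps above score each setting on a single train/test split, which can be noisy with only 100 samples. One possible alternative, sketched here under the assumption that 5-fold cross-validation and accuracy are acceptable choices, is to let GridSearchCV compare the same shrinkage values. The names X_demo, y_demo, param_grid, and search are introduced only for this example.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import GridSearchCV

# Same dataset settings as the main example
X_demo, y_demo = make_classification(n_samples=100, n_features=20, n_informative=2,
                                     n_redundant=2, n_classes=2, random_state=42)

# Candidate shrinkage settings taken from the main example
param_grid = {"shrinkage": [None, "auto", 0.1, 0.5, 0.9]}

# 5-fold cross-validation is an assumption; adjust cv to suit the data
search = GridSearchCV(LinearDiscriminantAnalysis(solver="lsqr"),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X_demo, y_demo)

# Report the mean cross-validated accuracy for each candidate
for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    print(f"shrinkage={params['shrinkage']}, mean CV accuracy: {score:.3f}")

print("best setting:", search.best_params_)
```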

Some tips and heuristics for setting shrinkage:

Issues to consider:


