Dictionary Learning is a method for learning a set of basis vectors (a dictionary) in which each data sample can be represented as a sparse linear combination, which makes it particularly useful for dimensionality reduction and feature extraction.
The key hyperparameters of DictionaryLearning include n_components (number of dictionary elements), alpha (sparsity controlling parameter), and max_iter (maximum number of iterations).
This algorithm is appropriate for dimensionality reduction, feature extraction, and signal processing problems.
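To get a feel for what alpha does, the following minimal sketch fits the model with a few alpha values and reports the fraction of code entries that are exactly zero. The choice of n_components=4, the alpha values, and the explicit transform_algorithm="lasso_lars" with transform_alpha set to the same value are assumptions made for illustration, so that the penalty is actually applied when encoding; they are not settings taken from the example that follows.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import DictionaryLearning

# same kind of synthetic data as in the main example
X, _ = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)

zero_fraction = {}
for alpha in (0.1, 1.0, 2.0):  # illustrative values, not recommendations
    model = DictionaryLearning(
        n_components=4,
        alpha=alpha,
        max_iter=200,
        transform_algorithm="lasso_lars",  # L1-penalized encoding
        transform_alpha=alpha,             # use the same penalty when encoding
        random_state=1,
    )
    codes = model.fit(X).transform(X)
    # share of coefficients that the L1 penalty drove to exactly zero
    zero_fraction[alpha] = float(np.mean(codes == 0))
    print(f"alpha={alpha}: {zero_fraction[alpha]:.2f} of code entries are zero")
```

Larger alpha values generally push more coefficients to zero, although the exact fractions depend on the data and the scikit-learn version.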
from sklearn.datasets import make_classification
from sklearn.decomposition import DictionaryLearning
import matplotlib.pyplot as plt
# generate binary classification dataset
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)
# fit the model
model = DictionaryLearning(n_components=2, alpha=1, max_iter=500, random_state=1)
X_transformed = model.fit_transform(X)
# plot the transformed dataset
plt.scatter(X_transformed[:, 0], X_transformed[:, 1], c=y)
plt.title('Dictionary Learning Transformed Data')
plt.xlabel('Component 1')
plt.ylabel('Component 2')
plt.show()
# transform a single sample (here the first training row) into the learned dictionary space
X_new = model.transform(X[:1])
print('Transformed sample:', X_new)
Running the example gives an output like:
The steps are as follows:
1. First, a synthetic binary classification dataset is generated using the make_classification() function. This creates a dataset with a specified number of samples (n_samples), classes (n_classes), and a fixed random seed (random_state) for reproducibility.
2. Next, a DictionaryLearning model is instantiated with 2 components and fit on the dataset using the fit_transform() method.
3. The transformed dataset is visualized by plotting the new feature space using matplotlib.
4. Lastly, a single data sample is transformed using the transform() method to demonstrate how new data can be mapped to the learned dictionary space.
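The example transforms a sample the model has already seen during fitting. A variation that maps genuinely unseen data could look like the sketch below, which holds out part of the dataset with train_test_split(). The 80/20 split and the variable names are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.decomposition import DictionaryLearning

X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)

# hold out 20% of the samples so the dictionary never sees them
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# learn the dictionary on the training portion only
model = DictionaryLearning(n_components=2, alpha=1, max_iter=500, random_state=1)
model.fit(X_train)

# encode the held-out samples against the learned dictionary
codes_test = model.transform(X_test)
print("Dictionary shape:", model.components_.shape)  # (2, 5)
print("Held-out codes shape:", codes_test.shape)     # (20, 2)
```

Fitting on the training portion only keeps the held-out samples from influencing the dictionary, which matters if the codes are later fed to a downstream model.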
This example demonstrates how to apply DictionaryLearning for dimensionality reduction and visualize the transformed data.
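One way to judge how much information the two-component representation keeps is to rebuild the data from the codes and the learned dictionary (the components_ attribute) and measure the error. The sketch below assumes the same settings as the example; the relative-error metric is an illustrative choice, not part of the original workflow.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import DictionaryLearning

X, _ = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)

model = DictionaryLearning(n_components=2, alpha=1, max_iter=500, random_state=1)
model.fit(X)

# sparse codes, one row per sample and one column per dictionary atom
codes = model.transform(X)

# each sample is approximated as a linear combination of the dictionary atoms
X_reconstructed = codes @ model.components_

# relative Frobenius-norm error of the reconstruction (illustrative metric)
rel_error = np.linalg.norm(X - X_reconstructed) / np.linalg.norm(X)
print(f"Relative reconstruction error: {rel_error:.3f}")
```

A lower error means the dictionary captures more of the original signal; raising n_components or lowering alpha would typically reduce it at the cost of a larger or less sparse representation.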