Sparse coding is a representation learning technique where data is represented as a sparse combination of basis vectors. The SparseCoder
class in scikit-learn allows for efficient sparse coding of data.
The SparseCoder
uses algorithms like lasso_lars
, lasso_cd
, or omp
to find sparse representations of data. Key hyperparameters include dictionary
(basis vectors), transform_n_nonzero_coefs
(number of nonzero coefficients), and transform_alpha
(regularization parameter).
This algorithm is suitable for feature extraction and dimensionality reduction.
from sklearn.decomposition import SparseCoder
import numpy as np
# generate synthetic dataset
X = np.random.rand(10, 5)
# define the dictionary (random for simplicity)
dictionary = np.random.rand(5, 5)
# create the SparseCoder model
coder = SparseCoder(dictionary=dictionary, transform_n_nonzero_coefs=2, transform_algorithm='omp')
# fit the model
sparse_representation = coder.transform(X)
# evaluate the model (example evaluation using reconstruction error)
reconstructed_X = np.dot(sparse_representation, dictionary)
reconstruction_error = np.mean((X - reconstructed_X) ** 2)
print('Reconstruction error: %.3f' % reconstruction_error)
# make a prediction (example transformation of a new sample)
new_sample = np.random.rand(1, 5)
sparse_new_sample = coder.transform(new_sample)
print('Sparse representation of new sample:', sparse_new_sample)
Running the example gives an output like:
Reconstruction error: 0.057
Sparse representation of new sample: [[-1.98180502 0. 3.12883876 0. 0.]]
The steps are as follows:
- First, a synthetic dataset is generated using
numpy
with random values for demonstration purposes. The dictionary of basis vectors is also generated randomly. - A
SparseCoder
model is instantiated with the random dictionary and configured to use the Orthogonal Matching Pursuit (omp
) algorithm, with the number of nonzero coefficients set to 2. - The model is fit on the synthetic dataset using the
transform()
method to obtain the sparse representation. - The model’s performance is evaluated by reconstructing the original data and calculating the reconstruction error.
- A new sample is transformed using the trained
SparseCoder
to get its sparse representation.
This example shows how to use the SparseCoder
class to perform sparse coding on a dataset, demonstrating its application for feature extraction and dimensionality reduction. The SparseCoder
can be directly applied to transform new samples into sparse representations using the learned basis vectors.