Sparse coding is a representation learning technique where data is represented as a sparse combination of basis vectors. The SparseCoder class in scikit-learn allows for efficient sparse coding of data.
The SparseCoder uses algorithms like lasso_lars, lasso_cd, or omp to find sparse representations of data. Key hyperparameters include dictionary (basis vectors), transform_n_nonzero_coefs (number of nonzero coefficients), and transform_alpha (regularization parameter).
This algorithm is suitable for feature extraction and dimensionality reduction.
from sklearn.decomposition import SparseCoder
import numpy as np
# generate synthetic dataset
X = np.random.rand(10, 5)
# define the dictionary (random for simplicity)
dictionary = np.random.rand(5, 5)
# create the SparseCoder model
coder = SparseCoder(dictionary=dictionary, transform_n_nonzero_coefs=2, transform_algorithm='omp')
# fit the model
sparse_representation = coder.transform(X)
# evaluate the model (example evaluation using reconstruction error)
reconstructed_X = np.dot(sparse_representation, dictionary)
reconstruction_error = np.mean((X - reconstructed_X) ** 2)
print('Reconstruction error: %.3f' % reconstruction_error)
# make a prediction (example transformation of a new sample)
new_sample = np.random.rand(1, 5)
sparse_new_sample = coder.transform(new_sample)
print('Sparse representation of new sample:', sparse_new_sample)
Running the example gives an output like:
Reconstruction error: 0.057
Sparse representation of new sample: [[-1.98180502 0. 3.12883876 0. 0.]]
The steps are as follows:
- First, a synthetic dataset is generated using
numpywith random values for demonstration purposes. The dictionary of basis vectors is also generated randomly. - A
SparseCodermodel is instantiated with the random dictionary and configured to use the Orthogonal Matching Pursuit (omp) algorithm, with the number of nonzero coefficients set to 2. - The model is fit on the synthetic dataset using the
transform()method to obtain the sparse representation. - The model’s performance is evaluated by reconstructing the original data and calculating the reconstruction error.
- A new sample is transformed using the trained
SparseCoderto get its sparse representation.
This example shows how to use the SparseCoder class to perform sparse coding on a dataset, demonstrating its application for feature extraction and dimensionality reduction. The SparseCoder can be directly applied to transform new samples into sparse representations using the learned basis vectors.