MiniBatchDictionaryLearning is a dictionary learning algorithm that processes data in small, random batches, making it efficient for large datasets. It performs dimensionality reduction and feature extraction by learning a sparse representation of the input data.
The key hyperparameters of MiniBatchDictionaryLearning
include n_components
(number of dictionary atoms), alpha
(regularization parameter), and batch_size
(size of the mini-batches).
The algorithm is suitable for dimensionality reduction and feature extraction in various data types.
from sklearn.datasets import make_classification
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
# generate synthetic dataset
X, _ = make_classification(n_samples=200, n_features=5, random_state=1)
# split into train and test sets
X_train, X_test = train_test_split(X, test_size=0.2, random_state=1)
# create and fit the model
model = MiniBatchDictionaryLearning(n_components=3, alpha=1, batch_size=10, random_state=1)
X_train_transformed = model.fit_transform(X_train)
X_test_transformed = model.transform(X_test)
# plot the transformed dataset
plt.scatter(X_train_transformed[:, 0], X_train_transformed[:, 1], label='Train')
plt.scatter(X_test_transformed[:, 0], X_test_transformed[:, 1], label='Test')
plt.legend()
plt.title('MiniBatchDictionaryLearning Transform')
plt.xlabel('Component 1')
plt.ylabel('Component 2')
plt.show()
Running the example gives an output like:
Running the example gives a plot showing the transformed dataset.
The steps are as follows:
Generate a synthetic dataset using
make_classification()
. This creates a dataset with a specified number of samples and features, suitable for classification tasks.Split the dataset into training and test sets using
train_test_split()
to evaluate the model’s performance on unseen data.Instantiate a
MiniBatchDictionaryLearning
model withn_components
set to 3,alpha
to 1, andbatch_size
to 10. Fit the model on the training data using thefit_transform()
method.Transform the test set using the
transform()
method of the fitted model.Visualize the transformed data using a scatter plot. The plot shows how the training and test data are represented in the new feature space.
This example illustrates how to implement and visualize MiniBatchDictionaryLearning
for dimensionality reduction tasks. The transformed data can be used for further analysis or as input to other machine learning models.