MiniBatchNMF is a variant of Non-negative Matrix Factorization (NMF) designed for large datasets. It performs dimensionality reduction by factorizing a data matrix into non-negative components, suitable for feature extraction and compression.
The key hyperparameters of MiniBatchNMF include n_components (number of components), batch_size (size of the mini-batches), and init (initialization method).
The algorithm is appropriate for dimensionality reduction tasks where the data is non-negative, and it is commonly used in fields like image processing and text mining.
from sklearn.datasets import make_blobs
from sklearn.decomposition import MiniBatchNMF
import matplotlib.pyplot as plt
import numpy as np
# generate a synthetic dataset
X, _ = make_blobs(n_samples=100, n_features=5, centers=3, random_state=1)
X = np.abs(X)
# create the MiniBatchNMF model
model = MiniBatchNMF(n_components=2, batch_size=10, random_state=1)
# fit the model on the dataset
X_transformed = model.fit_transform(X)
# plot the transformed data
plt.scatter(X_transformed[:, 0], X_transformed[:, 1])
plt.xlabel('Component 1')
plt.ylabel('Component 2')
plt.title('MiniBatchNMF Transformed Data')
plt.show()
Running the example displays a scatter plot of the samples projected onto the two components.
The steps are as follows:
- First, a synthetic dataset with multiple features and multiple centers is generated using make_blobs(). Because make_blobs() can produce negative values, np.abs() is applied so that every entry is non-negative, as NMF requires.
- Next, a MiniBatchNMF model is instantiated with 2 components, a batch size of 10, and a fixed random seed for reproducibility.
- The model is fit on the dataset using fit_transform(), which returns the transformed data.
- The transformed data is plotted, showing the reduction from the original 5-dimensional feature space to a 2-dimensional space. This visualizes how MiniBatchNMF reduces the data dimensions while retaining much of its structure.
This example demonstrates how to apply MiniBatchNMF for dimensionality reduction, providing a visual representation of the transformed dataset. The approach is efficient for large datasets and can be applied to various domains requiring feature extraction from non-negative data.