Scikit-Learn MiniBatchNMF Model

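MiniBatchNMF is a variant of Non-negative Matrix Factorization (NMF) designed for large datasets. It performs dimensionality reduction by approximating a non-negative data matrix as the product of two smaller non-negative matrices, fitted on mini-batches of samples rather than on the full dataset at once. This makes it suitable for feature extraction and compression.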

The key hyperparameters of MiniBatchNMF include n_components (number of components), batch_size (size of the mini-batches), and init (initialization method).

The algorithm is appropriate for dimensionality reduction tasks where the data is non-negative, and it is commonly used in fields such as image processing and text mining.

from sklearn.datasets import make_blobs
from sklearn.decomposition import MiniBatchNMF
import matplotlib.pyplot as plt
import numpy as np

# generate a synthetic dataset
X, _ = make_blobs(n_samples=100, n_features=5, centers=3, random_state=1)
# make_blobs can produce negative values; NMF requires non-negative input
X = np.abs(X)

# create the MiniBatchNMF model
model = MiniBatchNMF(n_components=2, batch_size=10, random_state=1)

# fit the model on the dataset
X_transformed = model.fit_transform(X)

# plot the transformed data
plt.scatter(X_transformed[:, 0], X_transformed[:, 1])
plt.xlabel('Component 1')
plt.ylabel('Component 2')
plt.title('MiniBatchNMF Transformed Data')
plt.show()

Running the example gives an output like:

[Figure: Scikit-Learn MiniBatchNMF, a scatter plot of Component 1 against Component 2]

The steps are as follows:

First, a synthetic dataset with five features and three centers is generated using make_blobs(). Because make_blobs() can produce negative values and NMF requires non-negative input, np.abs() is applied so that every value is non-negative.
Next, a MiniBatchNMF model is instantiated with 2 components, a batch size of 10, and a fixed random seed for reproducibility.
The model is fit on the dataset using fit_transform(), which returns the transformed data.
The transformed data is plotted, showing the reduction from the original 5-dimensional feature space to a 2-dimensional space. The scatter plot shows how the samples are distributed across the two learned components.

This example demonstrates how to apply MiniBatchNMF for dimensionality reduction, providing a visual representation of the transformed dataset. The approach is efficient for large datasets and can be applied to various domains requiring feature extraction from non-negative data.

See Also