Non-Negative Matrix Factorization (NMF) is a dimensionality reduction technique that decomposes a matrix into non-negative factors. It is particularly useful for data with non-negative values, such as text mining and image processing.
The key hyperparameters of NMF
include n_components
(number of components), init
(initialization method), and solver
(optimization algorithm).
NMF is suitable for dimensionality reduction tasks.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.decomposition import NMF
import matplotlib.pyplot as plt
import numpy as np
# generate synthetic dataset
X, _ = make_classification(n_samples=100, n_features=10, random_state=1)
X = np.abs(X)
# fit the model
model = NMF(n_components=2, init='random', random_state=1)
X_transformed = model.fit_transform(X)
# plot the transformed dataset
plt.scatter(X_transformed[:, 0], X_transformed[:, 1])
plt.xlabel('Component 1')
plt.ylabel('Component 2')
plt.title('NMF Transformed Data')
plt.show()
Running the example gives an output like:
The steps are as follows:
First, a synthetic dataset is generated using the
make_classification()
function. This creates a dataset with a specified number of samples (n_samples
) and features (n_features
), and a fixed random seed (random_state
) for reproducibility.Next, an
NMF
model is instantiated withn_components
set to 2, using random initialization (init='random'
). The model is then fit on the dataset using thefit_transform()
method to perform the decomposition.The transformed dataset is plotted using
matplotlib
to visualize the data in the reduced-dimensional space.
This example demonstrates the basic steps to apply NMF for dimensionality reduction using scikit-learn. The transformed data can then be used for further analysis or visualization.