TruncatedSVD is a dimensionality reduction algorithm that performs singular value decomposition on the input data matrix. It is commonly used to reduce the number of features in datasets, particularly those with a large number of features like text data and sparse matrices.
The key hyperparameters of TruncatedSVD
include n_components
, which specifies the number of singular values and vectors to compute, and algorithm
, which determines the algorithm to use for computing the SVD. Common values for algorithm
are 'randomized'
and 'arpack'
.
The algorithm is appropriate for dimensionality reduction tasks.
from sklearn.decomposition import TruncatedSVD
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
# generate a high-dimensional dataset
X, y = make_classification(n_samples=100, n_features=50, n_classes=2, random_state=1)
# create and fit the model
svd = TruncatedSVD(n_components=2, random_state=1)
X_reduced = svd.fit_transform(X)
# plot the reduced dataset
plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y)
plt.xlabel('Component 1')
plt.ylabel('Component 2')
plt.title('2D projection using TruncatedSVD')
plt.show()
Running the example gives an output like:
The steps are as follows:
Generate Data: A synthetic high-dimensional dataset is generated using
make_classification()
with 100 samples and 50 features.Instantiate Model: A
TruncatedSVD
model is created withn_components=2
to reduce the data to 2 dimensions.Fit and Transform: The model is fit on the dataset and then used to transform the data, reducing its dimensionality.
Visualize: The reduced data is plotted in a scatter plot, showing the projection of the data into 2D space.
This example demonstrates how to use TruncatedSVD
for dimensionality reduction, transforming a high-dimensional dataset into a lower-dimensional space for visualization or further processing.