How to Use the Dataset: make_low_rank_matrix()
The make_low_rank_matrix function generates a low-rank matrix with specified properties, commonly used in dimensionality reduction and matrix factorization tasks.
Key function arguments include n_samples for the number of rows, n_features for the number of columns, and effective_rank for the approximate rank of the matrix.
This synthetic dataset is useful for testing algorithms that deal with low-rank approximations, such as PCA or collaborative filtering.
from sklearn.datasets import make_low_rank_matrix
import numpy as np
# Generate a low-rank matrix
n_samples = 100
n_features = 50
effective_rank = 10
tail_strength = 0.1
matrix = make_low_rank_matrix(n_samples=n_samples, n_features=n_features, effective_rank=effective_rank, tail_strength=tail_strength)
# Display matrix shape and type
print(f"Matrix shape: {matrix.shape}")
print(f"Matrix data type: {matrix.dtype}")
# Show summary statistics of the matrix
print(f"Summary statistics:\n{np.round(np.mean(matrix, axis=0), 2)}")
# Display first few rows of the matrix
print(f"First few rows of the matrix:\n{np.round(matrix[:5, :], 2)}")
Running the example gives an output like:
Matrix shape: (100, 50)
Matrix data type: float64
Summary statistics:
[ 0. 0. 0. 0. 0. 0. 0. -0. -0. 0. -0. -0.
-0.01 0. -0. -0. -0. 0. 0. 0. -0. -0. 0. 0.
0. -0. -0. 0. -0. -0. 0. 0. 0. -0. -0. 0.
0. -0. 0. -0. 0. 0. -0. 0.01 -0. 0. -0. -0.
0. 0. ]
First few rows of the matrix:
[[-0.08 0.07 -0.02 0. -0.02 -0.03 0.01 0. -0. -0.05 -0.04 -0.03
-0. -0.01 0.01 -0.01 -0.01 0.06 0.01 -0.02 0.02 0.02 0.01 0.03
0. 0.02 0.01 -0.01 -0.04 -0.01 -0.05 0.02 0.03 -0. -0.03 0.02
0.03 0.01 0.02 0. -0.01 0.03 0.01 0.01 0.02 -0.01 0.01 -0.04
0.05 -0.01]
[ 0.06 0.02 0.03 -0.03 0.01 0.08 -0.05 0.03 -0.02 0.04 0.05 -0.01
0.02 -0. 0.04 -0.01 0.01 -0.03 0.01 0.01 -0. 0.03 0.07 0.
0.03 -0.03 0. -0.03 0. 0.02 0.03 -0.04 0. -0.08 -0. -0.01
0.02 -0.03 -0.03 0.05 0.06 -0.04 0.03 -0.01 0.03 0.02 -0.02 0.03
0.03 0.03]
[ 0.01 0. -0. -0.05 0.04 0. 0.04 0.01 0.02 -0.02 0.03 0.01
0.06 0.03 0.01 -0.02 0.03 0. -0. -0.02 -0.03 -0.01 0.02 0.02
-0.01 0.01 0.01 0. -0. 0.04 -0.02 0.01 -0.01 0.01 0.01 0.03
-0. -0.01 -0.05 0.04 0.03 -0.02 0.01 0.03 0. -0.01 -0.01 -0.03
-0.03 -0.06]
[ 0. -0.02 0.01 -0. 0.02 -0.01 0. -0.03 -0.04 -0. -0.03 0.01
-0.02 0.02 -0.03 -0.04 0.02 0.03 -0. -0.01 -0.03 -0.02 0.01 0.06
-0.01 0.04 0.06 0.08 0.01 0.01 0.02 0.06 0. -0.03 -0.01 0.02
0.04 -0.04 -0.02 -0.05 -0.01 -0.04 -0.01 0. -0.03 0.04 0.05 -0.05
0.03 -0. ]
[-0. 0.03 0.01 0.08 -0.02 0. 0.02 -0.01 0.04 0.05 0. -0.03
-0.01 0.07 -0.04 0.06 -0.03 0.01 0.01 -0.01 0.02 0.01 -0.09 -0.06
0.02 -0.04 -0.06 -0. -0. -0.05 -0.01 0.04 -0.01 0.02 0.01 0.02
0.03 -0. 0.03 0.03 -0.07 0.06 -0.01 0.02 -0.02 0.01 -0.03 0.01
-0.05 0.04]]
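The printed values alone do not reveal the low-rank structure. A minimal sketch of one way to check it is to look at the singular values. The fixed random_state and the "energy" measure (cumulative share of squared singular values) are choices made for this illustration, not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_low_rank_matrix

# random_state is fixed here only so the sketch is reproducible (an assumption,
# the original example does not set it)
matrix = make_low_rank_matrix(
    n_samples=100,
    n_features=50,
    effective_rank=10,
    tail_strength=0.1,
    random_state=0,
)

# Singular values, returned sorted from largest to smallest
singular_values = np.linalg.svd(matrix, compute_uv=False)

# Cumulative share of the squared singular values captured by the first k components
energy = np.cumsum(singular_values**2) / np.sum(singular_values**2)

print(f"Numerical rank: {np.linalg.matrix_rank(matrix)}")
print(f"Energy in the first 10 components: {energy[9]:.3f}")
print(f"Energy in the first 20 components: {energy[19]:.3f}")
```

Note that effective_rank is only approximate: the singular values decay smoothly rather than dropping to exactly zero, so the numerical rank reported by np.linalg.matrix_rank is usually the full min(n_samples, n_features), while most of the energy sits in the leading components.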
The steps are as follows:
Import the make_low_rank_matrix function from sklearn.datasets and numpy for matrix manipulation:
- This function creates a low-rank matrix with specified dimensions and properties.
Generate the matrix using make_low_rank_matrix():
- Set n_samples to 100 for the number of rows.
- Set n_features to 50 for the number of columns.
- Set effective_rank to 10 to specify the approximate rank of the matrix.
- Set tail_strength to 0.1 to control the relative weight of the noisy tail added to the low-rank singular value profile.
Print the matrix shape and data type:
- Access the shape using matrix.shape.
- Show the data type of the matrix using matrix.dtype.
Display summary statistics of the matrix:
- Use np.round(np.mean(matrix, axis=0), 2) to get the mean of each column rounded to two decimal places.
Display the first few rows of the matrix:
- Print the initial rows using np.round(matrix[:5, :], 2) to get a sense of the matrix structure and content.
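To see what tail_strength does in practice, the following sketch compares several values. The helper tail_energy is hypothetical (it is not part of scikit-learn), and measuring the share of squared singular values beyond the first effective_rank components is just one reasonable way to quantify the tail.

```python
import numpy as np
from sklearn.datasets import make_low_rank_matrix


def tail_energy(tail_strength, effective_rank=10):
    """Share of squared singular values beyond the first `effective_rank` components.

    Hypothetical helper written for this illustration only.
    """
    matrix = make_low_rank_matrix(
        n_samples=100,
        n_features=50,
        effective_rank=effective_rank,
        tail_strength=tail_strength,
        random_state=0,  # fixed only for reproducibility
    )
    s = np.linalg.svd(matrix, compute_uv=False)
    return np.sum(s[effective_rank:] ** 2) / np.sum(s**2)


# Larger tail_strength should shift more of the energy into the trailing components
for strength in (0.01, 0.1, 0.5, 0.9):
    print(f"tail_strength={strength}: tail energy = {tail_energy(strength):.3f}")
```

A small tail_strength keeps the matrix close to a clean low-rank structure, while values near 1 spread the energy across many components.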
This example demonstrates how to generate and inspect a low-rank matrix using scikit-learn’s make_low_rank_matrix function, allowing you to analyze the matrix’s shape, data type, and summary statistics. This is useful for validating algorithms that handle low-rank approximations.
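As one possible validation of that kind, the sketch below fits PCA to the generated matrix and checks how much variance the leading components explain. Choosing n_components equal to effective_rank and using the relative reconstruction error as a metric are assumptions made for this illustration.

```python
import numpy as np
from sklearn.datasets import make_low_rank_matrix
from sklearn.decomposition import PCA

# Same settings as the example above; random_state is fixed only for reproducibility
matrix = make_low_rank_matrix(
    n_samples=100,
    n_features=50,
    effective_rank=10,
    tail_strength=0.1,
    random_state=0,
)

# Keep as many components as the effective rank (an illustrative choice)
pca = PCA(n_components=10)
reduced = pca.fit_transform(matrix)

# Map the reduced data back to the original 50-dimensional space
reconstructed = pca.inverse_transform(reduced)

explained = pca.explained_variance_ratio_.sum()
relative_error = np.linalg.norm(matrix - reconstructed) / np.linalg.norm(matrix)

print(f"Reduced shape: {reduced.shape}")
print(f"Variance explained by 10 components: {explained:.3f}")
print(f"Relative reconstruction error: {relative_error:.3f}")
```

If an algorithm recovers most of the variance with roughly effective_rank components, it is behaving as expected on this kind of data.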