
Scikit-Learn make_low_rank_matrix() Dataset

How to Use the Dataset: make_low_rank_matrix()

The make_low_rank_matrix function generates a mostly low-rank matrix whose singular values follow a bell-shaped profile. Matrices of this kind are commonly used to test dimensionality reduction and matrix factorization methods.

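Key function arguments include n_samples for the number of rows, n_features for the number of columns, effective_rank for the approximate number of singular vectors needed to explain most of the data, and tail_strength for the relative weight of the slowly decaying tail of the singular value profile.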

This synthetic dataset is useful for testing algorithms that deal with low-rank approximations, such as PCA or collaborative filtering.

from sklearn.datasets import make_low_rank_matrix
import numpy as np

# Generate a low-rank matrix
n_samples = 100
n_features = 50
effective_rank = 10
tail_strength = 0.1
matrix = make_low_rank_matrix(
    n_samples=n_samples,
    n_features=n_features,
    effective_rank=effective_rank,
    tail_strength=tail_strength,
)

# Display matrix shape and type
print(f"Matrix shape: {matrix.shape}")
print(f"Matrix data type: {matrix.dtype}")

# Show the per-column means as a simple summary statistic
print(f"Summary statistics:\n{np.round(np.mean(matrix, axis=0), 2)}")

# Display first few rows of the matrix
print(f"First few rows of the matrix:\n{np.round(matrix[:5, :], 2)}")

Running the example gives an output like:

Matrix shape: (100, 50)
Matrix data type: float64
Summary statistics:
[ 0.    0.    0.    0.    0.    0.    0.   -0.   -0.    0.   -0.   -0.
 -0.01  0.   -0.   -0.   -0.    0.    0.    0.   -0.   -0.    0.    0.
  0.   -0.   -0.    0.   -0.   -0.    0.    0.    0.   -0.   -0.    0.
  0.   -0.    0.   -0.    0.    0.   -0.    0.01 -0.    0.   -0.   -0.
  0.    0.  ]
First few rows of the matrix:
[[-0.08  0.07 -0.02  0.   -0.02 -0.03  0.01  0.   -0.   -0.05 -0.04 -0.03
  -0.   -0.01  0.01 -0.01 -0.01  0.06  0.01 -0.02  0.02  0.02  0.01  0.03
   0.    0.02  0.01 -0.01 -0.04 -0.01 -0.05  0.02  0.03 -0.   -0.03  0.02
   0.03  0.01  0.02  0.   -0.01  0.03  0.01  0.01  0.02 -0.01  0.01 -0.04
   0.05 -0.01]
 [ 0.06  0.02  0.03 -0.03  0.01  0.08 -0.05  0.03 -0.02  0.04  0.05 -0.01
   0.02 -0.    0.04 -0.01  0.01 -0.03  0.01  0.01 -0.    0.03  0.07  0.
   0.03 -0.03  0.   -0.03  0.    0.02  0.03 -0.04  0.   -0.08 -0.   -0.01
   0.02 -0.03 -0.03  0.05  0.06 -0.04  0.03 -0.01  0.03  0.02 -0.02  0.03
   0.03  0.03]
 [ 0.01  0.   -0.   -0.05  0.04  0.    0.04  0.01  0.02 -0.02  0.03  0.01
   0.06  0.03  0.01 -0.02  0.03  0.   -0.   -0.02 -0.03 -0.01  0.02  0.02
  -0.01  0.01  0.01  0.   -0.    0.04 -0.02  0.01 -0.01  0.01  0.01  0.03
  -0.   -0.01 -0.05  0.04  0.03 -0.02  0.01  0.03  0.   -0.01 -0.01 -0.03
  -0.03 -0.06]
 [ 0.   -0.02  0.01 -0.    0.02 -0.01  0.   -0.03 -0.04 -0.   -0.03  0.01
  -0.02  0.02 -0.03 -0.04  0.02  0.03 -0.   -0.01 -0.03 -0.02  0.01  0.06
  -0.01  0.04  0.06  0.08  0.01  0.01  0.02  0.06  0.   -0.03 -0.01  0.02
   0.04 -0.04 -0.02 -0.05 -0.01 -0.04 -0.01  0.   -0.03  0.04  0.05 -0.05
   0.03 -0.  ]
 [-0.    0.03  0.01  0.08 -0.02  0.    0.02 -0.01  0.04  0.05  0.   -0.03
  -0.01  0.07 -0.04  0.06 -0.03  0.01  0.01 -0.01  0.02  0.01 -0.09 -0.06
   0.02 -0.04 -0.06 -0.   -0.   -0.05 -0.01  0.04 -0.01  0.02  0.01  0.02
   0.03 -0.    0.03  0.03 -0.07  0.06 -0.01  0.02 -0.02  0.01 -0.03  0.01
  -0.05  0.04]]
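
The values differ from run to run because the matrix is drawn at random. As a minimal sketch, the snippet below passes the random_state argument to make the result reproducible. The seed values 42 and 7 are arbitrary choices made for illustration and are not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_low_rank_matrix

# Shared settings, matching the example above.
params = dict(n_samples=100, n_features=50, effective_rank=10, tail_strength=0.1)

# Two calls with the same seed (42 is an arbitrary, illustrative value).
a = make_low_rank_matrix(random_state=42, **params)
b = make_low_rank_matrix(random_state=42, **params)

# A third call with a different seed (7 is also arbitrary).
c = make_low_rank_matrix(random_state=7, **params)

same_seed_equal = np.array_equal(a, b)
different_seed_equal = np.array_equal(a, c)

print(f"Same seed gives identical matrices: {same_seed_equal}")
print(f"Different seed gives identical matrices: {different_seed_equal}")
```

Fixing the seed is mainly useful when you want to compare algorithms on exactly the same synthetic input.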

The steps are as follows:

  1. Import the make_low_rank_matrix function from sklearn.datasets and numpy for matrix manipulation:

    • This function creates a low-rank matrix with specified dimensions and properties.
  2. Generate the matrix using make_low_rank_matrix():

    • Set n_samples to 100 for the number of rows.
    • Set n_features to 50 for the number of columns.
    • Set effective_rank to 10 to specify the approximate rank of the matrix.
    • Set tail_strength to 0.1 to control the relative weight of the slowly decaying, noisy tail of the singular value profile; values closer to 0 give a more sharply low-rank matrix.
  3. Print the matrix shape and data type:

    • Access the shape using matrix.shape.
    • Show the data type of the matrix using matrix.dtype.
  4. Display summary statistics of the matrix:

    • Use np.round(np.mean(matrix, axis=0), 2) to get the mean of each column rounded to two decimal places.
  5. Display the first few rows of the matrix:

    • Print the initial rows using np.round(matrix[:5, :], 2) to get a sense of the matrix structure and content.
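
To see what effective_rank and tail_strength do in practice, it can help to look at the singular values directly. The following is a rough sketch, not part of the original example: the helper top_k_energy is a hypothetical name introduced here, and the tail_strength values 0.1, 0.5, and 0.9 are illustrative. It measures how much of the squared singular-value mass sits in the 10 largest singular values.

```python
import numpy as np
from sklearn.datasets import make_low_rank_matrix


def top_k_energy(tail_strength, k=10):
    """Fraction of squared singular-value mass held by the k largest values.

    Hypothetical helper for illustration only.
    """
    X = make_low_rank_matrix(
        n_samples=100,
        n_features=50,
        effective_rank=10,
        tail_strength=tail_strength,
        random_state=0,  # arbitrary seed so the comparison is repeatable
    )
    # np.linalg.svd returns singular values sorted in descending order.
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(s[:k] ** 2) / np.sum(s ** 2))


# Compare a weak, a moderate, and a strong tail.
energy = {t: top_k_energy(t) for t in (0.1, 0.5, 0.9)}
for t, frac in energy.items():
    print(f"tail_strength={t}: top-10 energy fraction = {frac:.3f}")
```

A smaller tail_strength concentrates more of the mass in the leading singular values, which is what makes the matrix "mostly" low rank rather than exactly low rank.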

This example demonstrates how to generate and inspect a low-rank matrix using scikit-learn’s make_low_rank_matrix function, allowing you to analyze the matrix’s shape, data type, and summary statistics. This is useful for validating algorithms that handle low-rank approximations.
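
One way to use such a matrix for validation is to check how well a truncated SVD reconstructs it. The sketch below is an assumption-laden illustration rather than a prescribed workflow: the function name truncated_reconstruction_error, the seed, and the ranks 5, 10, 20, and 50 are all choices made here. It reports the relative Frobenius error of the rank-k approximation.

```python
import numpy as np
from sklearn.datasets import make_low_rank_matrix

X = make_low_rank_matrix(
    n_samples=100,
    n_features=50,
    effective_rank=10,
    tail_strength=0.1,
    random_state=0,  # arbitrary seed for repeatability
)

# Thin SVD: U is (100, 50), s is (50,), Vt is (50, 50).
U, s, Vt = np.linalg.svd(X, full_matrices=False)


def truncated_reconstruction_error(k):
    """Relative Frobenius error of the best rank-k approximation of X.

    Hypothetical helper for illustration only.
    """
    X_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return float(np.linalg.norm(X - X_k) / np.linalg.norm(X))


errors = {k: truncated_reconstruction_error(k) for k in (5, 10, 20, 50)}
for k, err in errors.items():
    print(f"rank {k:2d}: relative error = {err:.4f}")
```

The error shrinks as k grows and reaches numerical zero only at k = 50, since the tail keeps every singular value slightly above zero.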


