Scikit-Learn fetch_lfw_pairs() Dataset

The Labeled Faces in the Wild (LFW) pairs dataset is used for evaluating face verification algorithms.

Each sample is a pair of face images with a binary label: 1 if both images show the same person, 0 if they show different people.

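Key arguments include subset, which selects the portion of the dataset to load ('train', 'test', or '10_folds'), and color, which controls whether the images keep their RGB channels (True) or are converted to grayscale (False, the default).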
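As a minimal sketch of how these arguments fit together, the hypothetical wrapper below (load_lfw_pairs and VALID_SUBSETS are illustrative names, not part of scikit-learn) checks the subset value before calling fetch_lfw_pairs(). The resize argument, which rescales each image and defaults to 0.5, is passed through unchanged.

```python
from sklearn.datasets import fetch_lfw_pairs

# Subset names accepted by fetch_lfw_pairs()
VALID_SUBSETS = ("train", "test", "10_folds")


def load_lfw_pairs(subset="train", color=False, resize=0.5):
    """Hypothetical convenience wrapper around fetch_lfw_pairs()."""
    if subset not in VALID_SUBSETS:
        raise ValueError(f"subset must be one of {VALID_SUBSETS}, got {subset!r}")
    # The first call downloads the images and caches them on disk for later calls
    return fetch_lfw_pairs(subset=subset, color=color, resize=resize)
```

For example, load_lfw_pairs("test", color=True) would load the test pairs with RGB channels, while an unknown subset name fails immediately instead of after a download attempt.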

This is a binary classification problem (same person or not), to which algorithms like Support Vector Machines (SVMs) and Convolutional Neural Networks (CNNs) are often applied.
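To illustrate how a pair of images can be framed as a single classification sample, here is a rough sketch that turns each pair into an absolute pixel-difference vector and fits a linear SVM. It runs on synthetic arrays that only mimic the axis layout of dataset.pairs, so the printed accuracy says nothing about real faces; pair_features is a hypothetical helper, and raw pixel differences are a simple baseline rather than a recommended verification method.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def pair_features(pairs):
    """Map an array of shape (n_pairs, 2, ...) to one difference vector per pair."""
    pairs = np.asarray(pairs, dtype=np.float64)
    first, second = pairs[:, 0], pairs[:, 1]
    # Absolute pixel difference, flattened to one row per pair
    return np.abs(first - second).reshape(len(pairs), -1)


# Synthetic stand-in with the same axis order as grayscale dataset.pairs
rng = np.random.default_rng(0)
n_pairs, height, width = 200, 8, 6
target = np.repeat([1, 0], n_pairs // 2)  # 1 = same, 0 = different
first = rng.random((n_pairs, height, width))
noisy_copy = first + rng.normal(scale=0.01, size=first.shape)
unrelated = rng.random((n_pairs, height, width))
# "Same" pairs reuse the first image with small noise; "different" pairs do not
second = np.where(target[:, None, None] == 1, noisy_copy, unrelated)
pairs = np.stack([first, second], axis=1)

X = pair_features(pairs)
X_train, X_test, y_train, y_test = train_test_split(
    X, target, test_size=0.25, random_state=0, stratify=target
)
clf = SVC(kernel="linear").fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"Held-out accuracy on synthetic pairs: {accuracy:.2f}")
```

With the real dataset, pair_features(dataset.pairs) and dataset.target would take the place of the synthetic arrays.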

from sklearn.datasets import fetch_lfw_pairs

# Fetch the dataset
dataset = fetch_lfw_pairs(subset='train', color=True)

# Display the shapes of the pairs array, a single pair, and the labels
print(f"Dataset shape: {dataset.pairs.shape}")
print(f"Image pair shape: {dataset.pairs[0].shape}")
print(f"Labels shape: {dataset.target.shape}")

# Show summary statistics
print(f"Number of pairs: {len(dataset.pairs)}")
print(f"Number of positive pairs: {sum(dataset.target == 1)}")
print(f"Number of negative pairs: {sum(dataset.target == 0)}")

# Inspect the first pair: image shapes and label
print(f"First pair images shapes: {dataset.pairs[0][0].shape}, {dataset.pairs[0][1].shape}")
print(f"First pair label: {dataset.target[0]}")

Running the example gives an output like:

Dataset shape: (2200, 2, 62, 47, 3)
Image pair shape: (2, 62, 47, 3)
Labels shape: (2200,)
Number of pairs: 2200
Number of positive pairs: 1100
Number of negative pairs: 1100
First pair images shapes: (62, 47, 3), (62, 47, 3)
First pair label: 1

1. Import the fetch_lfw_pairs function from sklearn.datasets:
   - This function downloads the LFW pairs dataset on first use, caches it locally, and loads it as arrays.
2. Fetch the dataset using fetch_lfw_pairs():
   - Use subset='train' to load the training portion of the dataset.
   - Use color=True to load images in color (three RGB channels, the trailing 3 in the shapes).
3. Print the dataset shapes:
   - Access the shape of pairs using dataset.pairs.shape.
   - Show the shape of a single image pair using dataset.pairs[0].shape.
   - Show the shape of labels using dataset.target.shape.
4. Display summary statistics:
   - Print the number of pairs using len(dataset.pairs).
   - Show the number of positive and negative pairs using sum(dataset.target == 1) and sum(dataset.target == 0), respectively.
5. Inspect the first pair:
   - Print the shapes of the first pair images using dataset.pairs[0][0].shape and dataset.pairs[0][1].shape.
   - Print the label of the first pair using dataset.target[0].
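The inspection steps above can be gathered into one reusable helper. The sketch below assumes only the array layout described in this example; describe_pairs is a hypothetical name, and the demo arrays are zero-filled placeholders rather than real LFW data.

```python
import numpy as np


def describe_pairs(pairs, target):
    """Collect the shapes and label counts reported in the walkthrough."""
    pairs = np.asarray(pairs)
    target = np.asarray(target)
    # bincount tallies label 0 (different people) and label 1 (same person)
    counts = np.bincount(target, minlength=2)
    return {
        "pairs_shape": pairs.shape,
        "image_shape": pairs[0, 0].shape,
        "n_pairs": len(pairs),
        "n_negative": int(counts[0]),
        "n_positive": int(counts[1]),
    }


# Placeholder arrays shaped like a tiny color subset; not real LFW data
demo_pairs = np.zeros((4, 2, 62, 47, 3), dtype=np.float32)
demo_target = np.array([1, 1, 0, 0])
summary = describe_pairs(demo_pairs, demo_target)
print(summary)
```

Calling describe_pairs(dataset.pairs, dataset.target) on the loaded dataset would report the same quantities as the print statements in the example.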

This example demonstrates how to quickly load and explore the LFW pairs dataset using scikit-learn’s fetch_lfw_pairs() function, inspecting the array shapes, the balance of positive and negative pairs, and the first pair. This sets the stage for further preprocessing and application of face verification algorithms.
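As one possible preprocessing step, the sketch below converts color pairs to grayscale by averaging the channels and standardizes each image to zero mean and unit variance. These choices are assumptions made for illustration, not something fetch_lfw_pairs() requires; preprocess_pairs is a hypothetical helper, and the demo input is random data with the same axis layout as dataset.pairs.

```python
import numpy as np


def preprocess_pairs(pairs):
    """Convert pairs to grayscale (if needed) and standardize each image."""
    pairs = np.asarray(pairs, dtype=np.float64)
    if pairs.ndim == 5:
        # Color input (n_pairs, 2, height, width, channels): average the channels
        pairs = pairs.mean(axis=-1)
    # Per-image mean and standard deviation over the height and width axes
    mean = pairs.mean(axis=(2, 3), keepdims=True)
    std = pairs.std(axis=(2, 3), keepdims=True)
    # The small constant avoids division by zero for constant images
    return (pairs - mean) / (std + 1e-8)


# Random stand-in shaped like four color pairs; not real LFW data
rng = np.random.default_rng(0)
demo_pairs = rng.random((4, 2, 62, 47, 3))
processed = preprocess_pairs(demo_pairs)
print(processed.shape)
```

The output keeps the pair axis, so processed[:, 0] and processed[:, 1] still hold the two images of each pair.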

See Also