
Scikit-Learn fetch_lfw_pairs() Dataset

The Labeled Faces in the Wild (LFW) pairs dataset is used for evaluating face verification algorithms.

This dataset contains pairs of face images, each with a label indicating whether the two images show the same person.

Key arguments include subset, which selects the split to load ('train', 'test', or '10_folds'), and color, which keeps the RGB channels when True instead of loading the images as grayscale (the default is False).

This is a binary classification problem (does a pair show the same person or not), to which algorithms like Support Vector Machines (SVMs) and Convolutional Neural Networks (CNNs) are often applied.

from sklearn.datasets import fetch_lfw_pairs

# Fetch the training pairs in color (downloaded and cached on the first call)
dataset = fetch_lfw_pairs(subset='train', color=True)

# Display the shapes of the pairs and labels arrays
print(f"Dataset shape: {dataset.pairs.shape}")
print(f"Image pair shape: {dataset.pairs[0].shape}")
print(f"Labels shape: {dataset.target.shape}")

# Show summary statistics
print(f"Number of pairs: {len(dataset.pairs)}")
print(f"Number of positive pairs: {int((dataset.target == 1).sum())}")
print(f"Number of negative pairs: {int((dataset.target == 0).sum())}")

# Inspect the first pair: the shape of each image and its label
print(f"First pair images shapes: {dataset.pairs[0][0].shape}, {dataset.pairs[0][1].shape}")
print(f"First pair label: {dataset.target[0]}")

Running the example gives an output like:

Dataset shape: (2200, 2, 62, 47, 3)
Image pair shape: (2, 62, 47, 3)
Labels shape: (2200,)
Number of pairs: 2200
Number of positive pairs: 1100
Number of negative pairs: 1100
First pair images shapes: (62, 47, 3), (62, 47, 3)
First pair label: 1
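
The trailing 3 in these shapes is the RGB channel axis that color=True adds. As a minimal preprocessing sketch, the snippet below averages that axis to get grayscale images and then standardizes each image. The helper names to_grayscale and standardize_images are hypothetical, channel averaging is just one simple way to drop color, and a random array stands in for dataset.pairs so the snippet runs without downloading anything.

```python
import numpy as np


def to_grayscale(pairs):
    """Average the trailing RGB axis: (n, 2, h, w, 3) -> (n, 2, h, w)."""
    return np.asarray(pairs, dtype=np.float32).mean(axis=-1)


def standardize_images(pairs, eps=1e-8):
    """Give every image zero mean and unit variance over its pixels."""
    mean = pairs.mean(axis=(-2, -1), keepdims=True)
    std = pairs.std(axis=(-2, -1), keepdims=True)
    return (pairs - mean) / (std + eps)


# Assumption: random values stand in for dataset.pairs (same layout, fewer pairs)
rng = np.random.default_rng(42)
fake_pairs = rng.random((4, 2, 62, 47, 3)).astype(np.float32)

gray = to_grayscale(fake_pairs)
prepared = standardize_images(gray)

print(f"Color pairs shape: {fake_pairs.shape}")
print(f"Grayscale pairs shape: {gray.shape}")
print(f"Standardized pairs shape: {prepared.shape}")
```

With the real data, passing dataset.pairs instead of fake_pairs should work the same way, since the helpers only rely on the array layout shown above.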

This example shows how to load and explore the LFW pairs dataset with scikit-learn’s fetch_lfw_pairs() function: it inspects the array shapes, counts the matching and non-matching pairs, and looks at the first pair. This sets the stage for further preprocessing and for applying face verification algorithms.
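
As one possible next step, the sketch below turns each pair into a single feature vector (the absolute pixel difference between its two images) and fits a linear SVM on those features. This is only an illustrative baseline, not a recommended verification method. The function pairs_to_features is a hypothetical helper, and the pairs are synthetic stand-ins for dataset.pairs and dataset.target so the code runs offline.

```python
import numpy as np
from sklearn.svm import SVC


def pairs_to_features(pairs):
    """Map pairs of shape (n_pairs, 2, ...) to (n_pairs, n_values) features.

    Each feature vector is the absolute difference of the two flattened images.
    """
    pairs = np.asarray(pairs, dtype=np.float64)
    first = pairs[:, 0].reshape(len(pairs), -1)
    second = pairs[:, 1].reshape(len(pairs), -1)
    return np.abs(first - second)


# Assumption: small synthetic images stand in for the real LFW pairs
rng = np.random.default_rng(0)
n_pairs, height, width = 200, 8, 6
target = np.array([1, 0] * (n_pairs // 2))  # 1 = match, 0 = no match

first_images = rng.random((n_pairs, height, width))
noisy_copies = first_images + rng.normal(scale=0.01, size=first_images.shape)
unrelated = rng.random(first_images.shape)
# Matching pairs reuse the first image plus noise; others get an unrelated image
second_images = np.where(target[:, None, None] == 1, noisy_copies, unrelated)
pairs = np.stack([first_images, second_images], axis=1)

X = pairs_to_features(pairs)

# Train on the first 150 pairs and evaluate on the remaining 50
clf = SVC(kernel="linear").fit(X[:150], target[:150])
accuracy = clf.score(X[150:], target[150:])

print(f"Feature matrix shape: {X.shape}")
print(f"Held-out accuracy on synthetic pairs: {accuracy:.2f}")
```

The synthetic pairs are far easier to separate than real faces, so the accuracy printed here says nothing about performance on LFW; the point is only the shape of the pipeline from pairs to features to classifier.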


