The LFW (Labeled Faces in the Wild) people dataset consists of images of faces collected from the web and is widely used for face recognition and image classification tasks. The images are labeled with the name of the person pictured.
Key function arguments when loading the dataset include `min_faces_per_person`, which sets the minimum number of pictures a person must have to be included, and `resize`, the ratio by which each face image is scaled down to reduce the computational load.
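To show what these two arguments do without downloading anything, here is a small stdlib-only sketch that mimics them. It assumes scikit-learn's default crop, `slice_=(slice(70, 195), slice(78, 172))`, and that each cropped side length is multiplied by `resize` and truncated to an integer. This is consistent with the 1850 features in the dataset shape printed below. The helper names `lfw_image_shape` and `filter_by_min_faces` are hypothetical, not part of scikit-learn.

```python
from collections import Counter

# Assumed default crop used by fetch_lfw_people: rows 70-194, columns 78-171
DEFAULT_SLICE = (slice(70, 195), slice(78, 172))


def lfw_image_shape(resize, slice_=DEFAULT_SLICE):
    """Hypothetical helper: (height, width) of each face after cropping and resizing."""
    h_slice, w_slice = slice_
    h = (h_slice.stop - h_slice.start) // (h_slice.step or 1)
    w = (w_slice.stop - w_slice.start) // (w_slice.step or 1)
    # Each side is scaled by the resize ratio and truncated to an integer
    return int(resize * h), int(resize * w)


def filter_by_min_faces(names, min_faces):
    """Hypothetical helper: keep only entries whose name occurs at least min_faces times."""
    counts = Counter(names)
    return [name for name in names if counts[name] >= min_faces]


h, w = lfw_image_shape(0.4)
print(h, w, h * w)  # 50 37 1850
print(lfw_image_shape(0.5))  # (62, 47)

people = ["Ana", "Ana", "Ben", "Ana", "Cy", "Cy"]
print(filter_by_min_faces(people, 2))  # ['Ana', 'Ana', 'Ana', 'Cy', 'Cy']
```

With `resize=0.4`, each image has 50 × 37 = 1850 pixels, which is why every row of the flattened data has 1850 features.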
Since each image is labeled with a person's name, this is an image classification problem. Common algorithms for it include Support Vector Machines (SVM), k-Nearest Neighbors (k-NN), and Convolutional Neural Networks (CNN).
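As an illustration of one of these algorithms, here is a minimal k-NN sketch in NumPy. It classifies flattened image vectors (rows like those in `lfw_people.data`) by majority vote among the k closest training rows under Euclidean distance. The function name `knn_predict` and the toy arrays are hypothetical. In practice, scikit-learn's `KNeighborsClassifier` does the same job.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=3):
    """Hypothetical helper: label each test row by majority vote of its k nearest training rows.

    Assumes non-negative integer labels, as in lfw_people.target.
    """
    # Squared Euclidean distances via ||a||^2 - 2 a.b + ||b||^2, shape (n_test, n_train);
    # this avoids building an (n_test, n_train, n_features) array for 1850-pixel rows
    d2 = (
        (X_test ** 2).sum(axis=1)[:, None]
        - 2.0 * X_test @ X_train.T
        + (X_train ** 2).sum(axis=1)[None, :]
    )
    # Indices of the k closest training rows for each test row
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest]
    # Majority vote; argmax breaks ties in favor of the smallest label
    return np.array([np.bincount(v).argmax() for v in votes])


# Toy data standing in for flattened images: two well-separated clusters
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.2, 0.2], [5.5, 5.5]])

print(knn_predict(X_train, y_train, X_test, k=3))  # [0 1]
```

Expanding the squared distance keeps memory proportional to `n_test × n_train`, which matters once each row holds 1850 pixel values.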
```python
from sklearn.datasets import fetch_lfw_people
import matplotlib.pyplot as plt

# Fetch the dataset (downloaded on the first call, then cached locally)
lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

# Display dataset shape and types
print(f"Dataset shape: {lfw_people.data.shape}")
print(f"Feature types: {lfw_people.data.dtype}")

# Show summary statistics
print(f"Number of classes: {len(lfw_people.target_names)}")
counts = [(str(name), int((lfw_people.target == idx).sum()))
          for idx, name in enumerate(lfw_people.target_names)]
print(f"Number of samples per class:\n{counts}")

# Display the first few labels of the dataset
print(f"First few labels of the dataset:\n{lfw_people.target[:5]}")
print("First few images of the dataset:")

# Plot the first five images, each titled with the person's name
fig, axes = plt.subplots(1, 5, figsize=(15, 8), subplot_kw={'xticks': [], 'yticks': []})
for i, ax in enumerate(axes):
    ax.imshow(lfw_people.images[i], cmap='gray')
    ax.set_title(lfw_people.target_names[lfw_people.target[i]])
plt.show()
```
Running the example gives an output like:
```
Dataset shape: (1288, 1850)
Feature types: float32
Number of classes: 7
Number of samples per class:
[('Ariel Sharon', 77), ('Colin Powell', 236), ('Donald Rumsfeld', 121), ('George W Bush', 530), ('Gerhard Schroeder', 109), ('Hugo Chavez', 71), ('Tony Blair', 144)]
First few labels of the dataset:
[5 6 3 1 0]
```
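The per-class counts above are clearly imbalanced: George W Bush accounts for 530 of the 1288 images. The short sketch below uses those printed counts to compute each person's share of the data and the accuracy of a trivial classifier that always predicts the largest class. That accuracy is a baseline any real model should beat. It also computes per-class weights with the formula scikit-learn documents for `class_weight="balanced"`. With the real data, `np.bincount(lfw_people.target)` returns the same counts in one call.

```python
import numpy as np

# Per-class counts copied from the output above, in target_names order
names = ["Ariel Sharon", "Colin Powell", "Donald Rumsfeld", "George W Bush",
         "Gerhard Schroeder", "Hugo Chavez", "Tony Blair"]
counts = np.array([77, 236, 121, 530, 109, 71, 144])
n_samples, n_classes = counts.sum(), len(counts)

# Fraction of the dataset belonging to each person
shares = counts / n_samples
for name, share in zip(names, shares):
    print(f"{name:<18} {share:6.1%}")

# Accuracy of a classifier that always predicts the most frequent person
baseline = counts.max() / n_samples
print(f"Majority-class baseline accuracy: {baseline:.3f}")  # 0.411

# Weights inversely proportional to class frequency:
# n_samples / (n_classes * count), as used by class_weight="balanced"
weights = n_samples / (n_classes * counts)
print({name: round(float(w), 2) for name, w in zip(names, weights)})
```

A model reporting around 41% accuracy on this subset has therefore learned nothing beyond the class frequencies.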
The steps are as follows:

1. Import the `fetch_lfw_people` function from `sklearn.datasets`:
   - This function downloads the LFW people dataset on first use, caches it locally, and returns it as a scikit-learn `Bunch` object.

2. Fetch the dataset using `fetch_lfw_people()`:
   - Use `min_faces_per_person=70` to include only individuals with at least 70 pictures.
   - Use `resize=0.4` to shrink each face image to 40% of its default cropped height and width (50×37 pixels), reducing the computational load.

3. Print the dataset shape and feature types:
   - Access the shape using `lfw_people.data.shape`. Each row is one image flattened into 1850 pixel values.
   - Show the data type of the features using `lfw_people.data.dtype`.

4. Display summary statistics:
   - Print the number of classes using `len(lfw_people.target_names)`.
   - Show the number of samples per class to understand the dataset distribution.

5. Display the first few labels and plot the first few images of the dataset:
   - Print the initial labels using `lfw_people.target[:5]`.
   - Plot the first few images with their corresponding names using matplotlib for a quick visual inspection.
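The steps above rely on two views of the same pixels. `lfw_people.images` holds each face as a 2-D array, while `lfw_people.data` holds the same values flattened row by row into one feature vector per image. The sketch below checks that relationship on a hypothetical random stand-in with the same per-image shape (50×37), so it runs without downloading the dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for lfw_people.images: 4 faces of 50x37 pixels, float32
images = rng.random((4, 50, 37), dtype=np.float32)

# Flatten each image row by row into one feature vector (like lfw_people.data)
data = images.reshape(len(images), -1)
print(data.shape, data.dtype)  # (4, 1850) float32

# Pixel (row r, column c) of an image sits at feature index r * width + c
r, c = 10, 5
width = images.shape[2]
assert data[2, r * width + c] == images[2, r, c]

# Reshape a feature vector back into an image, e.g. before calling plt.imshow
face = data[0].reshape(50, 37)
assert np.array_equal(face, images[0])
```

Classifiers such as SVM and k-NN consume the flattened `data` rows, while plotting and CNNs work with the 2-D `images`.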
This example demonstrates how to load and explore the LFW people dataset using scikit-learn’s `fetch_lfw_people()` function: inspecting the data’s shape, types, and class distribution, and visualizing sample images. This sets the stage for further preprocessing and the application of image classification algorithms.
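One common preprocessing step for LFW, used in scikit-learn's eigenfaces gallery example, is to project the flattened images onto their top principal components. Here is a NumPy-only sketch of that projection via the singular value decomposition. The helper name `pca_fit_transform`, the component count of 20, and the random stand-in data are assumptions for illustration. In practice, `sklearn.decomposition.PCA` is the usual tool.

```python
import numpy as np


def pca_fit_transform(X, n_components):
    """Hypothetical helper: center X and project it onto its top principal axes."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal axes ("eigenfaces" when X holds face images),
    # ordered by decreasing singular value
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Variance of the data along each kept axis
    explained_var = S[:n_components] ** 2 / (len(X) - 1)
    return Xc @ components.T, components, mean, explained_var


rng = np.random.default_rng(0)
# Hypothetical stand-in for lfw_people.data: 100 images of 50x37 = 1850 pixels
X = rng.random((100, 1850))
Z, components, mean, var = pca_fit_transform(X, n_components=20)
print(Z.shape, components.shape)  # (100, 20) (20, 1850)

# Each component can be viewed as an image, e.g. with plt.imshow(..., cmap='gray')
eigenface0 = components[0].reshape(50, 37)
```

The projected matrix `Z` has far fewer columns than the 1850 raw pixels, which makes downstream classifiers such as SVM or k-NN cheaper to train.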