The Olivetti Faces dataset consists of 400 grayscale face images, each 64x64 pixels, showing 40 different people with 10 images per person. It is commonly used for facial recognition and image classification tasks.
Key arguments of fetch_olivetti_faces() include return_X_y, which returns the data and target as a (data, target) tuple instead of a Bunch object, and shuffle, which randomizes the order of the samples (the permutation is seeded by random_state).
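Setting shuffle=True reorders the samples while keeping every image paired with its label, and random_state seeds that reordering. The sketch below illustrates the idea on small synthetic arrays so it runs without downloading anything; shuffle_pairs is a hypothetical helper, not part of scikit-learn, and it assumes one shared permutation is applied to both arrays.

```python
import numpy as np
from sklearn.utils import check_random_state


def shuffle_pairs(X, y, random_state=0):
    """Apply one random permutation to both X and y (hypothetical helper).

    This mirrors the idea behind shuffle=True: rows change position,
    but each row keeps its label.
    """
    rng = check_random_state(random_state)
    order = rng.permutation(len(X))
    return X[order], y[order]


# Synthetic stand-in: 6 "images" of 4 pixels each, labelled by subject.
X = np.arange(24, dtype=np.float32).reshape(6, 4)
y = np.array([0, 0, 1, 1, 2, 2])
X_shuffled, y_shuffled = shuffle_pairs(X, y, random_state=42)
print(y_shuffled)
```

With the real loader, X, y = fetch_olivetti_faces(return_X_y=True, shuffle=True, random_state=42) returns the shuffled (data, target) tuple directly instead of a Bunch.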
Recognizing which of the 40 people appears in an image is a multi-class classification problem; common algorithms for it include Support Vector Machines (SVM), k-Nearest Neighbors (k-NN), and Convolutional Neural Networks (CNNs).
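As a rough starting point, the sketch below compares an SVM and a k-NN classifier on a stratified train/test split. compare_classifiers is a hypothetical helper, and the linear kernel, n_neighbors=1 and 25% test split are illustrative assumptions rather than tuned settings; a CNN would need a deep-learning library and is left out.

```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def compare_classifiers(X, y, random_state=0):
    """Return test accuracy for an SVM and a k-NN classifier.

    The settings (linear kernel, k=1, 25% test split) are illustrative
    assumptions, not tuned values.
    """
    # Stratify so every class appears in both the train and test sets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    models = {
        "SVM": SVC(kernel="linear"),
        "k-NN": KNeighborsClassifier(n_neighbors=1),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)  # mean accuracy
    return scores
```

Assuming the data has been fetched, it could be called as compare_classifiers(*fetch_olivetti_faces(return_X_y=True)). Stratifying the split matters here because each subject has only 10 images, so a purely random split could leave some subjects under-represented in the training set.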
from sklearn.datasets import fetch_olivetti_faces
import matplotlib.pyplot as plt
# Fetch the dataset
dataset = fetch_olivetti_faces()
# Display the shapes of the data matrix and the target vector
print(f"Dataset shape: {dataset.data.shape}")
print(f"Target shape: {dataset.target.shape}")
# Show the unique target labels (one per subject)
print(f"Unique targets: {set(dataset.target)}")
# Display the pixel arrays of the first five images
print(f"First few images:\n{dataset.images[:5]}")
# Plot the first five images side by side
fig, axes = plt.subplots(1, 5, figsize=(10, 2.5))
for i, ax in enumerate(axes):
    ax.imshow(dataset.images[i], cmap='gray')
    ax.axis('off')
plt.show()
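The plot above shows the first five images. To see how one person's images differ from each other, a small helper can plot every image belonging to a single subject. plot_subject is a hypothetical name; the sketch assumes images is an (n, height, width) array and target holds one label per image.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_subject(images, target, subject):
    """Plot every image whose label equals `subject` (hypothetical helper)."""
    subject_images = images[np.asarray(target) == subject]
    if len(subject_images) == 0:
        raise ValueError(f"no images found for subject {subject}")
    n = len(subject_images)
    # squeeze=False keeps `axes` 2-D even when there is a single image.
    fig, axes = plt.subplots(1, n, figsize=(2 * n, 2.5), squeeze=False)
    for ax, image in zip(axes[0], subject_images):
        ax.imshow(image, cmap="gray")
        ax.axis("off")
    fig.suptitle(f"Subject {subject}")
    return fig
```

Calling fig = plot_subject(dataset.images, dataset.target, subject=0) followed by plt.show() would display the ten images of the first subject.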
Running the example gives an output like:
Dataset shape: (400, 4096)
Target shape: (400,)
Unique targets: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39}
First few images:
[[[0.30991736 0.3677686 0.41735536 ... 0.37190083 0.3305785 0.30578512]
[0.3429752 0.40495867 0.43801653 ... 0.37190083 0.338843 0.3140496 ]
[0.3429752 0.41735536 0.45041323 ... 0.38016528 0.338843 0.29752067]
...
[0.21487603 0.20661157 0.2231405 ... 0.15289256 0.16528925 0.17355372]
[0.20247933 0.2107438 0.2107438 ... 0.14876033 0.16115703 0.16528925]
[0.20247933 0.20661157 0.20247933 ... 0.15289256 0.16115703 0.1570248 ]]
[[0.45454547 0.47107437 0.5123967 ... 0.19008264 0.18595041 0.18595041]
[0.446281 0.48347107 0.5206612 ... 0.21487603 0.2107438 0.2107438 ]
[0.49586776 0.5165289 0.53305787 ... 0.20247933 0.20661157 0.20661157]
...
[0.77272725 0.78099173 0.7933884 ... 0.1446281 0.1446281 0.1446281 ]
[0.77272725 0.7768595 0.7892562 ... 0.13636364 0.13636364 0.13636364]
[0.7644628 0.7892562 0.78099173 ... 0.15289256 0.15289256 0.15289256]]
[[0.3181818 0.40082645 0.49173555 ... 0.40082645 0.3553719 0.30991736]
[0.30991736 0.3966942 0.47933885 ... 0.40495867 0.37603307 0.30165288]
[0.26859504 0.34710744 0.45454547 ... 0.3966942 0.37190083 0.30991736]
...
[0.1322314 0.09917355 0.08264463 ... 0.13636364 0.14876033 0.15289256]
[0.11570248 0.09504132 0.0785124 ... 0.1446281 0.1446281 0.1570248 ]
[0.11157025 0.09090909 0.0785124 ... 0.14049587 0.14876033 0.15289256]]
[[0.1983471 0.19421488 0.19421488 ... 0.58264464 0.5123967 0.45867768]
[0.21900827 0.21900827 0.21487603 ... 0.5661157 0.5123967 0.45041323]
[0.23966943 0.23966943 0.23966943 ... 0.59090906 0.5 0.46280992]
...
[0.13636364 0.14049587 0.16115703 ... 0.76033056 0.7644628 0.7355372 ]
[0.14876033 0.14876033 0.14876033 ... 0.76033056 0.75619835 0.74380165]
[0.14876033 0.14876033 0.14876033 ... 0.75206614 0.75206614 0.73966944]]
[[0.5 0.54545456 0.58264464 ... 0.2231405 0.2231405 0.2231405 ]
[0.47933885 0.5123967 0.58264464 ... 0.20247933 0.20247933 0.20247933]
[0.49173555 0.5413223 0.59504133 ... 0.21487603 0.21487603 0.21487603]
...
[0.4752066 0.41735536 0.40082645 ... 0.19421488 0.19421488 0.19421488]
[0.4752066 0.44214877 0.41735536 ... 0.16528925 0.16528925 0.16528925]
[0.4876033 0.446281 0.4338843 ... 0.17768595 0.17355372 0.17355372]]]
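The printed set lists the labels but not how many images each label has. The sketch below, using a hypothetical summarize helper, reports sample and feature counts, the number of classes, images per class and the pixel range. For this dataset one would expect 40 classes of 10 images each and pixel values within [0, 1], but the function does not assume this.

```python
import numpy as np


def summarize(data, target):
    """Return basic statistics for an (n_samples, n_features) array and its labels."""
    data = np.asarray(data)
    labels, counts = np.unique(target, return_counts=True)
    return {
        "n_samples": data.shape[0],
        "n_features": data.shape[1],
        "n_classes": labels.size,
        "min_per_class": int(counts.min()),  # smallest class size
        "max_per_class": int(counts.max()),  # largest class size
        "pixel_min": float(data.min()),
        "pixel_max": float(data.max()),
    }
```

For the loaded dataset it would be called as summarize(dataset.data, dataset.target).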
The steps are as follows:
1. Import the fetch_olivetti_faces function from sklearn.datasets and matplotlib.pyplot for plotting:
   - fetch_olivetti_faces loads the Olivetti Faces dataset through scikit-learn, downloading it on first use.
   - matplotlib.pyplot is used to visualize the images.
2. Fetch the dataset using fetch_olivetti_faces():
   - Load the dataset with the default parameters.
3. Print the dataset shape and target shape:
   - Access the shape of the data using dataset.data.shape.
   - Show the shape of the target labels using dataset.target.shape.
4. Display summary statistics:
   - Show the unique target labels using set(dataset.target).
5. Display the first few images of the dataset:
   - Print the first few images using dataset.images[:5].
6. Plot example images:
   - Use matplotlib to plot a few example images from the dataset to visualize the data.
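dataset.data and dataset.images hold the same pixels in two layouts: each 64x64 image in images corresponds to one 4096-value row in data (64 * 64 = 4096). The helpers below, whose names are hypothetical, convert between the two layouts with a plain reshape, which assumes row-major (C-order) flattening.

```python
import numpy as np

IMAGE_SHAPE = (64, 64)  # Olivetti images are 64x64 pixels


def to_rows(images):
    """Flatten (n, height, width) images into (n, height * width) feature rows."""
    images = np.asarray(images)
    return images.reshape(len(images), -1)


def to_images(rows, image_shape=IMAGE_SHAPE):
    """Restore (n, height * width) rows to (n, height, width) images."""
    rows = np.asarray(rows)
    return rows.reshape(len(rows), *image_shape)
```

For the loaded dataset, np.array_equal(to_rows(dataset.images), dataset.data) should return True. The flat rows are what most scikit-learn estimators expect, while the 2-D images are convenient for plotting.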
This example demonstrates how to quickly load and explore the Olivetti Faces dataset using scikit-learn’s fetch_olivetti_faces() function: inspecting the data’s shape and target labels, and visualizing some example images. This sets the stage for further preprocessing and the application of classification algorithms.
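One common preprocessing step for face images is principal component analysis (PCA, the basis of the "eigenfaces" approach), which reduces the 4096 pixel features to a smaller number of components before classification. The sketch below chains PCA and an RBF-kernel SVM in a pipeline and scores it with stratified 5-fold cross-validation; pca_svm_scores is a hypothetical helper, and n_components=50 is an untuned guess.

```python
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def pca_svm_scores(X, y, n_components=50, n_splits=5, random_state=0):
    """Cross-validated accuracy of whitened PCA followed by an RBF SVM.

    n_components=50 and the SVC defaults are illustrative guesses, not tuned.
    """
    model = make_pipeline(
        PCA(n_components=n_components, whiten=True, random_state=random_state),
        SVC(kernel="rbf"),
    )
    # Stratified folds keep every class represented in each split.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    return cross_val_score(model, X, y, cv=cv)
```

Called as pca_svm_scores(*fetch_olivetti_faces(return_X_y=True)), it would return one accuracy per fold. Keeping PCA inside the pipeline means it is refit on each training fold, so the test folds do not leak into the components.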