The Iris dataset is a classic dataset for classification tasks with features such as sepal length, sepal width, petal length, and petal width.
Key function arguments when loading the dataset include name to specify the dataset on OpenML, version to specify the version of the dataset, and as_frame to get the data as a pandas DataFrame.
This is a classification problem where common algorithms like Logistic Regression, k-Nearest Neighbors, and Support Vector Machines are often applied.
from sklearn.datasets import fetch_openml
# Fetch the dataset
dataset = fetch_openml(name='iris', version=1, as_frame=True)
# Display dataset shape and types
print(f"Dataset shape: {dataset.data.shape}")
print(f"Feature types:\n{dataset.data.dtypes}")
# Show summary statistics
print(f"Summary statistics:\n{dataset.data.describe()}")
# Display first few rows of the dataset
print(f"First few rows of the dataset:\n{dataset.data.head()}")
Running the example gives an output like:
Dataset shape: (150, 4)
Feature types:
sepallength    float64
sepalwidth     float64
petallength    float64
petalwidth     float64
dtype: object
Summary statistics:
       sepallength  sepalwidth  petallength  petalwidth
count   150.000000  150.000000   150.000000  150.000000
mean      5.843333    3.054000     3.758667    1.198667
std       0.828066    0.433594     1.764420    0.763161
min       4.300000    2.000000     1.000000    0.100000
25%       5.100000    2.800000     1.600000    0.300000
50%       5.800000    3.000000     4.350000    1.300000
75%       6.400000    3.300000     5.100000    1.800000
max       7.900000    4.400000     6.900000    2.500000
First few rows of the dataset:
   sepallength  sepalwidth  petallength  petalwidth
0          5.1         3.5          1.4         0.2
1          4.9         3.0          1.4         0.2
2          4.7         3.2          1.3         0.2
3          4.6         3.1          1.5         0.2
4          5.0         3.6          1.4         0.2
The steps are as follows:
1. Import the fetch_openml function from sklearn.datasets:
   - This function allows us to load datasets directly from the OpenML repository.
2. Fetch the dataset using fetch_openml():
   - Use name='iris' to specify the Iris dataset.
   - Set version=1 to ensure we get the correct version.
   - Use as_frame=True to return the dataset as a pandas DataFrame for easier manipulation.
3. Print the dataset shape and feature types:
   - Access the shape using dataset.data.shape.
   - Show the data types of the features using dataset.data.dtypes.
4. Display summary statistics:
   - Use dataset.data.describe() to get a statistical summary of the dataset.
5. Display the first few rows of the dataset:
   - Print the initial rows using dataset.data.head() to understand the structure and content.
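The steps above only look at the feature columns in dataset.data. A natural follow-up is to check how the class labels are distributed. The sketch below is one possible way to do that, assuming the labels are available as a pandas Series (with as_frame=True, dataset.target should be one). The helper name class_balance is hypothetical, and the small hand-made Series stands in for the real labels so the snippet runs without downloading anything.

```python
import pandas as pd


def class_balance(target: pd.Series) -> pd.DataFrame:
    """Return per-class counts and fractions for a label column."""
    counts = target.value_counts()
    return pd.DataFrame({
        "count": counts,
        "fraction": counts / len(target),
    })


# Hand-made stand-in for dataset.target (illustrative values, not the real data).
labels = pd.Series(
    ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-virginica"],
    name="class",
)
balance = class_balance(labels)
print(balance)
```

With the dataset fetched earlier, calling class_balance(dataset.target) would be expected to show three species with 50 samples each, since Iris is a balanced dataset.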
This example demonstrates how to load and explore the Iris dataset using scikit-learn’s fetch_openml() function, allowing you to inspect the data’s shape, types, and summary statistics. This sets the stage for further preprocessing and application of classification algorithms.
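As a rough illustration of that next stage, the sketch below fits a Logistic Regression classifier and scores it on held-out rows. It is a minimal example under stated assumptions, not a tuned pipeline: it swaps in load_iris(), the copy of Iris bundled with scikit-learn, so that it runs offline (its column names differ from the OpenML version), and the 25% test split, random_state=0, and max_iter=1000 are arbitrary choices made for the illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: use the copy of Iris bundled with scikit-learn so no download is needed.
X, y = load_iris(return_X_y=True, as_frame=True)

# Hold out a quarter of the rows, keeping class proportions the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# max_iter is raised above the default as a precaution against convergence warnings.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# score() reports mean accuracy on the held-out rows.
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.3f}")
```

The same pattern should carry over to the OpenML version by passing dataset.data and dataset.target to train_test_split in place of X and y.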