The Iris dataset is a well-known dataset used for classification tasks to identify the species of iris flowers based on their features.
Key function arguments when loading the dataset include return_X_y
to specify if data should be returned as a tuple, and as_frame
to get the data as a pandas DataFrame.
This is a multi-class classification problem where algorithms like Decision Trees, k-Nearest Neighbors, and Support Vector Machines are commonly applied.
from sklearn.datasets import load_iris
# Load the dataset
dataset = load_iris(as_frame=True)
# Display dataset shape and types
print(f"Dataset shape: {dataset.data.shape}")
print(f"Feature types:\n{dataset.data.dtypes}")
# Show summary statistics
print(f"Summary statistics:\n{dataset.data.describe()}")
# Display first few rows of the dataset
print(f"First few rows of the dataset:\n{dataset.data.head()}")
# Split the dataset into input and output elements
X = dataset.data
y = dataset.target
print(f"Input shape: {X.shape}")
print(f"Output shape: {y.shape}")
Running the example gives an output like:
Dataset shape: (150, 4)
Feature types:
sepal length (cm) float64
sepal width (cm) float64
petal length (cm) float64
petal width (cm) float64
dtype: object
Summary statistics:
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
count 150.000000 150.000000 150.000000 150.000000
mean 5.843333 3.057333 3.758000 1.199333
std 0.828066 0.435866 1.765298 0.762238
min 4.300000 2.000000 1.000000 0.100000
25% 5.100000 2.800000 1.600000 0.300000
50% 5.800000 3.000000 4.350000 1.300000
75% 6.400000 3.300000 5.100000 1.800000
max 7.900000 4.400000 6.900000 2.500000
First few rows of the dataset:
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
Input shape: (150, 4)
Output shape: (150,)
Import the
load_iris
function fromsklearn.datasets
:- This function allows us to load the Iris dataset directly from the scikit-learn library.
Load the dataset using
load_iris()
:- Use
as_frame=True
to return the dataset as a pandas DataFrame for easier data manipulation and analysis.
- Use
Print the dataset shape and feature types:
- Access the shape using
dataset.data.shape
. - Show the data types of the features using
dataset.data.dtypes
.
- Access the shape using
Display summary statistics:
- Use
dataset.data.describe()
to get a statistical summary of the dataset.
- Use
Display the first few rows of the dataset:
- Print the initial rows using
dataset.data.head()
to get a sense of the dataset structure and content.
- Print the initial rows using
Split the dataset into input and output elements:
- Separate the features (
X
) from the target variable (y
). - Print the shapes of
X
andy
to confirm the split.
- Separate the features (
This example demonstrates how to quickly load and explore the Iris dataset using scikit-learn’s load_iris()
function, allowing you to inspect the data’s shape, types, summary statistics, and visualize a key feature. This sets the stage for further preprocessing and application of classification algorithms.