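The Linnerud dataset is a small multi-output regression dataset: 20 samples, each with three exercise measurements (Chins, Situps, Jumps) as features and three physiological measurements (Weight, Waist, Pulse) as targets.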
Key function arguments when loading the dataset include return_X_y, to specify if data should be returned as a tuple, and as_frame, to get the data as a pandas DataFrame.
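As a minimal sketch of how these two arguments change what comes back, the snippet below calls load_linnerud() three ways. The variable names (bunch, X_df, y_df) are illustrative choices, not part of the original example.

```python
from sklearn.datasets import load_linnerud

# Default call: a Bunch object whose data and target are NumPy arrays
bunch = load_linnerud()
print(type(bunch.data).__name__, bunch.data.shape)

# return_X_y=True: a (data, target) tuple instead of a Bunch
X, y = load_linnerud(return_X_y=True)
print(type(X).__name__, X.shape, y.shape)

# Both flags together: a tuple of pandas DataFrames with named columns
X_df, y_df = load_linnerud(return_X_y=True, as_frame=True)
print(list(X_df.columns), list(y_df.columns))
```

The tuple form is convenient when the data goes straight into an estimator, while the DataFrame form keeps column names available for inspection.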
Because every sample has three target values, this is a multi-output regression problem. Algorithms such as Linear Regression and Ridge Regression, which handle multiple outputs directly, are often applied.
from sklearn.datasets import load_linnerud
# Load the dataset
dataset = load_linnerud(as_frame=True)
# Display dataset shape and types
print(f"Dataset shape: {dataset.data.shape}")
print(f"Feature types:\n{dataset.data.dtypes}")
# Show summary statistics
print(f"Summary statistics:\n{dataset.data.describe()}")
# Display first few rows of the dataset
print(f"First few rows of the dataset:\n{dataset.data.head()}")
# Split the dataset into input and output elements
X = dataset.data
y = dataset.target
print(f"Input shape: {X.shape}")
print(f"Output shape: {y.shape}")
Running the example gives an output like:
Dataset shape: (20, 3)
Feature types:
Chins     float64
Situps    float64
Jumps     float64
dtype: object
Summary statistics:
           Chins      Situps      Jumps
count  20.000000   20.000000   20.00000
mean    9.450000  145.550000   70.30000
std     5.286278   62.566575   51.27747
min     1.000000   50.000000   25.00000
25%     4.750000  101.000000   39.50000
50%    11.500000  122.500000   54.00000
75%    13.250000  210.000000   85.25000
max    17.000000  251.000000  250.00000
First few rows of the dataset:
   Chins  Situps  Jumps
0    5.0   162.0   60.0
1    2.0   110.0   60.0
2   12.0   101.0  101.0
3   12.0   105.0   37.0
4   13.0   155.0   58.0
Input shape: (20, 3)
Output shape: (20, 3)
The steps are as follows:
1. Import the load_linnerud function from sklearn.datasets:
   - This function loads the Linnerud dataset directly from the scikit-learn library.
2. Load the dataset using load_linnerud():
   - Use as_frame=True to return the dataset as a pandas DataFrame for easier data manipulation and analysis.
3. Print the dataset shape and feature types:
   - Access the shape using dataset.data.shape.
   - Show the data types of the features using dataset.data.dtypes.
4. Display summary statistics:
   - Use dataset.data.describe() to get a statistical summary of the dataset.
5. Display the first few rows of the dataset:
   - Print the initial rows using dataset.data.head() to understand the dataset structure and content.
6. Split the dataset into input and output elements:
   - Separate the features (X) from the target variables (y).
   - Print the shapes of X and y to confirm the split.
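The steps above inspect only dataset.data, the exercise features. The same calls can be pointed at the physiological targets, as in the sketch below. It assumes as_frame=True, since dataset.frame is only populated in that mode.

```python
from sklearn.datasets import load_linnerud

dataset = load_linnerud(as_frame=True)

# Names of the three physiological target columns
print(f"Target names: {dataset.target_names}")

# Summary statistics and first rows for the targets
print(f"Target summary:\n{dataset.target.describe()}")
print(f"First few target rows:\n{dataset.target.head()}")

# dataset.frame holds features and targets side by side (as_frame=True only)
print(f"Combined frame shape: {dataset.frame.shape}")
```

Looking at the targets alongside the features can help when deciding whether any scaling or transformation is worth trying before modelling.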
This example demonstrates how to load and explore the Linnerud dataset using scikit-learn’s load_linnerud() function, allowing you to inspect the data’s shape, types, and summary statistics. This prepares the dataset for further preprocessing and application of regression algorithms.
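As one possible next step, the sketch below fits Linear Regression and Ridge Regression on all three targets at once. The 75/25 split, random_state=0, and alpha=1.0 are arbitrary assumptions made for illustration and do not come from the original example. With only 20 samples, any score computed on the held-out rows will be noisy.

```python
from sklearn.datasets import load_linnerud
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

X, y = load_linnerud(return_X_y=True)

# Hold out 5 of the 20 samples; random_state is fixed for repeatability
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

for model in (LinearRegression(), Ridge(alpha=1.0)):
    # Both estimators accept a 2-D y and predict all three targets together
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    print(type(model).__name__, "prediction shape:", preds.shape)
```

Each prediction has one column per target (Weight, Waist, Pulse), so no wrapper such as a per-target loop is needed for these two estimators.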