
Scikit-Learn load_linnerud() Dataset

The Linnerud dataset is a small multi-output regression dataset of 20 samples, with three exercise features (Chins, Situps, Jumps) and three physiological targets (Weight, Waist, Pulse).

Key arguments of load_linnerud() are return_X_y, which returns the data and target as a (data, target) tuple instead of a Bunch object, and as_frame, which returns the data and target as pandas DataFrames rather than NumPy arrays.
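As a minimal sketch of how these two arguments behave, the snippet below loads the dataset once with return_X_y=True alone and once combined with as_frame=True. The variable names (X, y, X_df, y_df) are illustrative only.

```python
from sklearn.datasets import load_linnerud

# return_X_y=True skips the Bunch container and returns (data, target) directly
X, y = load_linnerud(return_X_y=True)
print(type(X).__name__, X.shape)
print(type(y).__name__, y.shape)

# Combining both arguments yields pandas DataFrames instead of NumPy arrays
X_df, y_df = load_linnerud(return_X_y=True, as_frame=True)
print(list(X_df.columns))
print(list(y_df.columns))
```

The tuple form is convenient when the data goes straight into an estimator, while the DataFrame form keeps the column names available for exploration.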

Because three targets are predicted at once, algorithms that handle multi-output regression natively, such as Linear Regression and Ridge Regression, are often applied.

from sklearn.datasets import load_linnerud

# Load the dataset
dataset = load_linnerud(as_frame=True)

# Display dataset shape and types
print(f"Dataset shape: {dataset.data.shape}")
print(f"Feature types:\n{dataset.data.dtypes}")

# Show summary statistics
print(f"Summary statistics:\n{dataset.data.describe()}")

# Display first few rows of the dataset
print(f"First few rows of the dataset:\n{dataset.data.head()}")

# Split the dataset into input and output elements
X = dataset.data
y = dataset.target
print(f"Input shape: {X.shape}")
print(f"Output shape: {y.shape}")

Running the example gives an output like:

Dataset shape: (20, 3)
Feature types:
Chins     float64
Situps    float64
Jumps     float64
dtype: object
Summary statistics:
           Chins      Situps      Jumps
count  20.000000   20.000000   20.00000
mean    9.450000  145.550000   70.30000
std     5.286278   62.566575   51.27747
min     1.000000   50.000000   25.00000
25%     4.750000  101.000000   39.50000
50%    11.500000  122.500000   54.00000
75%    13.250000  210.000000   85.25000
max    17.000000  251.000000  250.00000
First few rows of the dataset:
   Chins  Situps  Jumps
0    5.0   162.0   60.0
1    2.0   110.0   60.0
2   12.0   101.0  101.0
3   12.0   105.0   37.0
4   13.0   155.0   58.0
Input shape: (20, 3)
Output shape: (20, 3)

The steps are as follows:

  1. Import the load_linnerud function from sklearn.datasets:

    • This function loads the Linnerud dataset directly from the scikit-learn library.
  2. Load the dataset using load_linnerud():

    • Use as_frame=True so that dataset.data and dataset.target are returned as pandas DataFrames for easier data manipulation and analysis.
  3. Print the dataset shape and feature types:

    • Access the shape using dataset.data.shape.
    • Show the data types of the features using dataset.data.dtypes.
  4. Display summary statistics:

    • Use dataset.data.describe() to get a statistical summary of the feature columns.
  5. Display the first few rows of the dataset:

    • Print the initial rows using dataset.data.head() to understand the dataset structure and content.
  6. Split the dataset into input and output elements:

    • Separate the features (X) from the target variables (y).
    • Print the shapes of X and y to confirm the split.
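The steps above only inspect the feature columns. A short sketch of the same exploration applied to the targets might look like the following; it relies on the feature_names, target_names, and frame attributes of the returned Bunch, the last of which is populated only when as_frame=True.

```python
from sklearn.datasets import load_linnerud

dataset = load_linnerud(as_frame=True)

# Names of the exercise features and the physiological targets
print(f"Feature names: {dataset.feature_names}")
print(f"Target names: {dataset.target_names}")

# Summary statistics for the three target variables
print(f"Target summary statistics:\n{dataset.target.describe()}")

# dataset.frame holds features and targets side by side (as_frame=True only)
print(f"Combined frame shape: {dataset.frame.shape}")
print(dataset.frame.head())
```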

This example demonstrates how to load and explore the Linnerud dataset using scikit-learn’s load_linnerud() function, allowing you to inspect the data’s shape, types, and summary statistics. This prepares the dataset for further preprocessing and application of regression algorithms.
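As one possible next step, the sketch below standardizes the features and compares Linear Regression with Ridge Regression using 5-fold cross-validation. The pipeline layout, the alpha value, the number of folds, and the random seed are all assumptions chosen for illustration, not settings taken from this example.

```python
from sklearn.datasets import load_linnerud
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_linnerud(return_X_y=True, as_frame=True)

# Both estimators accept a 2D target, so no multi-output wrapper is needed
models = {
    "LinearRegression": make_pipeline(StandardScaler(), LinearRegression()),
    "Ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),  # alpha is an arbitrary choice
}

# With only 20 samples, a single train/test split would be very noisy,
# so the scores are averaged over 5 shuffled folds instead
cv = KFold(n_splits=5, shuffle=True, random_state=0)
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.3f}")

# Fit on all rows and predict the three targets for the first two samples
ridge = models["Ridge"].fit(X, y)
preds = ridge.predict(X.iloc[:2])
print(f"Prediction shape: {preds.shape}")
```

With so few samples, the scores can vary considerably between folds and seeds, so they are better read as a rough comparison than as a reliable performance estimate.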


