The Wine dataset is commonly used for classification tasks to predict the type of wine based on various chemical properties.
Key function arguments when loading the dataset include return_X_y, which returns the data and target as a (data, target) tuple instead of a Bunch object, and as_frame, which returns the data as a pandas DataFrame.
This is a multiclass classification problem where algorithms like Logistic Regression, Support Vector Machines, and Random Forests are often applied.
from sklearn.datasets import load_wine
# Load the dataset
dataset = load_wine(as_frame=True)
# Display dataset shape and types
print(f"Dataset shape: {dataset.data.shape}")
print(f"Feature types:\n{dataset.data.dtypes}")
# Show summary statistics
print(f"Summary statistics:\n{dataset.data.describe()}")
# Display first few rows of the dataset
print(f"First few rows of the dataset:\n{dataset.data.head()}")
# Split the dataset into input and output elements
X = dataset.data
y = dataset.target
print(f"Input shape: {X.shape}")
print(f"Output shape: {y.shape}")
Running the example gives an output like:
Dataset shape: (178, 13)
Feature types:
alcohol float64
malic_acid float64
ash float64
alcalinity_of_ash float64
magnesium float64
total_phenols float64
flavanoids float64
nonflavanoid_phenols float64
proanthocyanins float64
color_intensity float64
hue float64
od280/od315_of_diluted_wines float64
proline float64
dtype: object
Summary statistics:
alcohol malic_acid ... od280/od315_of_diluted_wines proline
count 178.000000 178.000000 ... 178.000000 178.000000
mean 13.000618 2.336348 ... 2.611685 746.893258
std 0.811827 1.117146 ... 0.709990 314.907474
min 11.030000 0.740000 ... 1.270000 278.000000
25% 12.362500 1.602500 ... 1.937500 500.500000
50% 13.050000 1.865000 ... 2.780000 673.500000
75% 13.677500 3.082500 ... 3.170000 985.000000
max 14.830000 5.800000 ... 4.000000 1680.000000
[8 rows x 13 columns]
First few rows of the dataset:
alcohol malic_acid ash ... hue od280/od315_of_diluted_wines proline
0 14.23 1.71 2.43 ... 1.04 3.92 1065.0
1 13.20 1.78 2.14 ... 1.05 3.40 1050.0
2 13.16 2.36 2.67 ... 1.03 3.17 1185.0
3 14.37 1.95 2.50 ... 0.86 3.45 1480.0
4 13.24 2.59 2.87 ... 1.04 2.93 735.0
[5 rows x 13 columns]
Input shape: (178, 13)
Output shape: (178,)
The steps are as follows:
Import the load_wine function from sklearn.datasets:
- This function allows us to load the Wine dataset directly from the scikit-learn library.
Load the dataset using load_wine():
- Use as_frame=True to return the dataset as a pandas DataFrame for easier data manipulation and analysis.
Print the dataset shape and feature types:
- Access the shape using dataset.data.shape.
- Show the data types of the features using dataset.data.dtypes.
Display summary statistics:
- Use dataset.data.describe() to get a statistical summary of the dataset.
Display the first few rows of the dataset:
- Print the initial rows using dataset.data.head() to get a sense of the dataset structure and content.
Split the dataset into input and output elements:
- Separate the features (X) from the target variable (y).
- Print the shapes of X and y to confirm the split.
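As a variation on the steps above, the return_X_y argument mentioned earlier can be combined with as_frame=True to get the features and target directly, without going through the dataset object. The following is a minimal sketch of that approach; the class_counts variable is only an illustrative name, and the class count check is an extra step that is not part of the original example.

```python
from sklearn.datasets import load_wine

# return_X_y=True returns (features, target) directly instead of a Bunch;
# as_frame=True makes them a pandas DataFrame and a pandas Series.
X, y = load_wine(return_X_y=True, as_frame=True)

print(f"Input shape: {X.shape}")
print(f"Output shape: {y.shape}")

# Count how many samples belong to each wine class (illustrative extra step)
class_counts = y.value_counts().sort_index()
print(f"Samples per class:\n{class_counts}")
```

Checking the class counts is a quick way to see how balanced the classes are before choosing an evaluation strategy.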
This example demonstrates how to quickly load and explore the Wine dataset using scikit-learn’s load_wine() function, allowing you to inspect the data’s shape, types, summary statistics, and first few rows. This sets the stage for further preprocessing and application of classification algorithms.
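As one possible next step, the sketch below fits a Logistic Regression classifier, one of the algorithms named earlier, on a held-out split of the data. The choices here (a 25% test split, stratification, feature scaling with StandardScaler, max_iter=1000, and random_state=42) are assumptions made for illustration, not settings taken from the example above.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load features and target as a DataFrame and Series
X, y = load_wine(return_X_y=True, as_frame=True)

# Hold out 25% of the rows for testing, keeping class proportions similar
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Scale the features, then fit a multiclass logistic regression
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Report accuracy on the held-out rows
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Scaling is included because the features sit on very different ranges (proline runs into the hundreds while hue stays near 1), which can slow or hinder convergence for Logistic Regression.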