
Scikit-Learn load_wine() Dataset

The Wine dataset is a small dataset commonly used for classification tasks: it contains 178 samples, each described by 13 chemical properties, and the goal is to predict which of three wine classes a sample belongs to.

Key function arguments when loading the dataset include return_X_y, which returns the data as a (data, target) tuple instead of a Bunch object, and as_frame, which returns the data as a pandas DataFrame.
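As a minimal sketch of how these two arguments change what load_wine() returns (the variable names here are illustrative and not part of the original example):

```python
from sklearn.datasets import load_wine

# return_X_y=True returns a (data, target) tuple instead of a Bunch object;
# without as_frame, both elements are NumPy arrays
X, y = load_wine(return_X_y=True)
print(type(X).__name__, X.shape)
print(type(y).__name__, y.shape)

# Combining both arguments returns a pandas DataFrame and a pandas Series
X_df, y_s = load_wine(return_X_y=True, as_frame=True)
print(type(X_df).__name__, type(y_s).__name__)
```

The tuple form is convenient when the features and target are passed straight to an estimator, while the DataFrame form keeps the column names available for exploration.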

This is a multiclass classification problem where algorithms like Logistic Regression, Support Vector Machines, and Random Forests are often applied.

from sklearn.datasets import load_wine

# Load the dataset
dataset = load_wine(as_frame=True)

# Display dataset shape and types
print(f"Dataset shape: {dataset.data.shape}")
print(f"Feature types:\n{dataset.data.dtypes}")

# Show summary statistics
print(f"Summary statistics:\n{dataset.data.describe()}")

# Display first few rows of the dataset
print(f"First few rows of the dataset:\n{dataset.data.head()}")

# Split the dataset into input and output elements
X = dataset.data
y = dataset.target
print(f"Input shape: {X.shape}")
print(f"Output shape: {y.shape}")

Running the example gives an output like:

Dataset shape: (178, 13)
Feature types:
alcohol                         float64
malic_acid                      float64
ash                             float64
alcalinity_of_ash               float64
magnesium                       float64
total_phenols                   float64
flavanoids                      float64
nonflavanoid_phenols            float64
proanthocyanins                 float64
color_intensity                 float64
hue                             float64
od280/od315_of_diluted_wines    float64
proline                         float64
dtype: object
Summary statistics:
          alcohol  malic_acid  ...  od280/od315_of_diluted_wines      proline
count  178.000000  178.000000  ...                    178.000000   178.000000
mean    13.000618    2.336348  ...                      2.611685   746.893258
std      0.811827    1.117146  ...                      0.709990   314.907474
min     11.030000    0.740000  ...                      1.270000   278.000000
25%     12.362500    1.602500  ...                      1.937500   500.500000
50%     13.050000    1.865000  ...                      2.780000   673.500000
75%     13.677500    3.082500  ...                      3.170000   985.000000
max     14.830000    5.800000  ...                      4.000000  1680.000000

[8 rows x 13 columns]
First few rows of the dataset:
   alcohol  malic_acid   ash  ...   hue  od280/od315_of_diluted_wines  proline
0    14.23        1.71  2.43  ...  1.04                          3.92   1065.0
1    13.20        1.78  2.14  ...  1.05                          3.40   1050.0
2    13.16        2.36  2.67  ...  1.03                          3.17   1185.0
3    14.37        1.95  2.50  ...  0.86                          3.45   1480.0
4    13.24        2.59  2.87  ...  1.04                          2.93    735.0

[5 rows x 13 columns]
Input shape: (178, 13)
Output shape: (178,)

The steps are as follows:

  1. Import the load_wine function from sklearn.datasets:

    • This function allows us to load the Wine dataset directly from the scikit-learn library.
  2. Load the dataset using load_wine():

    • Use as_frame=True so that dataset.data is a pandas DataFrame and dataset.target is a pandas Series, which makes data manipulation and analysis easier.
  3. Print the dataset shape and feature types:

    • Access the shape using dataset.data.shape.
    • Show the data types of the features using dataset.data.dtypes.
  4. Display summary statistics:

    • Use dataset.data.describe() to get a statistical summary of the dataset.
  5. Display the first few rows of the dataset:

    • Print the initial rows using dataset.data.head() to get a sense of the dataset structure and content.
  6. Split the dataset into input and output elements:

    • Separate the features (X) from the target variable (y).
    • Print the shapes of X and y to confirm the split.
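Beyond checking the shapes, it can help to look at the target itself. The following sketch, which goes slightly beyond the steps listed above, prints the class names and the number of samples per class (the counts variable is an illustrative name):

```python
from sklearn.datasets import load_wine

dataset = load_wine(as_frame=True)

# Names of the three wine classes stored with the dataset
print(list(dataset.target_names))

# Number of samples for each integer class label (0, 1, 2)
counts = dataset.target.value_counts().sort_index()
print(counts)
```

The classes are not perfectly balanced, which is worth keeping in mind when splitting the data or choosing an evaluation metric.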

This example demonstrates how to quickly load and explore the Wine dataset using scikit-learn’s load_wine() function, allowing you to inspect the data’s shape, types, summary statistics, and first few rows. This sets the stage for further preprocessing and application of classification algorithms.
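As one possible next step, the sketch below fits a Logistic Regression classifier to the loaded data. The choices here (a 25% stratified test split, a fixed random_state, standardizing the features, and max_iter=1000) are assumptions made for illustration, not part of the original example:

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load features and target as a DataFrame and Series
X, y = load_wine(return_X_y=True, as_frame=True)

# Hold out a test set, keeping class proportions similar in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Standardize the features, then fit a multiclass logistic regression
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

# Report mean accuracy on the held-out test set
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Scaling is included because the features sit on very different ranges (proline is in the hundreds, hue is close to 1), which can slow or destabilize the fit for a linear model.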


