The species distribution dataset contains occurrence records for two South American mammal species, together with gridded environmental variables, collected for studying habitat suitability. It is commonly used to model species presence from these environmental features.
Key function arguments when loading the dataset include data_home
to specify the directory where the downloaded data is cached, and download_if_missing
to determine if the dataset should be downloaded if not available locally.
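As a minimal sketch of how these arguments interact, the helper below tries to load the dataset from a local cache only and never touches the network. The name load_cached_only and its cache_dir parameter are hypothetical, introduced here for illustration; the sketch assumes that fetch_species_distributions raises an OSError when the data is not cached and downloading is disabled.

```python
def load_cached_only(cache_dir):
    """Hypothetical helper: return the dataset cached in cache_dir, or None."""
    # Imported inside the function so that merely defining the helper
    # does not require scikit-learn.
    from sklearn.datasets import fetch_species_distributions

    try:
        return fetch_species_distributions(
            data_home=cache_dir,        # directory used as the local cache
            download_if_missing=False,  # never download when the cache is empty
        )
    except OSError:
        # Assumed behaviour: raised when no cached copy exists and
        # downloading is disabled.
        return None
```

Calling load_cached_only on a directory with no cached copy should return None, which lets a script decide for itself whether to call the fetcher again with download_if_missing=True.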
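The task is usually framed as a classification problem (presence versus absence or background), where common algorithms like Logistic Regression, Support Vector Machines, and Random Forests are often applied. Note that the records are presence-only, so absence or background points have to be constructed before a two-class model can be trained.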
from sklearn.datasets import fetch_species_distributions
# Fetch the dataset (downloaded on first use, then read from the local cache)
dataset = fetch_species_distributions()
# Display the shapes of the training and test record arrays
print(f"Data shape: {dataset['train'].shape}, {dataset['test'].shape}")
Running the example gives an output like:
Data shape: (1624,), (620,)
The steps are as follows:
Import the fetch_species_distributions function from sklearn.datasets:
- This function fetches the species distribution dataset through scikit-learn, downloading it on first use and caching it locally.
Fetch the dataset using fetch_species_distributions():
- The dataset is loaded and returned as a dictionary-like object.
Print the dataset shape:
- The shapes of the training and test record arrays are obtained using dataset['train'].shape and dataset['test'].shape.
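Because the training and test arrays are one-dimensional record arrays, their shapes alone say little about the contents. The sketch below shows one way to count records per species. It runs on a small synthetic array so it works without a download; the field names 'species', 'dd long', and 'dd lat' follow the scikit-learn documentation for this dataset, while the species labels, the coordinates, the byte-string storage, and the helper name count_records_by_species are assumptions made for illustration.

```python
import numpy as np


def count_records_by_species(records):
    """Return {species_name: count} for a structured array with a 'species' field."""
    names, counts = np.unique(records["species"], return_counts=True)
    return {
        # Species names are assumed to be byte strings; decode them for display.
        (name.decode("ascii") if isinstance(name, bytes) else str(name)): int(count)
        for name, count in zip(names, counts)
    }


# Synthetic stand-in that mimics the assumed layout of dataset['train'].
# The labels and coordinates are made up, not real observations.
demo = np.array(
    [
        (b"species_a", -60.0, -10.0),
        (b"species_b", -70.5, -5.25),
        (b"species_a", -61.0, -11.0),
    ],
    dtype=[("species", "S22"), ("dd long", "f4"), ("dd lat", "f4")],
)

print(count_records_by_species(demo))  # {'species_a': 2, 'species_b': 1}
```

With the real data, the same helper could be called as count_records_by_species(dataset['train']) once the dataset has been fetched.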
This example shows how to load and explore the species distribution dataset using scikit-learn’s fetch_species_distributions() function.