The make_friedman2 function generates a synthetic dataset for regression tasks, commonly used for testing and benchmarking regression models.
Key arguments when generating the dataset include n_samples, which sets the number of samples, and noise, which sets the standard deviation of the Gaussian noise added to the target variable.
This is a regression problem where algorithms like Linear Regression, Decision Trees, and Gradient Boosting can be applied.
import pandas as pd
from sklearn.datasets import make_friedman2

# Generate the dataset
X, y = make_friedman2(n_samples=100, noise=0.1, random_state=42)

# Display dataset shape and types
print(f"Dataset shape: {X.shape}")
print(f"Feature types: {type(X)}, {type(y)}")

# Show summary statistics of features
X_df = pd.DataFrame(X)
print(f"Summary statistics:\n{X_df.describe()}")

# Display first few rows of the features and target
print(f"First few rows of features:\n{X_df.head()}")
print(f"First few values of target:\n{y[:5]}")
Running the example gives an output like:
Dataset shape: (100, 4)
Feature types: <class 'numpy.ndarray'>, <class 'numpy.ndarray'>
Summary statistics:
                0            1           2           3
count  100.000000   100.000000  100.000000  100.000000
mean    49.799188   930.185910    0.498645    5.876502
std     30.921854   472.647106    0.294562    2.856814
min      0.506158   140.688269    0.020584    1.165878
25%     25.857039   514.488715    0.270935    3.389837
50%     52.076173   971.477746    0.507239    6.056249
75%     79.596344  1285.055203    0.730203    8.353911
max     96.361998  1743.043575    0.990505   10.717821
First few rows of features:
           0            1         2          3
0  37.454012  1678.777388  0.731994   6.986585
1  15.601864   380.500750  0.058084   9.661761
2  60.111501  1282.391023  0.020584  10.699099
3  83.244264   472.546861  0.181825   2.834045
4  30.424224   982.920600  0.431945   3.912291
First few values of target:
[1229.55598442 27.05490178 65.72038418 119.601185 425.68850791]
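The minimum and maximum values in the summary above are consistent with the sampling intervals that the scikit-learn documentation gives for this generator: feature 0 in [0, 100], feature 1 in [40π, 560π], feature 2 in [0, 1], and feature 3 in [1, 11]. As a small sketch, the check below confirms this for a generated sample. The bounds list and the loop are illustrative helpers written for this example, not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_friedman2

X, y = make_friedman2(n_samples=100, noise=0.1, random_state=42)

# Documented sampling intervals for the four input features
# (this list is an illustrative helper, not a scikit-learn object)
bounds = [(0.0, 100.0), (40 * np.pi, 560 * np.pi), (0.0, 1.0), (1.0, 11.0)]

in_range = []
for i, (low, high) in enumerate(bounds):
    col = X[:, i]
    ok = bool((col >= low).all() and (col <= high).all())
    in_range.append(ok)
    print(f"feature {i}: min={col.min():.3f}, max={col.max():.3f}, "
          f"expected within [{low:.2f}, {high:.2f}]: {ok}")
```

Because the features live on very different scales, scale-sensitive models may benefit from standardizing the inputs first.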
Import the make_friedman2 function from sklearn.datasets:
- This function generates the Friedman #2 regression problem dataset with four features.
Generate the dataset using make_friedman2():
- Use n_samples=100 to generate 100 samples.
- Use noise=0.1 to add a small amount of noise to the target variable.
- Set random_state=42 for reproducibility.
Print the dataset shape and feature types:
- The dataset is returned as a tuple (X, y) where X is the feature matrix and y is the target array.
Show summary statistics of the features:
- Convert X to a DataFrame for easier manipulation and display.
- Use X_df.describe() to get a statistical summary of the features.
Display the first few rows of the features and target:
- Print the initial rows using X_df.head() to inspect the features.
- Print the first few values of y to inspect the target variable.
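To make the roles of noise and random_state more concrete, the sketch below regenerates the data with noise=0.0 and compares the target against the Friedman #2 formula as stated in the scikit-learn documentation, y = sqrt(X0^2 + (X1 * X2 - 1 / (X1 * X3))^2) + noise * N(0, 1). It then checks that a fixed seed reproduces the same arrays. The helper name friedman2_target is hypothetical, and the comparison of features across noise levels assumes the features are drawn before the noise, which holds in current scikit-learn releases but is an implementation detail.

```python
import numpy as np
from sklearn.datasets import make_friedman2


def friedman2_target(X):
    """Noise-free Friedman #2 target (hypothetical helper, not a scikit-learn function)."""
    return np.sqrt(
        X[:, 0] ** 2 + (X[:, 1] * X[:, 2] - 1.0 / (X[:, 1] * X[:, 3])) ** 2
    )


# With noise=0.0 the target is a deterministic function of the features
X_clean, y_clean = make_friedman2(n_samples=100, noise=0.0, random_state=42)
matches_formula = bool(np.allclose(y_clean, friedman2_target(X_clean)))

# Same seed with noise=0.1: features are expected to match, only y is perturbed
X_noisy, y_noisy = make_friedman2(n_samples=100, noise=0.1, random_state=42)
same_features = bool(np.allclose(X_clean, X_noisy))
residual_std = float(np.std(y_noisy - y_clean))

# Repeating the call with the same random_state reproduces the data exactly
X_again, y_again = make_friedman2(n_samples=100, noise=0.1, random_state=42)
reproducible = bool(
    np.array_equal(X_noisy, X_again) and np.array_equal(y_noisy, y_again)
)

print(f"Matches formula without noise: {matches_formula}")
print(f"Same features across noise levels: {same_features}")
print(f"Std of added noise: {residual_std:.3f}")
print(f"Reproducible with fixed seed: {reproducible}")
```

The standard deviation of the difference between the noisy and noise-free targets should land close to the noise argument, since noise is the standard deviation of the Gaussian term.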
This example demonstrates how to generate and explore the make_friedman2 dataset using scikit-learn, providing insight into the shape, types, and summary statistics of the data, which sets the stage for applying regression algorithms.
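As a rough sketch of that next step, the three model families mentioned earlier can be fit on a train/test split and compared by R^2 on the held-out data. The sample size, split ratio, noise level, and default hyperparameters below are arbitrary choices for illustration, not tuned settings, and the scores will vary with them.

```python
from sklearn.datasets import make_friedman2
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# A larger sample than in the exploration above, so the test split is not tiny
X, y = make_friedman2(n_samples=500, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # R^2 on the held-out split
    print(f"{name}: R^2 = {scores[name]:.3f}")
```

Since the target is a nonlinear function of the features, the tree-based models would be expected to have an advantage over a plain linear fit here, though that is worth confirming on your own run rather than assuming.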