The make_friedman3 function generates a synthetic dataset for regression tasks. This example demonstrates how to generate and explore the make_friedman3 dataset, including inspecting its structure, reviewing summary statistics, and preparing it for modeling.
The make_friedman3 function creates a regression problem with four input features and one continuous output. Key arguments include n_samples for the number of samples and noise for the standard deviation of the Gaussian noise added to the output. Because the target is a nonlinear function of the features, it is a useful test case for algorithms such as Linear Regression, Decision Trees, and Random Forests.
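For reference, the scikit-learn documentation defines the target as the arctangent of a nonlinear combination of the four features. The sketch below recomputes the noise-free target from that documented formula and compares it with the generated one; the variable name y_manual and the chosen n_samples and random_state are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_friedman3

# With noise=0.0 the target is a deterministic function of the features
X, y = make_friedman3(n_samples=200, noise=0.0, random_state=0)

# Formula as stated in the scikit-learn documentation for make_friedman3
y_manual = np.arctan(
    (X[:, 1] * X[:, 2] - 1 / (X[:, 1] * X[:, 3])) / X[:, 0]
)

# Check that the generated target matches the formula
print(np.allclose(y, y_manual))

# Per-feature minimum and maximum, to see the range each feature covers
print(X.min(axis=0))
print(X.max(axis=0))
```

The features are drawn on very different ranges, which is also visible in the summary statistics of the main example.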
from sklearn.datasets import make_friedman3
import pandas as pd
# Generate the dataset
X, y = make_friedman3(n_samples=100, noise=0.1, random_state=42)
# Convert to DataFrame for easier analysis
df_X = pd.DataFrame(X, columns=[f"Feature_{i+1}" for i in range(X.shape[1])])
df_y = pd.Series(y, name="Target")
# Display dataset shape and types
print(f"Dataset shape: {df_X.shape}")
print(f"Feature types:\n{df_X.dtypes}")
# Show summary statistics
print(f"Summary statistics:\n{df_X.describe()}")
# Display first few rows of the dataset
print(f"First few rows of the dataset:\n{df_X.head()}")
# Split the dataset into input and output elements
X = df_X
y = df_y
print(f"Input shape: {X.shape}")
print(f"Output shape: {y.shape}")
Running the example gives an output like:
Dataset shape: (100, 4)
Feature types:
Feature_1    float64
Feature_2    float64
Feature_3    float64
Feature_4    float64
dtype: object
Summary statistics:
        Feature_1    Feature_2   Feature_3   Feature_4
count  100.000000   100.000000  100.000000  100.000000
mean    49.799188   930.185910    0.498645    5.876502
std     30.921854   472.647106    0.294562    2.856814
min      0.506158   140.688269    0.020584    1.165878
25%     25.857039   514.488715    0.270935    3.389837
50%     52.076173   971.477746    0.507239    6.056249
75%     79.596344  1285.055203    0.730203    8.353911
max     96.361998  1743.043575    0.990505   10.717821
First few rows of the dataset:
   Feature_1    Feature_2  Feature_3  Feature_4
0  37.454012  1678.777388   0.731994   6.986585
1  15.601864   380.500750   0.058084   9.661761
2  60.111501  1282.391023   0.020584  10.699099
3  83.244264   472.546861   0.181825   2.834045
4  30.424224   982.920600   0.431945   3.912291
Input shape: (100, 4)
Output shape: (100,)
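The summary statistics show the features on very different scales: Feature_2 reaches the thousands, while Feature_3 stays below 1. As a hedged sketch of one possible preprocessing step, which the example above does not perform, the features could be standardized with scikit-learn's StandardScaler. The name df_scaled is an assumption made for illustration.

```python
import pandas as pd
from sklearn.datasets import make_friedman3
from sklearn.preprocessing import StandardScaler

# Recreate the same data as in the main example
X, y = make_friedman3(n_samples=100, noise=0.1, random_state=42)
df_X = pd.DataFrame(X, columns=[f"Feature_{i+1}" for i in range(X.shape[1])])

# Rescale each feature to zero mean and unit variance
scaler = StandardScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df_X), columns=df_X.columns)

# Inspect the mean and standard deviation after scaling
print(df_scaled.describe().loc[["mean", "std"]])
```

Tree-based models such as Random Forests are generally insensitive to feature scaling, so this step matters mainly for scale-sensitive models. When evaluating a model, the scaler would normally be fitted on the training split only.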
The steps are as follows:
1. Import the make_friedman3 function from sklearn.datasets and pandas for data handling:
   - This function generates a synthetic dataset for regression tasks.
2. Generate the dataset using make_friedman3():
   - Specify n_samples for the number of samples and noise for adding Gaussian noise to the output.
   - Set random_state so that the generated data is reproducible.
3. Convert the dataset to a DataFrame for easier analysis:
   - Use pd.DataFrame and pd.Series to structure the input features and target variable.
4. Print the dataset shape and feature types:
   - Access the shape using df_X.shape.
   - Show the data types of the features using df_X.dtypes.
5. Display summary statistics:
   - Use df_X.describe() to get a statistical summary of the dataset.
6. Display the first few rows of the dataset:
   - Print the initial rows using df_X.head() to get a sense of the dataset structure and content.
7. Split the dataset into input and output elements:
   - Separate the features (X) from the target variable (y).
   - Print the shapes of X and y to confirm the split.
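To make the effect of the noise argument more concrete, the following sketch generates the data twice with the same random_state, once without noise and once with noise=0.1. It assumes, based on how current scikit-learn versions behave, that the features come out identical in both calls, so the difference between the two targets is the added noise. The variable names and the larger n_samples are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_friedman3

# Same seed, different noise levels
X_clean, y_clean = make_friedman3(n_samples=1000, noise=0.0, random_state=42)
X_noisy, y_noisy = make_friedman3(n_samples=1000, noise=0.1, random_state=42)

# Check whether the features match across the two calls
print(np.allclose(X_clean, X_noisy))

# The difference between the targets is the noise term
residual = y_noisy - y_clean
print(f"Residual std: {residual.std():.3f}")
```

The residual standard deviation should land close to the requested noise level, which is a quick way to sanity-check the setting before modeling.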
This example demonstrates how to generate and explore the make_friedman3 dataset using scikit-learn, allowing you to inspect the data’s structure and summary statistics. This sets the stage for further preprocessing and application of regression algorithms.
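As a minimal sketch of applying regression algorithms to this data, the code below holds out a test set and compares a linear model with a Random Forest using the R² score. The sample size, split ratio, and hyperparameters are assumptions chosen for illustration, not recommendations from the example above.

```python
from sklearn.datasets import make_friedman3
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Generate a somewhat larger sample so the held-out score is less noisy
X, y = make_friedman3(n_samples=500, noise=0.1, random_state=42)

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "LinearRegression": LinearRegression(),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Fit each model and record its R^2 on the held-out data
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R^2 = {scores[name]:.3f}")
```

Comparing a linear and a nonlinear model in this way gives a rough sense of how much of the target's structure a straight-line fit misses.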