Scikit-Learn make_friedman3() Dataset

The make_friedman3 function generates a synthetic dataset used for regression tasks. This example demonstrates how to generate and explore the make_friedman3 dataset, including inspecting its structure, summarizing its features, and preparing it for modeling.

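The make_friedman3 function creates a regression problem with four input features and one output. The features are sampled uniformly from fixed intervals, and the target is a nonlinear (arctangent) function of them. Key arguments include n_samples for the number of samples, noise for the standard deviation of the Gaussian noise added to the output, and random_state for reproducible results. Algorithms like Linear Regression, Decision Trees, and Random Forests can be applied, keeping in mind that the target is not a linear function of the features.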
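To make the data-generating process concrete, here is a minimal sketch that rebuilds the target by hand. It assumes the formula and sampling intervals given in the scikit-learn documentation for make_friedman3, namely y = arctan((x2 * x3 - 1 / (x2 * x4)) / x1) plus noise, with x1 in [0, 100], x2 in [40π, 560π], x3 in [0, 1], and x4 in [1, 11]. The names y_manual, matches_formula, bounds, and within_bounds are illustrative only. The noise is set to 0 so that the comparison is exact.

```python
import numpy as np
from sklearn.datasets import make_friedman3

# Noise-free data, so the target can be reproduced exactly from the features
X, y = make_friedman3(n_samples=100, noise=0.0, random_state=42)

# Friedman #3 formula as documented by scikit-learn (0-based column indices)
y_manual = np.arctan((X[:, 1] * X[:, 2] - 1 / (X[:, 1] * X[:, 3])) / X[:, 0])
matches_formula = np.allclose(y, y_manual)
print(f"Target matches formula: {matches_formula}")

# Documented sampling intervals for the four features: (lower, upper)
bounds = [(0.0, 100.0), (40 * np.pi, 560 * np.pi), (0.0, 1.0), (1.0, 11.0)]
within_bounds = [
    bool(((X[:, i] >= low) & (X[:, i] <= high)).all())
    for i, (low, high) in enumerate(bounds)
]
print(f"Features within documented intervals: {within_bounds}")
```

If the assumptions hold, both checks should pass. With a non-zero noise value the hand-computed target would differ from y by the added Gaussian noise.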

from sklearn.datasets import make_friedman3
import pandas as pd

# Generate the dataset
X, y = make_friedman3(n_samples=100, noise=0.1, random_state=42)

# Convert to DataFrame for easier analysis
df_X = pd.DataFrame(X, columns=[f"Feature_{i+1}" for i in range(X.shape[1])])
df_y = pd.Series(y, name="Target")

# Display dataset shape and types
print(f"Dataset shape: {df_X.shape}")
print(f"Feature types:\n{df_X.dtypes}")

# Show summary statistics
print(f"Summary statistics:\n{df_X.describe()}")

# Display first few rows of the dataset
print(f"First few rows of the dataset:\n{df_X.head()}")

# Split the dataset into input and output elements
X = df_X
y = df_y
print(f"Input shape: {X.shape}")
print(f"Output shape: {y.shape}")

Running the example gives an output like:

Dataset shape: (100, 4)
Feature types:
Feature_1    float64
Feature_2    float64
Feature_3    float64
Feature_4    float64
dtype: object
Summary statistics:
        Feature_1    Feature_2   Feature_3   Feature_4
count  100.000000   100.000000  100.000000  100.000000
mean    49.799188   930.185910    0.498645    5.876502
std     30.921854   472.647106    0.294562    2.856814
min      0.506158   140.688269    0.020584    1.165878
25%     25.857039   514.488715    0.270935    3.389837
50%     52.076173   971.477746    0.507239    6.056249
75%     79.596344  1285.055203    0.730203    8.353911
max     96.361998  1743.043575    0.990505   10.717821
First few rows of the dataset:
   Feature_1    Feature_2  Feature_3  Feature_4
0  37.454012  1678.777388   0.731994   6.986585
1  15.601864   380.500750   0.058084   9.661761
2  60.111501  1282.391023   0.020584  10.699099
3  83.244264   472.546861   0.181825   2.834045
4  30.424224   982.920600   0.431945   3.912291
Input shape: (100, 4)
Output shape: (100,)

The steps are as follows:

  1. Import the make_friedman3 function from sklearn.datasets and pandas for data handling:

    • This function generates a synthetic dataset for regression tasks.
  2. Generate the dataset using make_friedman3():

    • Specify n_samples for the number of samples, noise for the standard deviation of the Gaussian noise added to the output, and random_state for a reproducible result.
  3. Convert the dataset to a DataFrame for easier analysis:

    • Use pd.DataFrame and pd.Series to structure the input features and target variable.
  4. Print the dataset shape and feature types:

    • Access the shape using df_X.shape.
    • Show the data types of the features using df_X.dtypes.
  5. Display summary statistics:

    • Use df_X.describe() to get a statistical summary of the dataset.
  6. Display the first few rows of the dataset:

    • Print the initial rows using df_X.head() to get a sense of the dataset structure and content.
  7. Split the dataset into input and output elements:

    • Assign the feature DataFrame to X and the target Series to y so that inputs and output are held separately.
    • Print the shapes of X and y to confirm the split.
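
As an optional follow-up to the steps above, the sketch below looks at how each feature relates to the target. It is not part of the original example; it uses pandas' DataFrame.corrwith to compute Pearson correlations, and the variable name correlations is illustrative. Because the target is a nonlinear function of the features, these values capture only the linear part of each relationship and should be read as a rough guide.

```python
import pandas as pd
from sklearn.datasets import make_friedman3

# Same settings as the main example
X, y = make_friedman3(n_samples=100, noise=0.1, random_state=42)
df_X = pd.DataFrame(X, columns=[f"Feature_{i+1}" for i in range(X.shape[1])])
df_y = pd.Series(y, name="Target")

# Pearson correlation of each feature column with the target
correlations = df_X.corrwith(df_y)
print(f"Correlation with target:\n{correlations.sort_values()}")
```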

This example demonstrates how to generate and explore the make_friedman3 dataset using scikit-learn, allowing you to inspect the data’s structure, feature types, and summary statistics. This sets the stage for further preprocessing and application of regression algorithms.
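
As a possible next step, the following sketch fits two of the regressors mentioned earlier and reports their R² on a held-out split. The 75/25 split, the n_estimators value, and the names models and scores are assumptions made for illustration, not settings from the original example.

```python
from sklearn.datasets import make_friedman3
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Same dataset settings as the main example
X, y = make_friedman3(n_samples=100, noise=0.1, random_state=42)

# Hold out 25% of the samples for evaluation (split size is an assumption)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

models = {
    "LinearRegression": LinearRegression(),
    "RandomForestRegressor": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Fit each model and record its R^2 score on the held-out data
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: R^2 = {scores[name]:.3f}")
```

With only 100 samples the scores depend noticeably on the split, so cross-validation or a larger n_samples would give a steadier comparison.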


