
Scikit-Learn make_s_curve() Dataset

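The make_s_curve function generates a 3D S-curve dataset: points sampled from a two-dimensional, S-shaped surface embedded in three dimensions. It returns an array X of shape (n_samples, 3) and an array y holding each point's position along the curve, which makes it useful for testing manifold learning and visualization algorithms.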

Key function arguments include n_samples, which sets the number of data points, and noise, which sets the standard deviation of the Gaussian noise added to the data for more realistic scenarios.

This dataset is often used for visualization tasks and evaluating algorithms like Isomap, Locally Linear Embedding (LLE), and t-SNE.

from sklearn.datasets import make_s_curve
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the '3d' projection on older Matplotlib versions)

# Generate the dataset
X, y = make_s_curve(n_samples=1000, noise=0.1, random_state=42)

# Display dataset shape and types
print(f"Dataset shape: {X.shape}")
print(f"Feature types: {type(X)}, {type(y)}")

# Preview the first 5 samples (raw rows, not aggregate statistics)
print(f"Summary statistics:\n{X[:5]}")

# Plot the S-curve dataset
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=y, cmap=plt.cm.viridis)
plt.show()

Running the example gives an output like:

Dataset shape: (1000, 3)
Feature types: <class 'numpy.ndarray'>, <class 'numpy.ndarray'>
Summary statistics:
[[-1.01332776  0.55736238  0.6559955 ]
 [-0.97673069  1.12276331 -1.19682619]
 [ 0.79372434  1.6590624  -1.76153387]
 [ 0.83825742  1.51791268 -0.40526126]
 [ 0.19154538  1.34954752  2.05902293]]
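The preview printed by the example shows raw rows rather than aggregate statistics. As a minimal sketch, per-feature minimum, maximum, mean, and standard deviation can be computed with NumPy array methods. The variable names here are illustrative and not part of the original example.

```python
from sklearn.datasets import make_s_curve

# Same settings as the main example
X, y = make_s_curve(n_samples=1000, noise=0.1, random_state=42)

# Aggregate each of the three coordinates over all samples (axis=0)
feature_min = X.min(axis=0)
feature_max = X.max(axis=0)
feature_mean = X.mean(axis=0)
feature_std = X.std(axis=0)

for i in range(X.shape[1]):
    print(
        f"Feature {i}: min={feature_min[i]:.3f}, max={feature_max[i]:.3f}, "
        f"mean={feature_mean[i]:.3f}, std={feature_std[i]:.3f}"
    )

# y holds each sample's position along the curve
print(f"y range: [{y.min():.3f}, {y.max():.3f}]")
```

Printing the range of y alongside the feature statistics helps confirm that y is a continuous position value rather than a class label.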


The steps are as follows:

  1. Import the make_s_curve function from sklearn.datasets and the necessary plotting libraries:

    • make_s_curve generates the S-curve dataset, while matplotlib and Axes3D are used for visualization.

  2. Generate the dataset using make_s_curve():

    • Use n_samples=1000 to create a dataset with 1000 points and noise=0.1 to add Gaussian noise with a standard deviation of 0.1.
    • Set random_state=42 for reproducibility.

  3. Print the dataset shape and types:

    • Check the shape using X.shape and print the types of X and y.

  4. Display the first samples (labeled "Summary statistics" in the output):

    • Show the first 5 samples of the dataset to get an idea of the data points.

  5. Plot the S-curve dataset:

    • Create a 3D scatter plot, coloring each point by y, its position along the curve.
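To see what the noise argument from step 2 does, one option is to generate the dataset twice with the same seed, once without noise and once with noise=0.1, and compare the results. This is a minimal sketch with illustrative variable names. It relies on both calls consuming the random stream in the same order, which is an observation about the current implementation rather than a documented guarantee.

```python
import numpy as np
from sklearn.datasets import make_s_curve

# Same seed, with and without noise
X_clean, t_clean = make_s_curve(n_samples=1000, noise=0.0, random_state=42)
X_noisy, t_noisy = make_s_curve(n_samples=1000, noise=0.1, random_state=42)

# Noise is added to the coordinates only, so the curve positions should match
same_t = np.allclose(t_clean, t_noisy)

# The difference between the two point clouds is the added noise
residual = X_noisy - X_clean
residual_std = residual.std()

print(f"Same curve positions: {same_t}")
print(f"Empirical noise std: {residual_std:.3f}")
```

The empirical standard deviation should land close to the requested noise value, which suggests that noise is best read as a scale in the same units as the coordinates.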

This example demonstrates how to generate and visualize the S-curve dataset using make_s_curve() from scikit-learn. It provides a way to inspect the data’s structure and visualize it in 3D, setting the stage for further manifold learning or visualization tasks.
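As one possible next step, the dataset can be passed to a manifold learning algorithm such as Isomap to embed it in two dimensions. The following is a minimal sketch, not a tuned setup: n_neighbors=10 is an illustrative choice, and other values may work better depending on the sample count and noise level.

```python
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap

# Same settings as the main example
X, y = make_s_curve(n_samples=1000, noise=0.1, random_state=42)

# Embed the 3D points into 2D; n_neighbors=10 is an assumed, untuned value
isomap = Isomap(n_neighbors=10, n_components=2)
embedding = isomap.fit_transform(X)

print(f"Original shape: {X.shape}")
print(f"Embedded shape: {embedding.shape}")
```

The two columns of embedding can be drawn as a 2D scatter plot colored by y to check visually whether the curve has been unrolled.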


