Scikit-Learn SGDRegressor Model

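SGDRegressor is a linear regression model that learns its coefficients with stochastic gradient descent (SGD), updating the weights one training sample at a time. Because each update is cheap, it is useful for large-scale learning tasks with many samples.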
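To make the idea concrete, here is a simplified NumPy sketch of plain SGD for a linear model with squared loss and an L2 penalty, the same objective SGDRegressor minimizes by default. It is not scikit-learn's implementation: the function name sgd_linear_regression, the constant learning rate eta and the epoch count are illustrative assumptions, whereas SGDRegressor uses an inverse-scaling learning-rate schedule and a convergence check by default.

```python
import numpy as np


def sgd_linear_regression(X, y, alpha=0.0001, eta=0.01, n_epochs=50, seed=0):
    """Hypothetical helper: plain SGD on 0.5*(x.w + b - y)^2 + 0.5*alpha*||w||^2.

    Simplified sketch with a constant learning rate and one sample per update.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_epochs):
        # visit the samples in a fresh random order each epoch
        for i in rng.permutation(n_samples):
            error = X[i] @ w + b - y[i]  # residual for this single sample
            # step against the gradient of the squared loss plus the L2 penalty
            w -= eta * (error * X[i] + alpha * w)
            b -= eta * error  # the intercept is not regularized
    return w, b
```

Each update touches a single sample, so the cost of one step does not grow with the size of the dataset, which is what makes SGD attractive for large-scale problems.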

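Key hyperparameters include loss (the loss function, 'squared_error' by default), penalty (the regularization term, 'l2' by default), alpha (the regularization strength) and max_iter (the maximum number of passes over the training data, 1000 by default).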

The algorithm is appropriate for regression problems where the goal is to predict a continuous value.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

# generate regression dataset
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# create model
model = SGDRegressor()

# fit model
model.fit(X_train, y_train)

# evaluate model
yhat = model.predict(X_test)
mse = mean_squared_error(y_test, yhat)
print('Mean Squared Error: %.3f' % mse)

# make a prediction
row = [[-0.30620401, 1.44184433, -0.83017134, -0.8805776, 0.85794562]]
yhat = model.predict(row)
print('Predicted: %.3f' % yhat[0])
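SGD is sensitive to feature scaling, and the scikit-learn documentation recommends standardizing the inputs. make_regression already produces roughly standardized features, so scaling changes little on this dataset, but on real data a common pattern is to chain a StandardScaler and the model in a pipeline. The sketch below reuses the dataset above and adds random_state=1 to the model as an assumption, purely for repeatability.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# The scaler learns its mean and variance from the training split only,
# then applies the same transformation to the test data inside predict()
model = make_pipeline(StandardScaler(), SGDRegressor(random_state=1))
model.fit(X_train, y_train)

mse = mean_squared_error(y_test, model.predict(X_test))
print('Mean Squared Error: %.3f' % mse)
```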

Running the example produces output similar to the following; your exact values may differ:

Mean Squared Error: 0.019
Predicted: 36.025
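The numbers vary from run to run because SGDRegressor shuffles the training data after each epoch and the example leaves the model's random_state unset. Fixing the seed makes fits repeatable, as this short check illustrates (the seed value 1 is arbitrary).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)

# Two models with the same seed shuffle the data identically,
# so they should end up with the same weights and intercept
m1 = SGDRegressor(random_state=1).fit(X, y)
m2 = SGDRegressor(random_state=1).fit(X, y)

same = np.allclose(m1.coef_, m2.coef_) and np.allclose(m1.intercept_, m2.intercept_)
print('identical fits:', same)
```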

The steps are as follows:

Generate a synthetic regression dataset using make_regression() with a given number of samples (n_samples), features (n_features) and noise level (noise), fixing the random seed (random_state) for reproducibility.
Split the dataset into training and test sets with train_test_split().
Instantiate the SGDRegressor model with default hyperparameters.
Fit the model on the training data using fit().
Evaluate the model’s performance by predicting on the test set and calculating the mean squared error using mean_squared_error().
Make a single prediction using the predict() method with a new data sample.
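Because SGDRegressor is a linear model, a fitted model is fully described by its coef_ and intercept_ attributes, and predict() returns their linear combination with the inputs. The sketch below fits on the full synthetic dataset for brevity, prints the learned parameters, and checks that relationship.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)
model = SGDRegressor(random_state=1).fit(X, y)

# Learned parameters of the linear model y_hat = X @ coef_ + intercept_
print('coef_:', model.coef_)
print('intercept_:', model.intercept_)
print('epochs run:', model.n_iter_)

# predict() computes exactly this linear combination
manual = X @ model.coef_ + model.intercept_
print('predict matches manual computation:', np.allclose(model.predict(X), manual))
```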

This example demonstrates how to set up, fit and evaluate an SGDRegressor model for a regression task in only a few lines of scikit-learn code.

Once fitted on the training data, the model makes predictions on new, unseen rows with predict(), which is how it would be applied to a real-world regression problem.
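For data too large to fit in memory, SGDRegressor also provides partial_fit(), which runs a single epoch of SGD over the batch it is given and keeps the learned weights between calls. In this sketch, slices of an in-memory array stand in for chunks read from disk or a stream (an assumption for illustration); the batch size of 100 and the 5 passes are arbitrary choices.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor

X, y = make_regression(n_samples=1000, n_features=5, noise=0.1, random_state=1)

model = SGDRegressor(random_state=1)
batch_size = 100  # hypothetical chunk size

# Each partial_fit call performs one epoch of SGD on the current batch,
# continuing from the weights learned in earlier calls
for _ in range(5):
    for start in range(0, len(X), batch_size):
        model.partial_fit(X[start:start + batch_size], y[start:start + batch_size])

print('R^2 on seen data: %.3f' % model.score(X, y))
```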

See Also