The `verbose` parameter in scikit-learn's `SGDRegressor` controls how much information is printed during model training.
Stochastic Gradient Descent (SGD) is an optimization algorithm used for training various linear models. It updates the model parameters iteratively, estimating the gradient of the loss one training sample at a time.
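To make the update rule concrete, here is a simplified sketch of plain SGD for squared-error loss on a linear model. This is a toy illustration for intuition only, not scikit-learn's actual implementation:

import numpy as np

# Toy SGD for squared-error loss on a linear model (illustration only)
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

w = np.zeros(X.shape[1])  # weights
b = 0.0                   # bias (intercept)
eta = 0.01                # learning rate

for epoch in range(5):
    for xi, yi in zip(X, y):       # one sample at a time
        error = (xi @ w + b) - yi  # prediction error for this sample
        w -= eta * error * xi      # gradient step on the weights
        b -= eta * error           # gradient step on the bias
    mse = np.mean((X @ w + b - y) ** 2)
    print(f"Epoch {epoch + 1}: MSE = {mse:.4f}")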
The `verbose` parameter determines whether progress information is printed during the training process. Any value greater than 0 prints per-epoch progress, which can be useful for monitoring convergence and debugging.
The default value for `verbose` is 0, meaning no output is produced during training. Any value greater than 0 prints per-epoch statistics such as the weight norm, bias, average loss, and training time; as the output below shows, `verbose=1` and `verbose=2` produce the same information for this estimator.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different verbose values
verbose_values = [0, 1, 2]
for v in verbose_values:
print(f"\nTraining with verbose={v}")
sgd = SGDRegressor(max_iter=1000, tol=1e-3, verbose=v, random_state=42)
sgd.fit(X_train, y_train)
y_pred = sgd.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")
Running the example produces output like the following:
Training with verbose=0
Mean Squared Error: 0.0096
Training with verbose=1
-- Epoch 1
Norm: 115.29, NNZs: 10, Bias: -0.147982, T: 800, Avg. loss: 1757.852454
Total training time: 0.00 seconds.
-- Epoch 2
Norm: 129.46, NNZs: 10, Bias: 0.000573, T: 1600, Avg. loss: 68.002132
Total training time: 0.00 seconds.
-- Epoch 3
Norm: 133.10, NNZs: 10, Bias: 0.009295, T: 2400, Avg. loss: 6.314244
Total training time: 0.00 seconds.
-- Epoch 4
Norm: 134.24, NNZs: 10, Bias: 0.007304, T: 3200, Avg. loss: 0.807313
Total training time: 0.00 seconds.
-- Epoch 5
Norm: 134.64, NNZs: 10, Bias: 0.003654, T: 4000, Avg. loss: 0.129453
Total training time: 0.00 seconds.
-- Epoch 6
Norm: 134.79, NNZs: 10, Bias: -0.000699, T: 4800, Avg. loss: 0.027498
Total training time: 0.00 seconds.
-- Epoch 7
Norm: 134.86, NNZs: 10, Bias: -0.002424, T: 5600, Avg. loss: 0.009620
Total training time: 0.00 seconds.
-- Epoch 8
Norm: 134.88, NNZs: 10, Bias: -0.003821, T: 6400, Avg. loss: 0.005980
Total training time: 0.00 seconds.
-- Epoch 9
Norm: 134.89, NNZs: 10, Bias: -0.004180, T: 7200, Avg. loss: 0.005159
Total training time: 0.00 seconds.
-- Epoch 10
Norm: 134.90, NNZs: 10, Bias: -0.005264, T: 8000, Avg. loss: 0.004924
Total training time: 0.00 seconds.
-- Epoch 11
Norm: 134.90, NNZs: 10, Bias: -0.005463, T: 8800, Avg. loss: 0.004839
Total training time: 0.00 seconds.
-- Epoch 12
Norm: 134.90, NNZs: 10, Bias: -0.005231, T: 9600, Avg. loss: 0.004795
Total training time: 0.00 seconds.
-- Epoch 13
Norm: 134.90, NNZs: 10, Bias: -0.006024, T: 10400, Avg. loss: 0.004787
Total training time: 0.00 seconds.
Convergence after 13 epochs took 0.00 seconds
Mean Squared Error: 0.0096
Training with verbose=2
-- Epoch 1
Norm: 115.29, NNZs: 10, Bias: -0.147982, T: 800, Avg. loss: 1757.852454
Total training time: 0.00 seconds.
-- Epoch 2
Norm: 129.46, NNZs: 10, Bias: 0.000573, T: 1600, Avg. loss: 68.002132
Total training time: 0.00 seconds.
-- Epoch 3
Norm: 133.10, NNZs: 10, Bias: 0.009295, T: 2400, Avg. loss: 6.314244
Total training time: 0.00 seconds.
-- Epoch 4
Norm: 134.24, NNZs: 10, Bias: 0.007304, T: 3200, Avg. loss: 0.807313
Total training time: 0.00 seconds.
-- Epoch 5
Norm: 134.64, NNZs: 10, Bias: 0.003654, T: 4000, Avg. loss: 0.129453
Total training time: 0.00 seconds.
-- Epoch 6
Norm: 134.79, NNZs: 10, Bias: -0.000699, T: 4800, Avg. loss: 0.027498
Total training time: 0.00 seconds.
-- Epoch 7
Norm: 134.86, NNZs: 10, Bias: -0.002424, T: 5600, Avg. loss: 0.009620
Total training time: 0.00 seconds.
-- Epoch 8
Norm: 134.88, NNZs: 10, Bias: -0.003821, T: 6400, Avg. loss: 0.005980
Total training time: 0.00 seconds.
-- Epoch 9
Norm: 134.89, NNZs: 10, Bias: -0.004180, T: 7200, Avg. loss: 0.005159
Total training time: 0.00 seconds.
-- Epoch 10
Norm: 134.90, NNZs: 10, Bias: -0.005264, T: 8000, Avg. loss: 0.004924
Total training time: 0.00 seconds.
-- Epoch 11
Norm: 134.90, NNZs: 10, Bias: -0.005463, T: 8800, Avg. loss: 0.004839
Total training time: 0.00 seconds.
-- Epoch 12
Norm: 134.90, NNZs: 10, Bias: -0.005231, T: 9600, Avg. loss: 0.004795
Total training time: 0.00 seconds.
-- Epoch 13
Norm: 134.90, NNZs: 10, Bias: -0.006024, T: 10400, Avg. loss: 0.004787
Total training time: 0.00 seconds.
Convergence after 13 epochs took 0.00 seconds
Mean Squared Error: 0.0096
The key steps in this example are:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Train `SGDRegressor` models with different `verbose` values
- Evaluate the mean squared error of each model on the test set
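Because the per-epoch messages are printed rather than returned, they can also be captured and inspected programmatically, for example to check how quickly the average loss drops. The sketch below uses contextlib.redirect_stdout and a simple regular expression; it assumes the messages are written to Python's sys.stdout and follow the "Avg. loss:" format shown above, which may differ between scikit-learn versions:

import io
import re
from contextlib import redirect_stdout
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Capture the verbose messages instead of letting them reach the console
buffer = io.StringIO()
with redirect_stdout(buffer):
    SGDRegressor(max_iter=1000, tol=1e-3, verbose=1, random_state=42).fit(X, y)

# Extract the "Avg. loss" value reported for each epoch
losses = [float(m) for m in re.findall(r"Avg\. loss: ([\d.]+)", buffer.getvalue())]
print(losses)  # one value per epoch; the values should shrink as training converges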
Some tips for setting `verbose`:
- Use `verbose=0` for silent operation in production environments
- Set `verbose=1` to print per-epoch progress during training, useful for longer runs and for debugging or fine-tuning
- Values greater than 1 produce the same output as `verbose=1` for this estimator, so there is usually no need to go higher
Issues to consider:
- Printing verbose output can slow training down slightly, especially over many epochs
- In production environments, consider redirecting verbose output to log files (see the sketch below)
- The information provided by verbose output can help in diagnosing convergence issues
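Following on from the logging point above, here is a minimal sketch of redirecting the verbose output to a file during fitting. It again assumes the messages go through Python's sys.stdout, and the file name sgd_training.log is only an example:

from contextlib import redirect_stdout
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Send the per-epoch messages to a log file instead of the console
with open("sgd_training.log", "w") as log_file, redirect_stdout(log_file):
    sgd = SGDRegressor(max_iter=1000, tol=1e-3, verbose=1, random_state=42)
    sgd.fit(X, y)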