The `verbose` parameter in scikit-learn's `SGDRegressor` controls how much information is printed during model training.
Stochastic Gradient Descent (SGD) is an optimization algorithm used for training various linear models. It updates the model parameters iteratively, estimating the gradient of the loss one training sample at a time.
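To make the update rule concrete, here is a simplified sketch of plain SGD for squared-error loss on a linear model. This is a toy illustration for intuition only, not scikit-learn's actual implementation:

import numpy as np

# Toy SGD for squared-error loss on a linear model (illustration only)
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

w = np.zeros(X.shape[1])  # weights
b = 0.0                   # bias (intercept)
eta = 0.01                # learning rate

for epoch in range(5):
    for xi, yi in zip(X, y):       # one sample at a time
        error = (xi @ w + b) - yi  # prediction error for this sample
        w -= eta * error * xi      # gradient step on the weights
        b -= eta * error           # gradient step on the bias
    mse = np.mean((X @ w + b - y) ** 2)
    print(f"Epoch {epoch + 1}: MSE = {mse:.4f}")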
The `verbose` parameter determines whether progress information is printed during the training process. Any value greater than 0 prints per-epoch progress, which can be useful for monitoring convergence and debugging.
The default value for `verbose` is 0, meaning no output is produced during training. Any value greater than 0 prints per-epoch statistics such as the weight norm, bias, average loss, and training time; as the output below shows, `verbose=1` and `verbose=2` produce the same information for this estimator.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different verbose values
verbose_values = [0, 1, 2]
for v in verbose_values:
print(f"\nTraining with verbose={v}")
sgd = SGDRegressor(max_iter=1000, tol=1e-3, verbose=v, random_state=42)
sgd.fit(X_train, y_train)
y_pred = sgd.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")
Running the example produces output like the following:
Training with verbose=0
Mean Squared Error: 0.0096
Training with verbose=1
-- Epoch 1
Norm: 115.29, NNZs: 10, Bias: -0.147982, T: 800, Avg. loss: 1757.852454
Total training time: 0.00 seconds.
-- Epoch 2
Norm: 129.46, NNZs: 10, Bias: 0.000573, T: 1600, Avg. loss: 68.002132
Total training time: 0.00 seconds.
-- Epoch 3
Norm: 133.10, NNZs: 10, Bias: 0.009295, T: 2400, Avg. loss: 6.314244
Total training time: 0.00 seconds.
-- Epoch 4
Norm: 134.24, NNZs: 10, Bias: 0.007304, T: 3200, Avg. loss: 0.807313
Total training time: 0.00 seconds.
-- Epoch 5
Norm: 134.64, NNZs: 10, Bias: 0.003654, T: 4000, Avg. loss: 0.129453
Total training time: 0.00 seconds.
-- Epoch 6
Norm: 134.79, NNZs: 10, Bias: -0.000699, T: 4800, Avg. loss: 0.027498
Total training time: 0.00 seconds.
-- Epoch 7
Norm: 134.86, NNZs: 10, Bias: -0.002424, T: 5600, Avg. loss: 0.009620
Total training time: 0.00 seconds.
-- Epoch 8
Norm: 134.88, NNZs: 10, Bias: -0.003821, T: 6400, Avg. loss: 0.005980
Total training time: 0.00 seconds.
-- Epoch 9
Norm: 134.89, NNZs: 10, Bias: -0.004180, T: 7200, Avg. loss: 0.005159
Total training time: 0.00 seconds.
-- Epoch 10
Norm: 134.90, NNZs: 10, Bias: -0.005264, T: 8000, Avg. loss: 0.004924
Total training time: 0.00 seconds.
-- Epoch 11
Norm: 134.90, NNZs: 10, Bias: -0.005463, T: 8800, Avg. loss: 0.004839
Total training time: 0.00 seconds.
-- Epoch 12
Norm: 134.90, NNZs: 10, Bias: -0.005231, T: 9600, Avg. loss: 0.004795
Total training time: 0.00 seconds.
-- Epoch 13
Norm: 134.90, NNZs: 10, Bias: -0.006024, T: 10400, Avg. loss: 0.004787
Total training time: 0.00 seconds.
Convergence after 13 epochs took 0.00 seconds
Mean Squared Error: 0.0096
Training with verbose=2
-- Epoch 1
Norm: 115.29, NNZs: 10, Bias: -0.147982, T: 800, Avg. loss: 1757.852454
Total training time: 0.00 seconds.
-- Epoch 2
Norm: 129.46, NNZs: 10, Bias: 0.000573, T: 1600, Avg. loss: 68.002132
Total training time: 0.00 seconds.
-- Epoch 3
Norm: 133.10, NNZs: 10, Bias: 0.009295, T: 2400, Avg. loss: 6.314244
Total training time: 0.00 seconds.
-- Epoch 4
Norm: 134.24, NNZs: 10, Bias: 0.007304, T: 3200, Avg. loss: 0.807313
Total training time: 0.00 seconds.
-- Epoch 5
Norm: 134.64, NNZs: 10, Bias: 0.003654, T: 4000, Avg. loss: 0.129453
Total training time: 0.00 seconds.
-- Epoch 6
Norm: 134.79, NNZs: 10, Bias: -0.000699, T: 4800, Avg. loss: 0.027498
Total training time: 0.00 seconds.
-- Epoch 7
Norm: 134.86, NNZs: 10, Bias: -0.002424, T: 5600, Avg. loss: 0.009620
Total training time: 0.00 seconds.
-- Epoch 8
Norm: 134.88, NNZs: 10, Bias: -0.003821, T: 6400, Avg. loss: 0.005980
Total training time: 0.00 seconds.
-- Epoch 9
Norm: 134.89, NNZs: 10, Bias: -0.004180, T: 7200, Avg. loss: 0.005159
Total training time: 0.00 seconds.
-- Epoch 10
Norm: 134.90, NNZs: 10, Bias: -0.005264, T: 8000, Avg. loss: 0.004924
Total training time: 0.00 seconds.
-- Epoch 11
Norm: 134.90, NNZs: 10, Bias: -0.005463, T: 8800, Avg. loss: 0.004839
Total training time: 0.00 seconds.
-- Epoch 12
Norm: 134.90, NNZs: 10, Bias: -0.005231, T: 9600, Avg. loss: 0.004795
Total training time: 0.00 seconds.
-- Epoch 13
Norm: 134.90, NNZs: 10, Bias: -0.006024, T: 10400, Avg. loss: 0.004787
Total training time: 0.00 seconds.
Convergence after 13 epochs took 0.00 seconds
Mean Squared Error: 0.0096
The key steps in this example are:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Train `SGDRegressor` models with different `verbose` values
- Evaluate the mean squared error of each model on the test set
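Because the per-epoch messages are printed rather than returned, they can also be captured and inspected programmatically, for example to check how quickly the average loss drops. The sketch below uses contextlib.redirect_stdout and a simple regular expression; it assumes the messages are written to Python's sys.stdout and follow the "Avg. loss:" format shown above, which may differ between scikit-learn versions:

import io
import re
from contextlib import redirect_stdout
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Capture the verbose messages instead of letting them reach the console
buffer = io.StringIO()
with redirect_stdout(buffer):
    SGDRegressor(max_iter=1000, tol=1e-3, verbose=1, random_state=42).fit(X, y)

# Extract the "Avg. loss" value reported for each epoch
losses = [float(m) for m in re.findall(r"Avg\. loss: ([\d.]+)", buffer.getvalue())]
print(losses)  # one value per epoch; the values should shrink as training converges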
Some tips for setting `verbose`:
- Use `verbose=0` for silent operation in production environments
- Set `verbose=1` to print per-epoch progress during training, useful for longer runs and for debugging or fine-tuning
- Values greater than 1 produce the same output as `verbose=1` for this estimator, so there is usually no need to go higher
Issues to consider:
- Printing verbose output can slow training down slightly, especially over many epochs
- In production environments, consider redirecting verbose output to log files (see the sketch below)
- The information provided by verbose output can help in diagnosing convergence issues
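Following on from the logging point above, here is a minimal sketch of redirecting the verbose output to a file during fitting. It again assumes the messages go through Python's sys.stdout, and the file name sgd_training.log is only an example:

from contextlib import redirect_stdout
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Send the per-epoch messages to a log file instead of the console
with open("sgd_training.log", "w") as log_file, redirect_stdout(log_file):
    sgd = SGDRegressor(max_iter=1000, tol=1e-3, verbose=1, random_state=42)
    sgd.fit(X, y)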