The verbose parameter in scikit-learn's SGDClassifier controls how much output is produced during model training.
Stochastic Gradient Descent (SGD) is an optimization method that iteratively updates model parameters to minimize the loss function, processing one training sample at a time. The SGDClassifier applies this method to classification problems.
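To make this concrete, here is a minimal sketch of a single SGD update for the hinge loss (SGDClassifier's default loss). It is deliberately simplified, assuming a constant learning rate eta and no regularization, whereas the real estimator also applies a learning-rate schedule and a penalty term:

import numpy as np

# One simplified SGD update on a single sample (x_i, y_i) with y_i in {-1, +1}:
# the weights move only when the sample is misclassified or inside the margin
def sgd_step(w, b, x_i, y_i, eta=0.01):
    margin = y_i * (np.dot(w, x_i) + b)
    if margin < 1:  # hinge loss has a nonzero subgradient here
        w = w + eta * y_i * x_i
        b = b + eta * y_i
    return w, b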
The verbose parameter determines how much information is printed during the training process, which can be useful for monitoring progress and debugging.
The default value for verbose is 0, which means no output is produced during training. Any positive value prints a per-epoch progress summary.
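You can confirm the default on a fresh estimator:

from sklearn.linear_model import SGDClassifier

# verbose defaults to 0, i.e. silent training
print(SGDClassifier().verbose)  # 0

The full example below compares verbose values of 0, 1, and 2: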
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different verbose values
verbose_values = [0, 1, 2]
for v in verbose_values:
    print(f"\nTraining with verbose={v}")
    sgd = SGDClassifier(max_iter=10, tol=1e-3, verbose=v, random_state=42)
    sgd.fit(X_train, y_train)
    y_pred = sgd.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy: {accuracy:.3f}")
Running the example gives an output like:
Training with verbose=0
Accuracy: 0.740
Training with verbose=1
-- Epoch 1
Norm: 89.49, NNZs: 20, Bias: 22.738696, T: 800, Avg. loss: 13.647219
Total training time: 0.00 seconds.
-- Epoch 2
Norm: 55.37, NNZs: 20, Bias: -7.870457, T: 1600, Avg. loss: 10.208955
Total training time: 0.00 seconds.
-- Epoch 3
Norm: 41.46, NNZs: 20, Bias: 0.531342, T: 2400, Avg. loss: 6.728514
Total training time: 0.00 seconds.
-- Epoch 4
Norm: 45.47, NNZs: 20, Bias: 2.992868, T: 3200, Avg. loss: 5.284373
Total training time: 0.00 seconds.
-- Epoch 5
Norm: 36.34, NNZs: 20, Bias: 0.908330, T: 4000, Avg. loss: 4.578592
Total training time: 0.00 seconds.
-- Epoch 6
Norm: 22.87, NNZs: 20, Bias: 2.825235, T: 4800, Avg. loss: 4.170265
Total training time: 0.00 seconds.
-- Epoch 7
Norm: 20.50, NNZs: 20, Bias: -0.504691, T: 5600, Avg. loss: 3.403925
Total training time: 0.00 seconds.
-- Epoch 8
Norm: 19.57, NNZs: 20, Bias: -0.363597, T: 6400, Avg. loss: 3.011103
Total training time: 0.00 seconds.
-- Epoch 9
Norm: 18.21, NNZs: 20, Bias: -0.224667, T: 7200, Avg. loss: 2.864420
Total training time: 0.00 seconds.
-- Epoch 10
Norm: 20.55, NNZs: 20, Bias: 0.001183, T: 8000, Avg. loss: 2.702092
Total training time: 0.00 seconds.
Accuracy: 0.740
Training with verbose=2
-- Epoch 1
Norm: 89.49, NNZs: 20, Bias: 22.738696, T: 800, Avg. loss: 13.647219
Total training time: 0.00 seconds.
-- Epoch 2
Norm: 55.37, NNZs: 20, Bias: -7.870457, T: 1600, Avg. loss: 10.208955
Total training time: 0.00 seconds.
-- Epoch 3
Norm: 41.46, NNZs: 20, Bias: 0.531342, T: 2400, Avg. loss: 6.728514
Total training time: 0.00 seconds.
-- Epoch 4
Norm: 45.47, NNZs: 20, Bias: 2.992868, T: 3200, Avg. loss: 5.284373
Total training time: 0.00 seconds.
-- Epoch 5
Norm: 36.34, NNZs: 20, Bias: 0.908330, T: 4000, Avg. loss: 4.578592
Total training time: 0.00 seconds.
-- Epoch 6
Norm: 22.87, NNZs: 20, Bias: 2.825235, T: 4800, Avg. loss: 4.170265
Total training time: 0.00 seconds.
-- Epoch 7
Norm: 20.50, NNZs: 20, Bias: -0.504691, T: 5600, Avg. loss: 3.403925
Total training time: 0.00 seconds.
-- Epoch 8
Norm: 19.57, NNZs: 20, Bias: -0.363597, T: 6400, Avg. loss: 3.011103
Total training time: 0.00 seconds.
-- Epoch 9
Norm: 18.21, NNZs: 20, Bias: -0.224667, T: 7200, Avg. loss: 2.864420
Total training time: 0.00 seconds.
-- Epoch 10
Norm: 20.55, NNZs: 20, Bias: 0.001183, T: 8000, Avg. loss: 2.702092
Total training time: 0.00 seconds.
Accuracy: 0.740
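Note that the verbose=1 and verbose=2 runs print exactly the same per-epoch log. If you want to keep that log without cluttering the console, one option is to capture standard output during fit. This is a minimal sketch, assuming the messages are written through Python's stdout (true for recent scikit-learn releases) and reusing X_train and y_train from the example above:

import io
from contextlib import redirect_stdout

from sklearn.linear_model import SGDClassifier

# Capture the per-epoch training log into a string instead of the console
buffer = io.StringIO()
sgd = SGDClassifier(max_iter=10, tol=1e-3, verbose=1, random_state=42)
with redirect_stdout(buffer):
    sgd.fit(X_train, y_train)

training_log = buffer.getvalue()  # e.g. starts with "-- Epoch 1"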
The key steps in this example are:
- Generate a synthetic binary classification dataset
- Split the data into train and test sets
- Train SGDClassifier models with different verbose values
- Observe the output produced during training
- Evaluate the accuracy of each model on the test set
Some tips for using the verbose parameter:
- Use verbose=0 for silent operation in production environments
- Set verbose=1 for basic progress information during development
- Don't expect extra detail from verbose>1: as the output above shows, SGDClassifier prints the same per-epoch summary for any positive value; for finer-grained monitoring, see the sketch after this list
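For monitoring beyond what verbose offers, you can drive the epochs yourself with partial_fit and report whatever metric you care about between passes. A minimal sketch, reusing the train/test split from the example above; the epoch count and the choice of test accuracy as the metric are illustrative:

import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

# Each partial_fit call performs one pass over the data, so custom
# progress can be reported between epochs
sgd = SGDClassifier(random_state=42)
classes = np.unique(y_train)
for epoch in range(5):
    sgd.partial_fit(X_train, y_train, classes=classes)
    acc = accuracy_score(y_test, sgd.predict(X_test))
    print(f"Epoch {epoch + 1}: test accuracy = {acc:.3f}")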
Issues to consider:
- Verbose output adds printing overhead, which can add up over many epochs or many model fits (e.g., in a grid search)
- In some environments, excessive output might interfere with other processes
- Balance the need for information with performance requirements