The `tol` parameter in scikit-learn's `SGDRegressor` controls the stopping criterion for training, based on the improvement in the loss function.
Stochastic Gradient Descent (SGD) is an iterative optimization algorithm used for training linear models. It updates the model parameters using one sample at a time, making it efficient for large datasets.
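To make the per-sample update concrete, here is a minimal sketch of one SGD epoch for squared loss. This is an illustration of the idea only, not `SGDRegressor`'s actual implementation (which adds learning-rate schedules, regularization, and other refinements):

import numpy as np

def sgd_epoch(w, b, X, y, lr=0.01):
    # One pass over the data, updating parameters after every sample
    for xi, yi in zip(X, y):
        err = xi @ w + b - yi   # prediction error for this sample
        w -= lr * err * xi      # gradient of 0.5 * err**2 w.r.t. w
        b -= lr * err           # gradient w.r.t. the intercept
    return w, b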
The `tol` parameter determines when to stop training: once the loss stops improving by at least this threshold, training halts. A smaller `tol` value leads to more iterations and potentially better convergence, while a larger value may stop training earlier. The default value for `tol` is 1e-3 (0.001). In practice, values between 1e-5 and 1e-2 are commonly used, depending on the desired trade-off between convergence quality and computational time.
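Concretely, scikit-learn tracks the best loss seen so far and stops when an epoch's loss fails to beat it by at least `tol` for `n_iter_no_change` consecutive epochs (5 by default). A rough sketch of that check in plain Python, not scikit-learn's internals:

def should_stop(losses, tol=1e-3, n_iter_no_change=5):
    # Stop once the epoch loss has not improved on the best loss
    # by at least tol for n_iter_no_change epochs in a row
    best_loss = float("inf")
    no_improvement = 0
    for loss in losses:
        if loss > best_loss - tol:
            no_improvement += 1
        else:
            no_improvement = 0
        best_loss = min(best_loss, loss)
        if no_improvement >= n_iter_no_change:
            return True
    return False

The full example below trains `SGDRegressor` with several `tol` values and compares the resulting test error and iteration counts: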
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
import numpy as np
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different tol values
tol_values = [1e-5, 1e-4, 1e-3, 1e-2]
mse_scores = []
for tol in tol_values:
    sgd = SGDRegressor(tol=tol, random_state=42, max_iter=1000)
    sgd.fit(X_train, y_train)
    y_pred = sgd.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"tol={tol:.1e}, MSE: {mse:.3f}, n_iter_: {sgd.n_iter_}")
Running the example gives an output like:
tol=1.0e-05, MSE: 0.010, n_iter_: 20
tol=1.0e-04, MSE: 0.010, n_iter_: 15
tol=1.0e-03, MSE: 0.010, n_iter_: 13
tol=1.0e-02, MSE: 0.010, n_iter_: 12
The key steps in this example are:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Train `SGDRegressor` models with different `tol` values
- Evaluate the mean squared error of each model on the test set
- Compare the number of iterations required for convergence
Some tips and heuristics for setting `tol`:
- Start with the default value of 1e-3 and adjust based on model performance
- Use smaller `tol` values for more precise convergence, but be aware of increased computation time
- Consider using early stopping with a validation set to prevent overfitting (see the sketch after this list)
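The early-stopping tip maps directly onto `SGDRegressor`'s own parameters: `early_stopping` holds out a slice of the training data, `validation_fraction` controls its size, and `n_iter_no_change` sets how many epochs without a `tol`-sized improvement are tolerated. A brief sketch, reusing `X_train` and `y_train` from the example above:

sgd = SGDRegressor(
    tol=1e-4,
    early_stopping=True,       # score on a held-out validation split
    validation_fraction=0.1,   # 10% of the training data
    n_iter_no_change=5,        # patience, in epochs
    max_iter=1000,
    random_state=42,
)
sgd.fit(X_train, y_train)
print(f"Stopped after {sgd.n_iter_} epochs")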
Issues to consider:
- The optimal `tol` value depends on the scale and complexity of your data (see the scaling sketch after this list)
- Very small `tol` values may lead to overfitting or unnecessarily long training times
- Large `tol` values might cause premature stopping, resulting in suboptimal solutions
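Because the loss values that `tol` is compared against depend on the scale of your features and targets, standardizing the inputs (as scikit-learn generally recommends for SGD-based estimators) makes a given `tol` behave more predictably across datasets. A minimal sketch using a pipeline:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

model = make_pipeline(
    StandardScaler(),  # rescale features to zero mean, unit variance
    SGDRegressor(tol=1e-3, max_iter=1000, random_state=42),
)
model.fit(X_train, y_train)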