The `tol` parameter in scikit-learn's `SGDClassifier` controls the stopping criterion for training, based on the improvement in loss.
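Stochastic Gradient Descent (SGD) is an optimization algorithm that iteratively updates the model parameters to minimize a loss function. The `tol` parameter sets the minimum improvement in loss that counts as progress: training stops when the loss fails to improve on the best loss seen so far by at least `tol` for `n_iter_no_change` consecutive epochs (5 by default). By default the check uses the training loss; with `early_stopping=True` it uses a validation score instead.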
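To make the rule concrete, here is a simplified pure-Python model of it. The helper `stopping_epoch` is hypothetical, not part of scikit-learn, and the loss values are made up for illustration rather than measured from a real fit. It follows the documented rule: an epoch counts as "no improvement" when its loss is greater than `best_loss - tol`, and training stops after `n_iter_no_change` such epochs in a row.

```python
def stopping_epoch(epoch_losses, tol=1e-3, n_iter_no_change=5):
    """Return the epoch at which a tol-based stopping rule would halt training.

    Hypothetical, simplified model of the rule documented for SGDClassifier.
    If the rule never fires (or tol is None), every epoch is used, which is
    analogous to training running until max_iter.
    """
    best_loss = float("inf")
    no_improvement_count = 0
    for epoch, loss in enumerate(epoch_losses, start=1):
        # An epoch "does not improve" if it fails to beat best_loss by at least tol
        if tol is not None and loss > best_loss - tol:
            no_improvement_count += 1
        else:
            no_improvement_count = 0
        best_loss = min(best_loss, loss)
        if no_improvement_count >= n_iter_no_change:
            return epoch
    return len(epoch_losses)


# Hypothetical per-epoch average losses (illustrative only, not measured)
losses = [1.0, 0.5, 0.3, 0.25, 0.2495, 0.2493, 0.2492, 0.2491, 0.2490, 0.2489, 0.2488]

for tol in [1e-3, 1e-5, None]:
    print(f"tol={tol}: stops after epoch {stopping_epoch(losses, tol=tol)}")
```

With `tol=1e-3`, the late improvements (0.0001 to 0.0005 per epoch) are all smaller than `tol`, so five non-improving epochs accumulate and the rule fires at epoch 9. With `tol=1e-5`, those same small improvements still count as progress, so all 11 epochs run.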
A smaller `tol` value typically results in more epochs (up to `max_iter`) and potentially better model performance, at the cost of longer training time. A larger `tol` value may lead to earlier stopping and faster training, but potentially suboptimal performance.
The default value of `tol` is 1e-3 (0.001). Setting `tol=None` disables this stopping criterion, so training runs for `max_iter` epochs. In practice, values between 1e-5 and 1e-2 are commonly used, depending on the desired trade-off between training time and model performance.
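A quick way to check the default and the effect of `tol=None` is sketched below. The small dataset and `max_iter=20` are arbitrary choices for illustration; the point is that with the stopping rule disabled, `n_iter_` should equal `max_iter`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Read the default tolerance from an unfitted estimator
default_tol = SGDClassifier().get_params()["tol"]
print(f"default tol: {default_tol}")

# tol=None disables the loss-based stopping rule, so fit() runs all max_iter epochs
sgd_none = SGDClassifier(tol=None, max_iter=20, random_state=0)
sgd_none.fit(X, y)
print(f"tol=None -> n_iter_ = {sgd_none.n_iter_}")
```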
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different tol values
tol_values = [1e-5, 1e-4, 1e-3, 1e-2]
accuracies = []

for tol in tol_values:
    sgd = SGDClassifier(tol=tol, random_state=42, max_iter=1000)
    sgd.fit(X_train, y_train)
    y_pred = sgd.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"tol={tol:.1e}, Accuracy: {accuracy:.3f}, Iterations: {sgd.n_iter_}")
```
Running the example gives an output like:
```
tol=1.0e-05, Accuracy: 0.770, Iterations: 67
tol=1.0e-04, Accuracy: 0.770, Iterations: 67
tol=1.0e-03, Accuracy: 0.770, Iterations: 67
tol=1.0e-02, Accuracy: 0.770, Iterations: 67
```
The key steps in this example are:
- Generate a synthetic binary classification dataset with informative and redundant features
- Split the data into train and test sets
- Train `SGDClassifier` models with different `tol` values
- Evaluate the accuracy of each model on the test set
- Report the number of epochs each model ran before stopping (`n_iter_`)
Some tips and heuristics for setting `tol`:
- Start with the default value of 1e-3 and adjust based on model performance and training time
- Use a smaller `tol` for complex datasets or when high precision is required
- Consider using a larger `tol` for simpler datasets or when faster training is prioritized
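One way to follow the "start with the default and adjust" advice is a small cross-validated grid search over `tol`, sketched below with `GridSearchCV`. Its `cv_results_` dictionary records both the mean test score and the mean fit time for each candidate, so accuracy and training cost can be compared side by side. The grid and `cv=5` are illustrative choices, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# Cross-validate a few tol values around the default of 1e-3
search = GridSearchCV(
    SGDClassifier(max_iter=1000, random_state=42),
    param_grid={"tol": [1e-5, 1e-4, 1e-3, 1e-2]},
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

# Report accuracy and fit time together for each candidate
res = search.cv_results_
for params, score, fit_time in zip(res["params"], res["mean_test_score"], res["mean_fit_time"]):
    print(f"tol={params['tol']:.0e}  accuracy={score:.3f}  fit time={fit_time * 1000:.1f} ms")
print("best tol:", search.best_params_["tol"])
```

If several `tol` values give similar scores, the one with the lowest fit time is usually the practical choice.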
Issues to consider:
- A `tol` that is too small may lead to overfitting and unnecessarily long training times
- A `tol` that is too large may cause premature stopping and underfitting
- The optimal `tol` value depends on the specific dataset and problem complexity
- Monitor both model performance and training time when tuning `tol`
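When `tol` is small relative to `max_iter`, training can hit the epoch limit before the stopping rule fires, in which case scikit-learn emits a `ConvergenceWarning`. The sketch below records that warning so it can be monitored alongside `n_iter_`. The helper `fit_and_check` is hypothetical, and the very small `max_iter=5` is chosen only to force the warning for demonstration.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)


def fit_and_check(tol, max_iter):
    """Hypothetical helper: fit SGDClassifier, return (epochs run, hit max_iter?)."""
    sgd = SGDClassifier(tol=tol, max_iter=max_iter, random_state=42)
    with warnings.catch_warnings(record=True) as caught:
        # Record every ConvergenceWarning instead of printing or deduplicating it
        warnings.simplefilter("always", ConvergenceWarning)
        sgd.fit(X, y)
    hit_max_iter = any(issubclass(w.category, ConvergenceWarning) for w in caught)
    return sgd.n_iter_, hit_max_iter


for max_iter in [5, 1000]:
    n_iter, hit_max_iter = fit_and_check(tol=1e-3, max_iter=max_iter)
    print(f"max_iter={max_iter}: ran {n_iter} epochs, ConvergenceWarning={hit_max_iter}")
```

If the warning appears at a realistic `max_iter`, either raise `max_iter` or loosen `tol`, depending on whether the extra training time is worth it.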