The `epsilon` parameter in scikit-learn's `SGDClassifier` sets the threshold used by the epsilon-insensitive loss functions. It only has an effect when `loss` is set to 'huber', 'epsilon_insensitive', or 'squared_epsilon_insensitive'; it plays no role in the learning rate schedule.
Stochastic Gradient Descent (SGD) is an optimization algorithm that updates model parameters iteratively using a subset of training data. It’s particularly useful for large-scale and sparse machine learning problems.
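To make that concrete, here is a minimal sketch of the streaming style of training SGD enables, using `SGDClassifier.partial_fit` on small batches (the synthetic batch generator and labeling rule here are purely illustrative):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier(loss='log_loss', random_state=42)
classes = np.array([0, 1])  # partial_fit needs the full set of classes up front

# Stream 100 small batches, as if reading chunks from disk
for _ in range(100):
    Xb = rng.normal(size=(32, 5))
    yb = (Xb[:, 0] + Xb[:, 1] > 0).astype(int)  # illustrative labeling rule
    clf.partial_fit(Xb, yb, classes=classes)

# Evaluate on a fresh batch
Xt = rng.normal(size=(200, 5))
yt = (Xt[:, 0] + Xt[:, 1] > 0).astype(int)
print(f"held-out accuracy: {clf.score(Xt, yt):.3f}")
```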
The `epsilon` parameter defines a region within which prediction errors are treated leniently. With the 'huber' loss, residuals smaller than `epsilon` incur a squared penalty while larger ones incur only a linear penalty, which limits the influence of outliers. With the 'epsilon_insensitive' loss, errors smaller than `epsilon` are ignored entirely.
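The effect is easiest to see by evaluating the two losses directly. The sketch below uses the standard textbook definitions (not scikit-learn's internal implementation) to show how `epsilon` partitions residuals:

```python
import numpy as np

def huber(residual, eps):
    # Squared penalty inside the epsilon region, linear outside
    a = np.abs(residual)
    return np.where(a <= eps, 0.5 * a**2, eps * a - 0.5 * eps**2)

def epsilon_insensitive(residual, eps):
    # Errors smaller than epsilon cost nothing
    return np.maximum(0.0, np.abs(residual) - eps)

residuals = np.array([0.05, 0.5, 2.0])
print(huber(residuals, eps=0.1))                # [0.00125 0.045   0.195  ]
print(epsilon_insensitive(residuals, eps=0.1))  # [0.  0.4 1.9]
```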
The default value for `epsilon` is 0.1. Appropriate values depend on the scale of the prediction errors for your problem, so it is worth standardizing features and trying values across a few orders of magnitude.
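You can confirm the default straight from the estimator:

```python
from sklearn.linear_model import SGDClassifier

print(SGDClassifier().epsilon)  # 0.1
```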
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different epsilon values
# (note: epsilon is ignored by 'log_loss'; it only affects the 'huber',
# 'epsilon_insensitive', and 'squared_epsilon_insensitive' losses)
epsilon_values = [1e-8, 1e-4, 1e-2, 1e-1]
accuracies = []
for epsilon in epsilon_values:
    sgd = SGDClassifier(loss='log_loss', learning_rate='invscaling', eta0=0.1,
                        epsilon=epsilon, random_state=42, max_iter=1000)
    sgd.fit(X_train, y_train)
    y_pred = sgd.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"epsilon={epsilon:.1e}, Accuracy: {accuracy:.3f}")
```
Running the example gives an output like:
```
epsilon=1.0e-08, Accuracy: 0.822
epsilon=1.0e-04, Accuracy: 0.822
epsilon=1.0e-02, Accuracy: 0.822
epsilon=1.0e-01, Accuracy: 0.822
```
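The identical accuracies are exactly what we should expect: with `loss='log_loss'`, `epsilon` is ignored, so all four runs train the same model. To see `epsilon` actually change the result, switch to a loss from the epsilon-insensitive family, as in this variation on the loop above (your accuracy numbers will differ from the output shown here):

```python
# Reuses X_train, X_test, y_train, y_test from the example above.
# 'huber' is one of the losses that actually consumes epsilon.
for epsilon in [1e-2, 1e-1, 1.0]:
    sgd = SGDClassifier(loss='huber', learning_rate='invscaling', eta0=0.1,
                        epsilon=epsilon, random_state=42, max_iter=1000)
    sgd.fit(X_train, y_train)
    print(f"epsilon={epsilon:.1e}, Accuracy: {accuracy_score(y_test, sgd.predict(X_test)):.3f}")
```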
The key steps in this example are:

- Generate a synthetic binary classification dataset
- Split the data into train and test sets
- Train `SGDClassifier` models with different `epsilon` values
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting `epsilon`:

- Start with the default value of 0.1 and adjust based on model performance
- Make sure you are using a loss that reads `epsilon`: 'huber', 'epsilon_insensitive', or 'squared_epsilon_insensitive'
- With 'huber', smaller values make the loss behave more like absolute loss (more robust to outliers), while larger values make it behave more like squared loss
- With 'epsilon_insensitive', larger values tolerate larger errors without penalty, which can make training less sensitive to noise
- Tuning `epsilon` with cross-validation is usually more reliable than picking it by hand, as in the sketch below
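As a sketch of that last tip, here is one way to let cross-validation pick `epsilon` (the candidate grid is an assumption; sensible values depend on your data's error scale):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

param_grid = {'epsilon': [1e-2, 1e-1, 1.0]}  # illustrative candidates
grid = GridSearchCV(SGDClassifier(loss='huber', max_iter=1000, random_state=42),
                    param_grid, cv=3, scoring='accuracy')
grid.fit(X, y)
print(grid.best_params_)
```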
Issues to consider:

- The optimal `epsilon` value depends on the dataset and problem complexity, and in particular on the scale of the residuals, so feature scaling matters
- Too small an `epsilon` makes the 'huber' loss behave almost like absolute loss and leaves 'epsilon_insensitive' with virtually no insensitive region
- Too large an `epsilon` can cause 'epsilon_insensitive' to ignore genuinely informative errors, leading to underfitting
- `epsilon` does not enter the learning rate schedule; parameters like `eta0` and `power_t` control the 'invscaling' schedule independently, as illustrated below
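To underline that last point, the 'invscaling' schedule is eta = eta0 / t^power_t, which can be computed without reference to `epsilon` at all:

```python
# invscaling: eta = eta0 / t**power_t (epsilon plays no part)
eta0, power_t = 0.1, 0.5  # power_t=0.5 is the default; eta0=0.1 matches the example above
for t in [1, 10, 100, 1000]:
    print(f"t={t:4d}, eta={eta0 / t**power_t:.4f}")
```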