The tol parameter in scikit-learn’s SVC class controls the tolerance for the stopping criterion during optimization.
Support Vector Machines (SVMs) are powerful algorithms for classification and regression tasks. The SVC class implements support vector classification for binary and multi-class problems.
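The tol parameter sets the tolerance for the solver's stopping criterion. The underlying libsvm solver stops iterating once the largest remaining violation of the optimality conditions falls below tol, so the value determines how close to the exact optimum the solution must be before training ends.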
Smaller values of tol lead to tighter convergence criteria and potentially longer training times. Larger values allow for looser convergence and faster training, but may result in lower accuracy.
The default value for tol is 1e-3, which is a good starting point for most datasets.
In practice, values between 1e-5 and 1e-2 are commonly used depending on the desired balance between training time and model accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import time

# Generate synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different tol values
tol_values = [1e-5, 1e-4, 1e-3, 1e-2]
accuracies = []
train_times = []

for tol in tol_values:
    # Time only the fit call; perf_counter is a monotonic, high-resolution clock
    start_time = time.perf_counter()
    svc = SVC(tol=tol, random_state=42)
    svc.fit(X_train, y_train)
    elapsed = time.perf_counter() - start_time

    # Evaluate on the held-out test set
    y_pred = svc.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    train_times.append(elapsed)
    print(f"tol={tol:.1e}, Accuracy: {accuracy:.3f}, Training time: {elapsed:.2f}s")
The output will look similar to the following (exact timings vary by machine and run):
tol=1.0e-05, Accuracy: 0.944, Training time: 0.67s
tol=1.0e-04, Accuracy: 0.944, Training time: 0.65s
tol=1.0e-03, Accuracy: 0.944, Training time: 0.67s
tol=1.0e-02, Accuracy: 0.944, Training time: 1.03s
The key steps in this example are:
- Generate a synthetic binary classification dataset with informative and redundant features
- Split the data into train and test sets
- Train SVC models with different tol values
- Measure the training time of each model and evaluate its accuracy on the test set
Some tips and heuristics for setting tol:
- Start with the default value of 1e-3 and adjust based on results
- Smaller values lead to tighter convergence but longer training times
- Larger values allow faster training but may reduce accuracy
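The "start with the default and adjust" tip can be sketched as a small cross-validated search around 1e-3. This is a minimal illustration under assumptions, not a recommended recipe: the candidate grid, the 3-fold split, and the dataset size are all choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# Candidate values centred on the default of 1e-3 (illustrative grid)
param_grid = {"tol": [1e-4, 1e-3, 1e-2]}
search = GridSearchCV(SVC(), param_grid, cv=3, scoring="accuracy")
search.fit(X, y)

# cv_results_ holds per-candidate averages across the folds
results = search.cv_results_
for params, score, fit_time in zip(results["params"],
                                   results["mean_test_score"],
                                   results["mean_fit_time"]):
    print(f"tol={params['tol']:.0e}, CV accuracy: {score:.3f}, "
          f"mean fit time: {fit_time:.3f}s")

print("Best tol by CV accuracy:", search.best_params_["tol"])
```

When several candidates score the same, the larger tol is usually the more economical choice, since nothing is gained from the extra solver work.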
Issues to consider:
- The optimal value of tol depends on the specific dataset and problem
- There is a trade-off between training time and model accuracy
- Very small tol values can lead to much longer training times with diminishing returns in accuracy
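One rough way to check for diminishing returns is to compare the predictions of a model trained at the default tolerance with those of a much tighter one. The sketch below assumes a small synthetic dataset and an arbitrary tight value of 1e-6. High agreement suggests the extra solver work is not changing the model in a way that matters for prediction.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Same model, two tolerances: the default and a much tighter one
default_model = SVC(tol=1e-3).fit(X_train, y_train)
tight_model = SVC(tol=1e-6).fit(X_train, y_train)

# Fraction of test points on which the two models predict the same label
agreement = np.mean(default_model.predict(X_test) == tight_model.predict(X_test))
print(f"Prediction agreement between tol=1e-3 and tol=1e-6: {agreement:.3f}")
```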