The `tol` parameter in scikit-learn's `Ridge` regressor controls the precision of the solution and serves as the stopping criterion for the iterative solvers.
Ridge regression is a regularized linear regression technique that adds an L2 penalty term to the ordinary least squares objective function. This helps to prevent overfitting and can improve generalization performance.
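For reference, `Ridge` minimizes the least squares loss plus the squared L2 norm of the coefficient vector, weighted by the `alpha` parameter:

$$\min_w \; \lVert Xw - y \rVert_2^2 + \alpha \lVert w \rVert_2^2$$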
The `tol` parameter specifies the tolerance the solver must reach before stopping: lower values lead to more precise solutions but require more iterations and longer training times. Note that `tol` only affects the iterative solvers (`sparse_cg`, `lsqr`, `sag`, `saga`, and `lbfgs`); the direct `svd` and `cholesky` solvers compute a closed-form solution and ignore it.
The default value for `tol` is 1e-4 (lowered from 1e-3 in scikit-learn 1.2).
In practice, values between 0.1 and 0.0001 are commonly used depending on the desired balance between solution quality and computational efficiency.
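A quick way to see which solvers actually use `tol` is to check the fitted model's `n_iter_` attribute, which scikit-learn populates only for the `lsqr` and `sag` solvers. A minimal sketch:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=1000, n_features=10, noise=0.5, random_state=42)

# 'svd' computes a closed-form solution and ignores tol (n_iter_ stays None);
# 'lsqr' iterates until the requested precision is reached
for solver in ["svd", "lsqr"]:
    ridge = Ridge(solver=solver, tol=1e-6).fit(X, y)
    print(f"solver={solver}, n_iter_={ridge.n_iter_}")
```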
```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       n_targets=1, noise=0.5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train and evaluate a Ridge model for each tol value
tol_values = [0.1, 0.01, 0.001, 0.0001]
mse_scores = []

for tol in tol_values:
    ridge = Ridge(tol=tol)
    ridge.fit(X_train, y_train)
    y_pred = ridge.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"tol={tol}, MSE: {mse:.3f}")
```
Running the example gives an output like:
```
tol=0.1, MSE: 0.200
tol=0.01, MSE: 0.200
tol=0.001, MSE: 0.200
tol=0.0001, MSE: 0.200
```
The key steps in this example are:

- Generate a synthetic regression dataset with informative and noise features
- Split the data into train and test sets
- Train `Ridge` models with different `tol` values
- Evaluate the mean squared error of each model on the test set
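Note that the default solver solves the problem almost exactly regardless of `tol`, which is why the MSE barely changes above. To see the stopping criterion actually at work, you can sweep `tol` with the iterative `sag` solver and inspect its iteration count via `n_iter_`. A minimal self-contained sketch:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=1000, n_features=10, noise=0.5, random_state=42)

# Tighter tolerances make the 'sag' solver run more iterations before stopping
for tol in [1e-1, 1e-3, 1e-5]:
    ridge = Ridge(solver="sag", tol=tol, max_iter=100000, random_state=42)
    ridge.fit(X, y)
    print(f"tol={tol}, iterations: {ridge.n_iter_}")
```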
Some tips and heuristics for setting `tol`:

- Start with the default value (1e-4 in scikit-learn 1.2 and later) and adjust as needed based on the problem
- Use a lower `tol` for high-precision solutions; increase it for faster training (see the timing sketch below)
- Find a balance between solution quality and computational cost
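As a rough way to gauge the cost side of that tradeoff, you can time fits at different tolerances. A minimal sketch using the iterative `lsqr` solver on a larger synthetic dataset (timings will vary by machine and data):

```python
import time

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=50000, n_features=200, noise=0.5, random_state=42)

# Time how long each fit takes as the tolerance tightens
for tol in [1e-1, 1e-3, 1e-5]:
    start = time.perf_counter()
    Ridge(solver="lsqr", tol=tol).fit(X, y)
    elapsed = time.perf_counter() - start
    print(f"tol={tol}, fit time: {elapsed:.3f}s")
```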
Issues to consider:

- Very low tolerance values can lead to long training times, or to the solver hitting `max_iter` before converging (see the sketch after this list)
- Extremely high tolerance may result in poor-quality solutions
- The optimal value depends on the scale of the data and the desired precision-versus-speed tradeoff
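One failure mode worth knowing: if `tol` is very tight and `max_iter` is too small, the iterative solvers stop before reaching the requested precision and scikit-learn emits a `ConvergenceWarning`. A minimal sketch with the `sag` solver and a deliberately tiny iteration budget:

```python
import warnings

from sklearn.datasets import make_regression
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=1000, n_features=10, noise=0.5, random_state=42)

# An unreachable precision within a tiny iteration budget triggers a warning
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    Ridge(solver="sag", tol=1e-12, max_iter=5, random_state=42).fit(X, y)

for w in caught:
    print(w.message)
```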