
Configure Ridge "tol" Parameter

The tol parameter in scikit-learn’s Ridge regressor controls the precision of the solution and serves as a stopping criterion for the solver.

Ridge regression is a regularized linear regression technique that adds an L2 penalty term to the ordinary least squares objective function. This helps to prevent overfitting and can improve generalization performance.
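
To make the effect of the L2 penalty concrete, here is a minimal sketch (not part of the main example below; the alpha=10.0 value is arbitrary) that fits ordinary least squares and ridge on the same data and compares the size of the learned coefficients:

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
import numpy as np

# Fit ordinary least squares and ridge on the same synthetic data
X, y = make_regression(n_samples=100, n_features=10, noise=0.5, random_state=42)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

# The L2 penalty shrinks the coefficient vector toward zero
print(f"OLS coefficient norm:   {np.linalg.norm(ols.coef_):.3f}")
print(f"Ridge coefficient norm: {np.linalg.norm(ridge.coef_):.3f}")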

The tol parameter specifies the convergence tolerance for the solver. It is used as a stopping criterion by the iterative solvers (e.g. 'sparse_cg', 'lsqr', 'sag', 'saga'): lower values yield a more precise solution but require more iterations and longer training times. The direct 'cholesky' and 'svd' solvers ignore tol.

The default value for tol is 0.0001 (scikit-learn versions before 1.2 used a default of 0.001).

In practice, values between 0.1 and 0.0001 are commonly used depending on the desired balance between solution quality and computational efficiency.
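
Passing tol is straightforward. The sketch below is illustrative only; the 'lsqr' solver is an assumption, chosen because tol only affects the iterative solvers:

from sklearn.linear_model import Ridge

# tol is used by the iterative solvers (e.g. 'sparse_cg', 'lsqr', 'sag', 'saga');
# the direct 'cholesky' and 'svd' solvers ignore it
fast_ridge = Ridge(solver="lsqr", tol=1e-2)     # looser tolerance, stops earlier
precise_ridge = Ridge(solver="lsqr", tol=1e-6)  # tighter tolerance, more iterations

The complete example below trains Ridge with several tol values and compares the mean squared error on a held-out test set: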

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       n_targets=1, noise=0.5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different tol values
tol_values = [0.1, 0.01, 0.001, 0.0001]
mse_scores = []

for tol in tol_values:
    ridge = Ridge(tol=tol)
    ridge.fit(X_train, y_train)
    y_pred = ridge.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"tol={tol}, MSE: {mse:.3f}")

Running the example gives an output like:

tol=0.1, MSE: 0.200
tol=0.01, MSE: 0.200
tol=0.001, MSE: 0.200
tol=0.0001, MSE: 0.200

The key steps in this example are:

  1. Generate a synthetic regression dataset with informative and noise features
  2. Split the data into train and test sets
  3. Train Ridge models with different tol values
  4. Evaluate the mean squared error of each model on the test set
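
The MSE is essentially identical across tol values here. That is expected: with the default solver='auto', scikit-learn typically uses a direct (Cholesky-type) solve for dense data like this, and tol has no effect on the direct solvers. The sketch below (not part of the original example; the 'lsqr' solver choice is an assumption) switches to an iterative solver and prints the n_iter_ attribute so the effect of tol on the amount of work the solver does becomes visible:

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Same synthetic data as the example above
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       n_targets=1, noise=0.5, random_state=42)

# With the iterative 'lsqr' solver, tol acts as the stopping criterion;
# n_iter_ reports how many iterations the solver actually ran
for tol in [0.1, 0.01, 0.001, 0.0001]:
    ridge = Ridge(solver="lsqr", tol=tol)
    ridge.fit(X, y)
    print(f"tol={tol}, iterations: {ridge.n_iter_}")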

Some tips and heuristics for setting tol:

  1. Start with the default value and only tune tol if training time or convergence becomes a concern
  2. Loosen tol (e.g. 0.01 or 0.1) to speed up training on large datasets; tighten it (e.g. 0.0001 or smaller) when a more precise solution is needed
  3. Remember that tol only matters for the iterative solvers; the direct 'cholesky' and 'svd' solvers ignore it

Issues to consider:

  1. A very loose tol can stop the solver before it has converged, which may hurt test performance
  2. The effect of tol depends on the solver and the scale of the data, so confirm any change with a held-out set or cross-validation (see the sketch below)

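As suggested above, tol can also be tuned like any other hyperparameter. A minimal cross-validation sketch, with an illustrative grid and the 'lsqr' solver assumed:

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       noise=0.5, random_state=42)

# Search over tol for an iterative solver, scoring by (negative) MSE
param_grid = {"tol": [0.1, 0.01, 0.001, 0.0001]}
grid = GridSearchCV(Ridge(solver="lsqr"), param_grid,
                    scoring="neg_mean_squared_error", cv=5)
grid.fit(X, y)
print(grid.best_params_)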