Ridge regression is a linear regression technique that adds L2 regularization to ordinary least squares. The alpha parameter in scikit-learn’s Ridge class controls the strength of this regularization.
The alpha parameter determines how much the model coefficients are penalized for being large. Higher values of alpha lead to more regularization, resulting in simpler models with coefficients closer to zero. This can help prevent overfitting.
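Concretely, Ridge minimizes the penalized least squares objective ||y - Xw||^2 + alpha * ||w||^2, where w is the coefficient vector; this matches the objective stated in scikit-learn's Ridge documentation.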
The default value for alpha is 1.0. In practice, values between 0.1 and 10.0 are common, but the optimal value depends on the specific dataset and problem.
```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=100, n_features=10, n_informative=5,
                       n_targets=1, noise=0.5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different alpha values
alpha_values = [0.1, 1.0, 10.0]
mse_scores = []
for alpha in alpha_values:
    ridge = Ridge(alpha=alpha)
    ridge.fit(X_train, y_train)
    y_pred = ridge.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"alpha={alpha}, MSE: {mse:.3f}")
```
Running the example gives output like:

```
alpha=0.1, MSE: 0.417
alpha=1.0, MSE: 5.468
alpha=10.0, MSE: 339.732
```
The key steps in this example are:
- Generate a synthetic regression dataset with informative and noise features
- Split the data into train and test sets
- Train Ridge models with different alpha values
- Evaluate the mean squared error of each model on the test set
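To see the shrinkage effect described above directly, you can inspect the fitted coefficients. A minimal sketch (reusing X_train and y_train from the example above):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Higher alpha shrinks the coefficient vector toward zero;
# the L2 norm of the coefficients decreases as alpha grows
for alpha in [0.1, 1.0, 10.0, 100.0]:
    ridge = Ridge(alpha=alpha)
    ridge.fit(X_train, y_train)
    coef_norm = np.linalg.norm(ridge.coef_)
    print(f"alpha={alpha}, coefficient L2 norm: {coef_norm:.3f}")
```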
Some tips and heuristics for setting alpha:
- Try a range of values, for example powers of 10 from 0.0001 to 1000
- Use cross-validation to select the optimal alpha value (a sketch follows this list)
- Use higher alpha for simpler models and lower alpha for more complex models
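As a sketch of the cross-validation approach, scikit-learn provides RidgeCV, which fits Ridge over a grid of candidate alpha values and selects the best one. The logspace grid below is an illustrative choice following the powers-of-10 tip above, not a recommendation:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Candidate alphas: powers of 10 from 0.0001 to 1000
alphas = np.logspace(-4, 3, 8)

# RidgeCV selects the best alpha via cross-validation
# (5-fold here; reuses X_train, y_train from the example above)
ridge_cv = RidgeCV(alphas=alphas, cv=5)
ridge_cv.fit(X_train, y_train)
print(f"Best alpha: {ridge_cv.alpha_}")
```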
Issues to consider:
- The optimal alpha value depends on the scale of the data (see the sketch after this list)
- Very high alpha values can lead to underfitting
- Very low alpha values can lead to overfitting
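Because the penalty acts on coefficient magnitudes, standardizing features before fitting keeps alpha's effect comparable across columns. A minimal sketch using a scikit-learn Pipeline (reusing the train/test split from above):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# Standardize features so the penalty treats all coefficients
# on the same scale, then fit Ridge
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X_train, y_train)
print(f"Test R^2: {model.score(X_test, y_test):.3f}")
```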