Ridge Regression is a type of linear regression that adds an L2 penalty on the coefficient magnitudes to prevent overfitting. It is suitable for regression problems where the goal is to predict a continuous target variable.
The key hyperparameters of Ridge include alpha (regularization strength), solver (optimization algorithm), and fit_intercept (whether to calculate the intercept).
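To illustrate what alpha does, the sketch below (not part of the main example; the data and alpha values are arbitrary choices for demonstration) fits Ridge with increasing alpha on a synthetic dataset and prints the norm of the learned coefficients, which shrinks as the penalty grows:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# synthetic data for illustration only
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)

# larger alpha -> stronger L2 penalty -> smaller coefficient norm
for alpha in [0.01, 1.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    print('alpha=%g, coefficient norm=%.3f' % (alpha, np.linalg.norm(model.coef_)))
```

With alpha near zero, Ridge behaves like ordinary least squares; very large alpha shrinks all coefficients toward zero.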
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
# generate regression dataset
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create model
model = Ridge(alpha=1.0)
# fit model
model.fit(X_train, y_train)
# evaluate model
yhat = model.predict(X_test)
mse = mean_squared_error(y_test, yhat)
print('Mean Squared Error: %.3f' % mse)
# make a prediction
row = [[-0.26243406, 0.51541306, -1.53879325, -0.16230347, 1.01740941]]
yhat = model.predict(row)
print('Predicted: %.3f' % yhat[0])
Running the example gives an output like:
Mean Squared Error: 1.127
Predicted: -13.710
The steps are as follows:

1. Generate a synthetic regression dataset using the make_regression() function, specifying the number of samples (n_samples), features (n_features), and a fixed random seed (random_state) for reproducibility. The dataset is split into training and testing sets using train_test_split().
2. Instantiate a Ridge model with the alpha hyperparameter set to 1.0, which controls the regularization strength. The model is then fit on the training data using the fit() method.
3. Evaluate the model's performance by calculating the mean squared error between the predicted values (yhat) and the actual values (y_test).
4. Make a prediction using the fitted model on a new data sample by passing it to the predict() method.
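The example above fixes alpha at 1.0; in practice it is usually tuned. One option, shown in this minimal sketch (the candidate alpha values are arbitrary), is scikit-learn's RidgeCV, which selects alpha by cross-validation:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# same synthetic dataset as the main example
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)

# RidgeCV tries each candidate alpha and keeps the one with the best CV score
model = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0]).fit(X, y)
print('Chosen alpha: %s' % model.alpha_)
```

The selected value is available afterwards as the fitted model's alpha_ attribute.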
This example demonstrates how to implement and use Ridge Regression in scikit-learn to handle regression problems. By penalizing the squared magnitude of the coefficients, the model mitigates overfitting and remains stable on datasets with multicollinearity.
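The multicollinearity point can be seen in a small hedged sketch (the collinear dataset below is constructed purely for demonstration): when two features are nearly identical, ordinary least squares coefficients can grow very large, while Ridge keeps them small.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
x = rng.normal(size=(100, 1))
# second feature is a near-copy of the first -> strong multicollinearity
X = np.hstack([x, x + rng.normal(scale=1e-6, size=(100, 1))])
y = (3 * x).ravel() + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# OLS coefficients can blow up on collinear features; Ridge shrinks them
print('OLS coefficients:   %s' % ols.coef_)
print('Ridge coefficients: %s' % ridge.coef_)
```

Both models predict similarly here, but the Ridge coefficients are far more stable, which matters when the coefficients themselves are interpreted.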