Ridge Regression is a type of linear regression that adds an L2 penalty on the coefficient magnitudes to prevent overfitting. It is suitable for regression problems where the goal is to predict a continuous target variable.
The key hyperparameters of Ridge include alpha (regularization strength), solver (optimization algorithm), and fit_intercept (whether to calculate the intercept).
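To illustrate what alpha does, the sketch below (not part of the main example; the data and alpha values are arbitrary choices for demonstration) fits Ridge with increasing alpha on a synthetic dataset and prints the norm of the learned coefficients, which shrinks as the penalty grows:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# synthetic data for illustration only
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)

# larger alpha -> stronger L2 penalty -> smaller coefficient norm
for alpha in [0.01, 1.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    print('alpha=%g, coefficient norm=%.3f' % (alpha, np.linalg.norm(model.coef_)))
```

With alpha near zero, Ridge behaves like ordinary least squares; very large alpha shrinks all coefficients toward zero.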
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
# generate regression dataset
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create model
model = Ridge(alpha=1.0)
# fit model
model.fit(X_train, y_train)
# evaluate model
yhat = model.predict(X_test)
mse = mean_squared_error(y_test, yhat)
print('Mean Squared Error: %.3f' % mse)
# make a prediction
row = [[-0.26243406, 0.51541306, -1.53879325, -0.16230347, 1.01740941]]
yhat = model.predict(row)
print('Predicted: %.3f' % yhat[0])
Running the example gives an output like:
Mean Squared Error: 1.127
Predicted: -13.710
The steps are as follows:

1. Generate a synthetic regression dataset using the make_regression() function, specifying the number of samples (n_samples), features (n_features), and a fixed random seed (random_state) for reproducibility. The dataset is split into training and testing sets using train_test_split().
2. Instantiate a Ridge model with the alpha hyperparameter set to 1.0, which controls the regularization strength. The model is then fit on the training data using the fit() method.
3. Evaluate the model's performance by calculating the mean squared error between the predicted values (yhat) and the actual values (y_test).
4. Make a prediction using the fitted model on a new data sample by passing it to the predict() method.
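The example above fixes alpha at 1.0; in practice it is usually tuned. One option, shown in this minimal sketch (the candidate alpha values are arbitrary), is scikit-learn's RidgeCV, which selects alpha by cross-validation:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# same synthetic dataset as the main example
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)

# RidgeCV tries each candidate alpha and keeps the one with the best CV score
model = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0]).fit(X, y)
print('Chosen alpha: %s' % model.alpha_)
```

The selected value is available afterwards as the fitted model's alpha_ attribute.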
This example demonstrates how to implement and use Ridge Regression in scikit-learn to handle regression problems. By penalizing the squared magnitude of the coefficients, the model mitigates overfitting and remains stable on datasets with multicollinearity.
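The multicollinearity point can be seen in a small hedged sketch (the collinear dataset below is constructed purely for demonstration): when two features are nearly identical, ordinary least squares coefficients can grow very large, while Ridge keeps them small.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
x = rng.normal(size=(100, 1))
# second feature is a near-copy of the first -> strong multicollinearity
X = np.hstack([x, x + rng.normal(scale=1e-6, size=(100, 1))])
y = (3 * x).ravel() + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# OLS coefficients can blow up on collinear features; Ridge shrinks them
print('OLS coefficients:   %s' % ols.coef_)
print('Ridge coefficients: %s' % ridge.coef_)
```

Both models predict similarly here, but the Ridge coefficients are far more stable, which matters when the coefficients themselves are interpreted.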