RidgeClassifier is a classification algorithm that applies Ridge regression to classification tasks (binary and multiclass). In scikit-learn, the RidgeClassifier class provides this functionality. Key hyperparameters include alpha (regularization strength), solver (optimization algorithm), and fit_intercept (whether to fit an intercept term; note that the older normalize parameter has been removed from recent scikit-learn versions). Tuning these manually requires domain knowledge and can be time-consuming.
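Manual tuning typically means looping over candidate alpha values and checking each on held-out data. A minimal sketch (the alpha grid and validation split here are illustrative choices, not prescribed values):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import train_test_split

# Any binary classification data works here; synthetic data keeps the sketch self-contained
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Manual tuning: fit one model per candidate alpha, keep the best validation accuracy
best_alpha, best_score = None, -1.0
for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:
    clf = RidgeClassifier(alpha=alpha).fit(X_train, y_train)
    score = clf.score(X_val, y_val)
    if score > best_score:
        best_alpha, best_score = alpha, score

print(f"best alpha={best_alpha}, validation accuracy={best_score:.3f}")
```

This is exactly the bookkeeping that RidgeClassifierCV automates.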
RidgeClassifierCV, on the other hand, extends RidgeClassifier with built-in cross-validation for automated hyperparameter tuning. Its key hyperparameters include alphas (the list of alpha values to try), cv (the cross-validation strategy, e.g. the number of folds), and scoring (the metric to optimize).
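All three hyperparameters can be set explicitly. A sketch with an illustrative alpha grid and F1 as the selection metric (these particular values are assumptions for demonstration, not recommendations):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifierCV

X, y = make_classification(n_samples=500, random_state=0)

# Search 7 alphas spaced logarithmically from 1e-3 to 1e3,
# using 5-fold cross-validation and the F1 score for model selection
clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 7), cv=5, scoring="f1")
clf.fit(X, y)

print(f"selected alpha: {clf.alpha_}")
```

After fitting, the selected regularization strength is available in the alpha_ attribute.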
The main difference is that RidgeClassifierCV automates the hyperparameter tuning process using cross-validation, while RidgeClassifier requires manual tuning. This automation comes at a computational cost, as RidgeClassifierCV trains multiple models during cross-validation.
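The cost is easy to see by timing both fits. A rough sketch (dataset size, alpha grid, and fold count are arbitrary; absolute timings will vary by machine):

```python
import time
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier, RidgeClassifierCV

X, y = make_classification(n_samples=5000, n_features=50, random_state=0)

# One fit with a fixed alpha
start = time.perf_counter()
RidgeClassifier().fit(X, y)
single_time = time.perf_counter() - start

# 3 alphas x 5 folds = 15 fits during search, plus a final refit
start = time.perf_counter()
RidgeClassifierCV(alphas=[0.1, 1.0, 10.0], cv=5).fit(X, y)
cv_time = time.perf_counter() - start

print(f"RidgeClassifier fit: {single_time:.4f}s")
print(f"RidgeClassifierCV fit: {cv_time:.4f}s")
```

On most datasets the cross-validated fit takes noticeably longer, roughly in proportion to the number of alphas times the number of folds.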
RidgeClassifier is ideal for quick prototyping or when you have prior knowledge of good hyperparameter values. RidgeClassifierCV is preferred when you need to tune hyperparameters and perform model selection, especially with new datasets.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import RidgeClassifier, RidgeClassifierCV
from sklearn.metrics import accuracy_score, f1_score
# Generate synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fit and evaluate RidgeClassifier with default hyperparameters
ridge_clf = RidgeClassifier(random_state=42)
ridge_clf.fit(X_train, y_train)
y_pred_ridge = ridge_clf.predict(X_test)
print(f"RidgeClassifier accuracy: {accuracy_score(y_test, y_pred_ridge):.3f}")
print(f"RidgeClassifier F1 score: {f1_score(y_test, y_pred_ridge):.3f}")
# Fit and evaluate RidgeClassifierCV with cross-validation
ridge_cv_clf = RidgeClassifierCV(cv=5)
ridge_cv_clf.fit(X_train, y_train)
y_pred_ridge_cv = ridge_cv_clf.predict(X_test)
print(f"\nRidgeClassifierCV accuracy: {accuracy_score(y_test, y_pred_ridge_cv):.3f}")
print(f"RidgeClassifierCV F1 score: {f1_score(y_test, y_pred_ridge_cv):.3f}")
print(f"Best alpha: {ridge_cv_clf.alpha_}")
Running the example gives an output like:
RidgeClassifier accuracy: 0.855
RidgeClassifier F1 score: 0.854
RidgeClassifierCV accuracy: 0.855
RidgeClassifierCV F1 score: 0.854
Best alpha: 10.0
The steps are as follows:
- Generate a synthetic binary classification dataset using make_classification.
- Split the data into training and test sets using train_test_split.
- Instantiate RidgeClassifier with default hyperparameters, fit it on the training data, and evaluate its performance on the test set.
- Instantiate RidgeClassifierCV with 5-fold cross-validation, fit it on the training data, and evaluate its performance on the test set.
- Compare the test set performance (accuracy and F1 score) of both models and print the best alpha found by RidgeClassifierCV.
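For context, the tuning that RidgeClassifierCV performs can also be written out explicitly with GridSearchCV, which additionally allows searching over several hyperparameters at once. A sketch using the same data setup as above (the parameter grid shown is an illustrative choice):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Grid search over alpha with 5-fold cross-validation,
# mirroring what RidgeClassifierCV does internally when cv is set
grid = GridSearchCV(
    RidgeClassifier(),
    param_grid={"alpha": [0.1, 1.0, 10.0]},
    cv=5,
)
grid.fit(X_train, y_train)

print(f"Best parameters: {grid.best_params_}")
print(f"Test accuracy: {grid.score(X_test, y_test):.3f}")
```

GridSearchCV is the more general tool; RidgeClassifierCV is the convenient shortcut when alpha is the only hyperparameter you need to tune.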