
Scikit-Learn "RidgeClassifier" versus "RidgeClassifierCV"

RidgeClassifier is a classifier that casts classification as a ridge regression problem: class labels are converted to {-1, 1} targets and the model is fit as a regularized least-squares regression. It handles both binary and multiclass problems.

In scikit-learn, the RidgeClassifier class provides this functionality. Key hyperparameters include alpha (regularization strength), solver (optimization algorithm), and class_weight (per-class weights for imbalanced data). Tuning these manually requires domain knowledge and can be time-consuming.
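Manual tuning typically means looping over candidate values yourself and scoring each one. A minimal sketch (the alpha grid here is illustrative, not a recommendation):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=42)

# Manually try a few regularization strengths and compare mean CV accuracy.
for alpha in [0.1, 1.0, 10.0]:
    clf = RidgeClassifier(alpha=alpha)
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"alpha={alpha}: mean CV accuracy {score:.3f}")
```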

RidgeClassifierCV, on the other hand, extends RidgeClassifier with built-in cross-validation for automated hyperparameter tuning. Its key hyperparameters include alphas (list of alpha values to try), cv (cross-validation strategy; the default of None uses an efficient leave-one-out scheme), and scoring (metric to optimize).
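Those three hyperparameters cover the whole search specification. A short sketch with an explicit grid and scoring metric (the alphas chosen here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifierCV

X, y = make_classification(n_samples=500, random_state=42)

# Candidate alphas, 5-fold CV, and accuracy as the selection metric.
clf = RidgeClassifierCV(alphas=[0.01, 0.1, 1.0, 10.0], cv=5, scoring="accuracy")
clf.fit(X, y)
print(clf.alpha_)  # the alpha selected by cross-validation
```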

The main difference is that RidgeClassifierCV automates the hyperparameter tuning process using cross-validation, while RidgeClassifier requires manual tuning. This automation in RidgeClassifierCV comes at a computational cost, as it trains multiple models during cross-validation.
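One way to soften that computational cost: leaving cv at its default of None lets RidgeClassifierCV use an efficient generalized leave-one-out scheme rather than refitting the model once per fold per alpha. A sketch (the alpha grid is again illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifierCV

X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)

# cv=None (the default) triggers the efficient leave-one-out formulation,
# which evaluates all candidate alphas without repeated refits.
clf = RidgeClassifierCV(alphas=[0.01, 0.1, 1.0, 10.0])
clf.fit(X, y)
print(clf.alpha_)
```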

RidgeClassifier is ideal for quick prototyping or when you have prior knowledge of good hyperparameter values. RidgeClassifierCV is preferred when you need to tune hyperparameters and perform model selection, especially with new datasets.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import RidgeClassifier, RidgeClassifierCV
from sklearn.metrics import accuracy_score, f1_score

# Generate synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit and evaluate RidgeClassifier with default hyperparameters
ridge_clf = RidgeClassifier(random_state=42)
ridge_clf.fit(X_train, y_train)
y_pred_ridge = ridge_clf.predict(X_test)
print(f"RidgeClassifier accuracy: {accuracy_score(y_test, y_pred_ridge):.3f}")
print(f"RidgeClassifier F1 score: {f1_score(y_test, y_pred_ridge):.3f}")

# Fit and evaluate RidgeClassifierCV with cross-validation
ridge_cv_clf = RidgeClassifierCV(cv=5)
ridge_cv_clf.fit(X_train, y_train)
y_pred_ridge_cv = ridge_cv_clf.predict(X_test)
print(f"\nRidgeClassifierCV accuracy: {accuracy_score(y_test, y_pred_ridge_cv):.3f}")
print(f"RidgeClassifierCV F1 score: {f1_score(y_test, y_pred_ridge_cv):.3f}")
print(f"Best alpha: {ridge_cv_clf.alpha_}")

Running the example gives an output like:

RidgeClassifier accuracy: 0.855
RidgeClassifier F1 score: 0.854

RidgeClassifierCV accuracy: 0.855
RidgeClassifierCV F1 score: 0.854
Best alpha: 10.0

The steps are as follows:

  1. Generate a synthetic binary classification dataset using make_classification.
  2. Split the data into training and test sets using train_test_split.
  3. Instantiate RidgeClassifier with default hyperparameters, fit it on the training data, and evaluate its performance on the test set.
  4. Instantiate RidgeClassifierCV with 5-fold cross-validation, fit it on the training data, and evaluate its performance on the test set.
  5. Compare the test set performance (accuracy and F1 score) of both models and print the best hyperparameters found by RidgeClassifierCV.
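Step 4's automated search can also be reproduced with GridSearchCV, which is useful when you want to tune solver or other parameters alongside alpha. A sketch under the same data setup (the grid values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Grid search over alpha with 5-fold CV, matching the RidgeClassifierCV setup.
grid = GridSearchCV(RidgeClassifier(), {"alpha": [0.1, 1.0, 10.0]}, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_)
print(f"Test accuracy: {grid.score(X_test, y_test):.3f}")
```

Unlike RidgeClassifierCV, GridSearchCV refits the model for every alpha and fold, so it is slower but more flexible.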


See Also