Scikit-Learn RandomizedSearchCV Hyperparameter Optimization


RandomizedSearchCV tunes hyperparameters by sampling a fixed number of candidate settings from specified parameter distributions and scoring each one with cross-validation. It is particularly useful when the search space is large and an exhaustive search such as GridSearchCV becomes computationally expensive.
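To make the cost difference concrete, the sketch below counts model fits for an exhaustive grid versus a randomized search. It reuses the candidate values from the SVC example in this section; the use of ParameterGrid and ParameterSampler to do the counting is an illustrative choice, not part of the original example.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Same candidate values as the SVC example in this section
param_dist = {
    "C": np.logspace(-3, 3, 7),
    "gamma": np.logspace(-3, 3, 7),
    "kernel": ["rbf", "linear"],
}

cv = 5  # number of cross-validation folds

# Exhaustive search tries every combination: 7 * 7 * 2 candidates
n_grid = len(ParameterGrid(param_dist))

# Randomized search only evaluates the n_iter settings it samples
sampled = list(ParameterSampler(param_dist, n_iter=10, random_state=42))
n_random = len(sampled)

print("Grid search candidates:", n_grid, "-> fits:", n_grid * cv)
print("Randomized search candidates:", n_random, "-> fits:", n_random * cv)
```

These counts exclude the final refit of the best setting on the full training data, which both search strategies perform by default.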

The key arguments of RandomizedSearchCV are n_iter (the number of parameter settings sampled), cv (the cross-validation splitting strategy), and scoring (the metric used to evaluate each setting).
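As a rough illustration of how these three arguments shape a search, the sketch below passes an explicit splitter object for cv and a non-default metric for scoring. The LogisticRegression estimator, the dataset size, and the specific values are assumptions chosen to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Small synthetic dataset (sizes are illustrative)
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)

search = RandomizedSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_distributions={"C": [0.01, 0.1, 1, 10, 100]},
    n_iter=4,                                   # sample 4 of the 5 candidates
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),  # splitter object instead of an int
    scoring="f1",                               # rank candidates by F1 instead of accuracy
    random_state=0,
)
search.fit(X, y)

# One entry per sampled setting, each scored on every fold
print("Settings evaluated:", len(search.cv_results_["params"]))
print("Folds per setting:", search.n_splits_)
```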

This technique is appropriate for both classification and regression problems.
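For the regression case, a minimal sketch might look like the following. The Ridge estimator, the loguniform range for alpha, and the neg_mean_squared_error metric are assumptions picked for illustration; they do not come from the example in this section.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

# Synthetic regression dataset (sizes are illustrative)
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=42)

search = RandomizedSearchCV(
    estimator=Ridge(),
    param_distributions={"alpha": loguniform(1e-3, 1e2)},  # sample alpha on a log scale
    n_iter=10,
    cv=5,
    scoring="neg_mean_squared_error",  # scikit-learn maximizes scores, hence the negation
    random_state=42,
)
search.fit(X, y)

print("Best alpha:", search.best_params_["alpha"])
print("Best cross-validated MSE:", -search.best_score_)
```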

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.svm import SVC
import numpy as np

# generate synthetic classification dataset
X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, n_informative=3, n_redundant=0, random_state=42)

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# define candidate values to sample from (gamma is ignored by the linear kernel)
param_dist = {'C': np.logspace(-3, 3, 7),
              'gamma': np.logspace(-3, 3, 7),
              'kernel': ['rbf', 'linear']}

# create RandomizedSearchCV object
random_search = RandomizedSearchCV(estimator=SVC(), param_distributions=param_dist, n_iter=10, cv=5, scoring='accuracy', random_state=42)

# fit the RandomizedSearchCV object
random_search.fit(X_train, y_train)

# report the best score and best parameters
print("Best score: %0.3f" % random_search.best_score_)
print("Best parameters: ", random_search.best_params_)

# make a prediction using the best model
new_data = [[1.17019871, -1.11019852, 0.79098757, -0.91008138]]
best_model = random_search.best_estimator_
prediction = best_model.predict(new_data)
print("Predicted class: ", prediction)

Running the example gives an output like:

Best score: 0.931
Best parameters:  {'kernel': 'rbf', 'gamma': 1.0, 'C': 10.0}
Predicted class:  [0]
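Beyond the single best result, the cv_results_ attribute records every sampled setting. The sketch below lists the top three by rank. To keep the run short it uses a reduced, rbf-only search on a smaller dataset, which is an assumption and not the exact search shown above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, random_state=42)

# Reduced search space (illustrative values)
param_dist = {"C": np.logspace(-2, 2, 5), "gamma": np.logspace(-2, 2, 5)}
search = RandomizedSearchCV(SVC(kernel="rbf"), param_dist, n_iter=6, cv=3,
                            scoring="accuracy", random_state=42)
search.fit(X, y)

results = search.cv_results_
# Sort candidates from best (rank 1) to worst
order = np.argsort(results["rank_test_score"], kind="stable")

for i in order[:3]:
    print("rank %d: mean=%.3f std=%.3f params=%s" % (
        results["rank_test_score"][i],
        results["mean_test_score"][i],
        results["std_test_score"][i],
        results["params"][i],
    ))
```

Looking at the standard deviation alongside the mean helps judge whether the gap between the top candidates is larger than the fold-to-fold noise.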

The steps in this example are:

1. A synthetic binary classification dataset is generated using make_classification(). The dataset is then split into training and test sets.
2. The values to sample from are defined in a dictionary (param_dist). Here, each SVC parameter (C, gamma, and kernel) is given as a list of candidate values, which RandomizedSearchCV samples uniformly; continuous distributions such as those in scipy.stats can be used instead.
3. A RandomizedSearchCV object is created, specifying the SVC estimator, the parameter distributions, the number of iterations (n_iter), the cross-validation strategy (cv), and the scoring metric.
4. The RandomizedSearchCV object is fit on the training data. It evaluates each of the 10 sampled settings (out of 98 possible combinations) with 5-fold cross-validation, then refits the best one on the full training set.
5. The best cross-validated score and the parameters that produced it are reported using the best_score_ and best_params_ attributes.
6. Finally, a prediction is made on a new data sample using the best model found (best_estimator_), demonstrating how the tuned model can be used for inference.
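The parameter dictionary in the steps above uses lists of values, but any object with an rvs method can be supplied as well. The following sketch swaps in scipy.stats.loguniform so that C and gamma are drawn from continuous ranges; the ranges, the rbf-only kernel, and the smaller dataset are assumptions made for illustration.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, random_state=42)

# Continuous log-scale distributions instead of fixed lists (illustrative ranges)
param_dist = {
    "C": loguniform(1e-2, 1e2),
    "gamma": loguniform(1e-3, 1e1),
}

search = RandomizedSearchCV(SVC(kernel="rbf"), param_dist, n_iter=10, cv=3,
                            scoring="accuracy", random_state=42)
search.fit(X, y)

print("Best score: %0.3f" % search.best_score_)
print("Best parameters:", search.best_params_)
```

With continuous distributions, raising n_iter explores new values rather than revisiting a fixed grid, so the search budget can be adjusted without redefining the search space.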

This example shows how RandomizedSearchCV can tune a model's hyperparameters at a fraction of the cost of an exhaustive grid search. The held-out test set created earlier can then be used to check how well the tuned model generalizes.
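Whether tuning actually helps is an empirical question, so one reasonable check is to compare the tuned model with an untuned baseline on the held-out test set. The sketch below does this with a default SVC as the baseline and a reduced, rbf-only search space; both choices are assumptions made to keep the run short, and the result will vary by dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=4, n_classes=2,
                           n_informative=3, n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Baseline: SVC with default hyperparameters
baseline = SVC().fit(X_train, y_train)

# Tuned model: reduced search space (illustrative values)
param_dist = {"C": np.logspace(-2, 2, 5), "gamma": np.logspace(-2, 2, 5)}
search = RandomizedSearchCV(SVC(kernel="rbf"), param_dist, n_iter=10, cv=5,
                            scoring="accuracy", random_state=42)
search.fit(X_train, y_train)

# search.predict delegates to the refitted best_estimator_
baseline_acc = accuracy_score(y_test, baseline.predict(X_test))
tuned_acc = accuracy_score(y_test, search.predict(X_test))

print("Baseline test accuracy: %0.3f" % baseline_acc)
print("Tuned test accuracy: %0.3f" % tuned_acc)
```

The test set is touched only once, after the search has finished, so the reported accuracy is not biased by the hyperparameter selection.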
