RandomizedSearchCV is a powerful tool for hyperparameter optimization that allows for efficient search over specified parameter distributions. It is particularly useful when the search space is large and exhaustive search methods like GridSearchCV become computationally expensive.
The key hyperparameters of RandomizedSearchCV include n_iter (number of parameter settings sampled), cv (cross-validation splitting strategy), and scoring (model evaluation metric).
This technique is appropriate for both classification and regression problems.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.svm import SVC
import numpy as np
# generate synthetic classification dataset
X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, n_informative=3, n_redundant=0, random_state=42)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# define parameter distributions
param_dist = {'C': np.logspace(-3, 3, 7),
'gamma': np.logspace(-3, 3, 7),
'kernel': ['rbf', 'linear']}
# create RandomizedSearchCV object
random_search = RandomizedSearchCV(estimator=SVC(), param_distributions=param_dist, n_iter=10, cv=5, scoring='accuracy', random_state=42)
# fit the RandomizedSearchCV object
random_search.fit(X_train, y_train)
# report the best score and best parameters
print("Best score: %0.3f" % random_search.best_score_)
print("Best parameters: ", random_search.best_params_)
# make a prediction using the best model
new_data = [[1.17019871, -1.11019852, 0.79098757, -0.91008138]]
best_model = random_search.best_estimator_
prediction = best_model.predict(new_data)
print("Predicted class: ", prediction)
Running the example gives an output like:
Best score: 0.931
Best parameters: {'kernel': 'rbf', 'gamma': 1.0, 'C': 10.0}
Predicted class: [0]
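The example splits off a test set but only reports the cross-validated score. As a minimal sketch of how the held-out data could be used, the snippet below repeats the same setup and then scores the refit best model on X_test. The variable name test_accuracy is introduced here for illustration and is not part of the original example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.svm import SVC
import numpy as np

# same synthetic dataset and split as in the example above
X, y = make_classification(n_samples=1000, n_features=4, n_classes=2,
                           n_informative=3, n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# same candidate values as in the example above
param_dist = {'C': np.logspace(-3, 3, 7),
              'gamma': np.logspace(-3, 3, 7),
              'kernel': ['rbf', 'linear']}

random_search = RandomizedSearchCV(estimator=SVC(), param_distributions=param_dist,
                                   n_iter=10, cv=5, scoring='accuracy', random_state=42)
random_search.fit(X_train, y_train)

# score() evaluates the refit best estimator with the metric given in `scoring`
test_accuracy = random_search.score(X_test, y_test)
print("Held-out test accuracy: %0.3f" % test_accuracy)
```

Because best_score_ is an average over the cross-validation folds of the training data, the held-out figure is usually the more honest estimate of how the tuned model behaves on unseen samples.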
The steps in this example are:
1. First, a synthetic binary classification dataset is generated using make_classification(). The dataset is then split into training and test sets.
2. The parameter distributions to sample from are defined in a dictionary (param_dist). Here, we specify candidate values for the C, gamma, and kernel parameters of an SVC model. Each entry is a fixed sequence of values rather than a continuous distribution, so values are drawn uniformly from each sequence.
3. A RandomizedSearchCV object is created, specifying the SVC estimator, the parameter distributions, the number of iterations (n_iter), the cross-validation strategy (cv), and the scoring metric.
4. The RandomizedSearchCV object is fit on the training data, searching the parameter space according to the specified distributions.
5. The best score and best parameters found during the search are reported using the best_score_ and best_params_ attributes.
6. Finally, a prediction is made on a new data sample using the best model found (best_estimator_), demonstrating how the tuned model can be used for inference.
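The dictionary in this example holds fixed sequences of candidate values. As a variation, RandomizedSearchCV also accepts objects with an rvs method, such as scipy.stats distributions, so C and gamma can be drawn from a continuous range. The sketch below assumes scipy.stats.loguniform over the same 1e-3 to 1e3 range; that choice of distribution is an illustration, not something taken from the original example.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.svm import SVC

# same synthetic dataset and split as in the example
X, y = make_classification(n_samples=1000, n_features=4, n_classes=2,
                           n_informative=3, n_redundant=0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# continuous log-uniform distributions for C and gamma (assumed for illustration);
# kernel stays a plain list, which is sampled uniformly
param_dist = {'C': loguniform(1e-3, 1e3),
              'gamma': loguniform(1e-3, 1e3),
              'kernel': ['rbf', 'linear']}

search = RandomizedSearchCV(estimator=SVC(), param_distributions=param_dist,
                            n_iter=10, cv=5, scoring='accuracy', random_state=42)
search.fit(X_train, y_train)

print("Best score: %0.3f" % search.best_score_)
print("Best parameters: ", search.best_params_)
```

With continuous distributions the sampled values are no longer restricted to seven powers of ten, which can help when the best setting lies between grid points.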
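This example shows how RandomizedSearchCV can tune a model's hyperparameters by evaluating only a sample of the search space (10 of the 98 possible combinations here) rather than every combination, which can improve performance over untuned defaults at a fraction of the cost of an exhaustive search.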
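The same pattern carries over to regression, as noted earlier. The following is a minimal sketch under stated assumptions: it swaps in an SVR estimator, a dataset from make_regression(), and the 'r2' scoring metric. The dataset sizes and the candidate values for C and epsilon are chosen here purely for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.svm import SVR
import numpy as np

# synthetic regression dataset (sizes and noise level are illustrative)
X, y = make_regression(n_samples=500, n_features=4, n_informative=3,
                       noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# candidate values for the SVR parameters (illustrative choices)
param_dist = {'C': np.logspace(-1, 3, 5),
              'epsilon': np.logspace(-2, 1, 4),
              'kernel': ['rbf', 'linear']}

# same search setup as before, with a regression metric
reg_search = RandomizedSearchCV(estimator=SVR(), param_distributions=param_dist,
                                n_iter=10, cv=5, scoring='r2', random_state=42)
reg_search.fit(X_train, y_train)

print("Best CV R^2: %0.3f" % reg_search.best_score_)
print("Best parameters: ", reg_search.best_params_)
```

Only the estimator, the parameter names, and the scoring metric change. The n_iter, cv, and random_state arguments play the same role as in the classification example.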