The scoring parameter in RandomizedSearchCV determines the metric used to evaluate each hyperparameter combination during the search. Random search is a hyperparameter optimization method that samples random combinations of hyperparameters and keeps the best-performing model.
By default, scoring is None, in which case RandomizedSearchCV falls back to the estimator's own score() method, which varies by problem type (e.g., mean accuracy for classification, R-squared for regression).
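To make the fallback concrete, here is a minimal sketch; the tiny dataset, the single n_estimators distribution, and the small n_iter/cv values are illustrative choices, not anything prescribed by the API:

from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=200, random_state=42)
param_dist = {'n_estimators': randint(10, 100)}

# Leaving scoring unset (None) falls back to the estimator's score(),
# which is mean accuracy for classifiers
default_search = RandomizedSearchCV(RandomForestClassifier(random_state=42),
                                    param_dist, n_iter=5, cv=3, random_state=42)
default_search.fit(X, y)

# Passing scoring='accuracy' explicitly selects the same metric here
explicit_search = RandomizedSearchCV(RandomForestClassifier(random_state=42),
                                     param_dist, n_iter=5, cv=3,
                                     scoring='accuracy', random_state=42)
explicit_search.fit(X, y)

print(default_search.best_score_ == explicit_search.best_score_)  # True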
Common values for scoring include 'accuracy', 'f1', 'roc_auc', 'precision', and 'recall' for classification, and 'r2', 'neg_mean_squared_error', and 'neg_mean_absolute_error' for regression. The error-based regression metrics are negated so that higher scores are always better, which is the convention the search utilities expect.
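The full list of accepted strings is long; in scikit-learn 1.0 and later you can print it directly:

from sklearn.metrics import get_scorer_names

# Every string in this list is a valid value for the scoring parameter
print(sorted(get_scorer_names()))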
As a heuristic, choose a metric that aligns with your problem's objective. For multiclass classification, metrics like 'f1_micro' or 'accuracy' are often used, while 'roc_auc' applies to binary problems; multiclass variants such as 'roc_auc_ovr' extend it using a one-vs-rest scheme.
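If no built-in string fits your objective, scoring also accepts a callable scorer. A small sketch using make_scorer to optimize macro-averaged F1 (the dataset and parameter values here are arbitrary illustrations):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=500, n_classes=3, n_informative=5,
                           random_state=42)

# Wrap f1_score so each class contributes equally, regardless of its size
macro_f1 = make_scorer(f1_score, average='macro')

search = RandomizedSearchCV(RandomForestClassifier(random_state=42),
                            {'n_estimators': [10, 50, 100]},
                            n_iter=3, cv=3, scoring=macro_f1, random_state=42)
search.fit(X, y)
print(f"Best macro F1: {search.best_score_:.3f}")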
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Generate a synthetic multiclass classification dataset
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=5,
                           n_redundant=5, random_state=42)

# Define a parameter distribution for RandomForestClassifier hyperparameters
param_dist = {'n_estimators': [10, 50, 100],
              'max_depth': [None, 5, 10],
              'min_samples_split': [2, 5, 10]}

# Create a base RandomForestClassifier model
rf = RandomForestClassifier(random_state=42)

# List of scoring metrics to test
scoring_metrics = ['accuracy', 'f1_micro', 'roc_auc_ovr']

for metric in scoring_metrics:
    # Run RandomizedSearchCV with the current scoring metric
    search = RandomizedSearchCV(rf, param_dist, n_iter=10, cv=5,
                                scoring=metric, random_state=42)
    search.fit(X, y)
    print(f"Best score for {metric}: {search.best_score_:.3f}")
    print()
Running the example gives an output like:
Best score for accuracy: 0.839
Best score for f1_micro: 0.839
Best score for roc_auc_ovr: 0.949

The identical scores for accuracy and f1_micro are expected: for single-label multiclass problems, micro-averaged F1 reduces to accuracy. Also note that the numbers are not directly comparable across metrics, since ROC AUC lives on a different scale; each score reflects how that metric rates its own best configuration, not which metric is better.
The steps are as follows:

- Generate a synthetic multiclass classification dataset using make_classification().
- Define a parameter distribution dictionary param_dist for RandomForestClassifier hyperparameters.
- Create a base RandomForestClassifier model rf.
- Iterate over different scoring metrics ('accuracy', 'f1_micro', 'roc_auc_ovr').
- For each scoring metric:
  - Run RandomizedSearchCV with 10 iterations and 5-fold cross-validation.
  - Print the best score for the current scoring metric.
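After fitting, each search also exposes the winning configuration through best_params_ and a model refit on the full dataset through best_estimator_. A short sketch, rebuilding the same data and distribution as the example above with one of the metrics:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=1000, n_classes=3, n_informative=5,
                           n_redundant=5, random_state=42)
param_dist = {'n_estimators': [10, 50, 100],
              'max_depth': [None, 5, 10],
              'min_samples_split': [2, 5, 10]}

search = RandomizedSearchCV(RandomForestClassifier(random_state=42),
                            param_dist, n_iter=10, cv=5,
                            scoring='f1_micro', random_state=42)
search.fit(X, y)

# The hyperparameter combination that achieved the best CV score
print(search.best_params_)

# A ready-to-use model, refit on all of X and y with those hyperparameters
best_model = search.best_estimator_
print(best_model.predict(X[:5]))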