
Scikit-Learn Configure RandomizedSearchCV "scoring" Parameter

The scoring parameter in RandomizedSearchCV determines the metric used to evaluate each hyperparameter combination during the search. Random search is a hyperparameter optimization method that samples a fixed number of random parameter combinations (controlled by n_iter) and keeps the best-performing model.

By default (scoring=None), the search uses the estimator's own score method, which varies with the type of problem (e.g., accuracy for classification, R-squared for regression).
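
For instance, a minimal sketch of relying on the default (the estimator and parameter grid here are purely illustrative):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Small synthetic binary classification dataset for illustration
X, y = make_classification(n_samples=200, random_state=42)

# scoring is omitted (None), so the search falls back to the estimator's
# .score() method, which is mean accuracy for classifiers
search = RandomizedSearchCV(LogisticRegression(max_iter=1000),
                            {'C': [0.01, 0.1, 1.0, 10.0]},
                            n_iter=4, cv=3, random_state=42)
search.fit(X, y)
print(search.best_score_)  # mean cross-validated accuracy of the best candidate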

Common values for scoring include ‘accuracy’, ‘f1’, ‘roc_auc’, ‘precision’, and ‘recall’ for classification, and ‘r2’, ‘neg_mean_squared_error’, and ‘neg_mean_absolute_error’ for regression.
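
The full list of built-in scoring strings can be inspected programmatically (assuming scikit-learn 1.0 or newer, where get_scorer_names is available):

from sklearn.metrics import get_scorer_names

# Print every scoring string accepted by the scoring parameter
print(get_scorer_names())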

As a heuristic, choose a metric that aligns with your problem’s objective. For multiclass classification, metrics like ‘f1_micro’ or ‘accuracy’ are often used, while ‘roc_auc’ is a common choice for binary classification.
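
If no built-in string matches your objective, any metric can be wrapped with make_scorer and passed in place of a string (a sketch; the macro-averaged F1 choice here is only an example):

from sklearn.metrics import make_scorer, f1_score

# Custom scorer: macro-averaged F1, e.g. for an imbalanced multiclass problem
f1_macro_scorer = make_scorer(f1_score, average='macro')

# It is passed exactly like a string metric:
# RandomizedSearchCV(estimator, param_dist, scoring=f1_macro_scorer, ...)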

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Generate a synthetic multiclass classification dataset
X, y = make_classification(n_samples=1000, n_classes=3, n_informative=5, n_redundant=5, random_state=42)

# Define a parameter distribution for RandomForestClassifier hyperparameters
param_dist = {'n_estimators': [10, 50, 100],
              'max_depth': [None, 5, 10],
              'min_samples_split': [2, 5, 10]}

# Create a base RandomForestClassifier model
rf = RandomForestClassifier(random_state=42)

# List of scoring metrics to test
scoring_metrics = ['accuracy', 'f1_micro', 'roc_auc_ovr']

for metric in scoring_metrics:
    # Run RandomizedSearchCV with the current scoring metric
    search = RandomizedSearchCV(rf, param_dist, n_iter=10, cv=5, scoring=metric, random_state=42)
    search.fit(X, y)

    print(f"Best score for {metric}: {search.best_score_:.3f}")
    print()

Running the example gives an output like:

Best score for accuracy: 0.839

Best score for f1_micro: 0.839

Best score for roc_auc_ovr: 0.949

The steps are as follows:

  1. Generate a synthetic multiclass classification dataset using make_classification().
  2. Define a parameter distribution dictionary param_dist for RandomForestClassifier hyperparameters.
  3. Create a base RandomForestClassifier model rf.
  4. Iterate over different scoring metrics (‘accuracy’, ‘f1_micro’, ‘roc_auc_ovr’).
  5. For each scoring metric:
    • Run RandomizedSearchCV with 10 iterations and 5-fold cross-validation, fitting it on the full dataset.
    • Print the best score for the current scoring metric.

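The scoring parameter also accepts several metrics at once. As an extension of the example above (a sketch reusing the rf, param_dist, X, and y defined earlier), a dict of metrics can be evaluated in a single search, with refit naming the metric used to pick the final model:

# Evaluate multiple metrics in one search; refit selects the final model by 'f1_micro'
multi_search = RandomizedSearchCV(
    rf, param_dist, n_iter=10, cv=5, random_state=42,
    scoring={'acc': 'accuracy', 'f1_micro': 'f1_micro'},
    refit='f1_micro'
)
multi_search.fit(X, y)

print(multi_search.best_params_)  # chosen according to f1_micro
print(multi_search.cv_results_['mean_test_acc'][multi_search.best_index_])  # accuracy of that candidate
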

See Also