
Scikit-Learn Configure RandomizedSearchCV "error_score" Parameter

The error_score parameter in RandomizedSearchCV sets the score assigned to a parameter combination when fitting the estimator fails during cross-validation. Random search is a hyperparameter optimization method that evaluates randomly sampled parameter combinations to find the best-performing model.

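The default value for error_score is np.nan, which records NaN as the score of a failed fit and lets the search continue. Setting error_score to "raise" re-raises the exception as soon as a fit fails, stopping the search. Alternatively, a numeric value can be set to assign that specific score to each failed fit. With np.nan or a numeric value, a FitFailedWarning is issued for the failed fits. The parameter does not affect the final refit step, which always raises the error.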
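The three behaviours are easiest to see when a fit is guaranteed to fail. The following is a minimal sketch, not part of the original example: it assumes a DecisionTreeClassifier and a deliberately invalid max_depth=-1 candidate, chosen only to force a failure, and the helper run_search is a hypothetical name introduced for illustration.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, n_features=10, random_state=42)

# Assumption for illustration: max_depth=-1 is invalid, so every fit of that
# candidate fails, while the other two candidates fit normally.
param_dist = {"max_depth": [-1, 2, 4]}


def run_search(error_score):
    """Run a small search and return {max_depth: mean CV score}."""
    search = RandomizedSearchCV(
        DecisionTreeClassifier(random_state=42),
        param_distributions=param_dist,
        n_iter=3,  # equals the grid size, so all three candidates are tried
        cv=3,
        error_score=error_score,
        refit=False,  # only the cross-validation table is needed here
        random_state=42,
    )
    with warnings.catch_warnings():
        # Failed fits emit warnings (e.g. FitFailedWarning); silence them
        warnings.simplefilter("ignore")
        search.fit(X, y)
    depths = [p["max_depth"] for p in search.cv_results_["params"]]
    return dict(zip(depths, search.cv_results_["mean_test_score"]))


nan_scores = run_search(np.nan)  # failed candidate is scored as NaN
zero_scores = run_search(0)      # failed candidate is scored as 0

try:
    run_search("raise")          # the first failed fit aborts the search
    raised = False
except ValueError as exc:
    raised = True
    print(f"error_score='raise' stopped the search: {type(exc).__name__}")

print("error_score=np.nan:", nan_scores)
print("error_score=0:", zero_scores)
```

In this sketch the valid candidates keep their real scores in every run; only the entry for the failing candidate changes with error_score.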

As a heuristic, use np.nan for most cases, "raise" for debugging, and a numeric value when a specific score is desired for errors.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.utils import shuffle
import numpy as np
from scipy.stats import randint

# Generate a synthetic classification dataset with missing values
X, y = make_classification(n_samples=100, n_features=10, random_state=42)
X[X < 0.1] = np.nan  # Introduce missing values

# Shuffle the dataset
X, y = shuffle(X, y, random_state=42)

# Define a parameter distribution for RandomForestClassifier
param_dist = {'n_estimators': randint(10, 100),
              'max_depth': [None, 5, 10],
              'min_samples_split': randint(2, 10),
              'max_features': randint(1, 10)}

# List of error_score values to test
error_scores = [np.nan, 'raise', 0]

for error_score in error_scores:
    print(f"Error score: {error_score}")

    try:
        # Create a RandomizedSearchCV object
        search = RandomizedSearchCV(estimator=RandomForestClassifier(random_state=42),
                                    param_distributions=param_dist,
                                    n_iter=10,
                                    cv=5,
                                    error_score=error_score,
                                    random_state=42)

        # Run the randomized search
        search.fit(X, y)

        print(f"Best score: {search.best_score_:.3f}")
    except ValueError as ve:
        print(f"Caught ValueError: {ve}")

    print()

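Running the example gives an output like the following. In this run every fit succeeded, so error_score was never applied and all three settings report the same best score. Recent scikit-learn releases (1.4 and later) let RandomForestClassifier handle NaN inputs natively; on older releases the NaNs would cause the fits to fail instead.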

Error score: nan
Best score: 0.960

Error score: raise
Best score: 0.960

Error score: 0
Best score: 0.960

The steps in this example are:

  1. Generate a synthetic classification dataset with missing values using make_classification() and introduce NaNs.
  2. Shuffle the rows of the dataset, keeping X and y aligned.
  3. Define a parameter distribution dictionary param_dist for RandomForestClassifier hyperparameters.
  4. Iterate over different error_score values (np.nan, "raise", and 0).
  5. For each error_score value:
    • Create a RandomizedSearchCV object with the specified error_score.
    • Run the randomized search with search.fit(X, y) inside a try-except block.
    • If no error occurs, print the best score.
    • If a ValueError is raised (for example, by a failed fit when error_score="raise"), catch and print the error message.
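
To check whether any fit actually failed in a search like the one above, one option is to record warnings during fit and look for NaN entries in cv_results_. The sketch below is an illustration under assumptions: it uses a DecisionTreeClassifier with an intentionally invalid max_depth=-1 candidate to force a failure, which is not part of the original example.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import FitFailedWarning
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, n_features=10, random_state=0)

search = RandomizedSearchCV(
    DecisionTreeClassifier(random_state=0),
    # Assumption for illustration: max_depth=-1 is invalid and always fails
    param_distributions={"max_depth": [-1, 3]},
    n_iter=2,  # equals the grid size, so both candidates are tried
    cv=3,
    error_score=np.nan,
    refit=False,  # skip the final refit; only cv_results_ is inspected
    random_state=0,
)

# Record warnings instead of printing them, so they can be inspected
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    search.fit(X, y)

# Warnings issued because at least one fit failed
fit_failures = [w for w in caught if issubclass(w.category, FitFailedWarning)]

# Candidates whose mean cross-validation score is NaN
mean_scores = search.cv_results_["mean_test_score"]
failed_params = [
    params
    for params, score in zip(search.cv_results_["params"], mean_scores)
    if np.isnan(score)
]

print(f"FitFailedWarning issued: {len(fit_failures) > 0}")
print(f"Candidates with NaN score: {failed_params}")
```

If no FitFailedWarning is recorded and no candidate has a NaN score, every fit succeeded and the choice of error_score had no effect on that run.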

