The error_score parameter in RandomizedSearchCV determines the score assigned to a parameter combination when fitting the estimator raises an error on a cross-validation split. Random search is a hyperparameter optimization method that evaluates randomly sampled parameter combinations to find the best-performing model.
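To make "random combinations of parameters" concrete, the following minimal sketch draws a few candidates with scikit-learn's ParameterSampler. The two-entry param_dist and the choice of n_iter=5 are illustrative assumptions, not settings taken from the example later in this article.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

# Illustrative search space (an assumption for this sketch)
param_dist = {
    "n_estimators": randint(10, 100),  # integers drawn from [10, 100)
    "max_depth": [None, 5, 10],        # lists are sampled uniformly
}

# Draw 5 random parameter combinations, reproducibly
samples = list(ParameterSampler(param_dist, n_iter=5, random_state=42))

for params in samples:
    print(params)
```

Each printed dictionary is one candidate that a randomized search would fit and score on every cross-validation split.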
The default value for error_score is np.nan, which records NaN as the score of the failed fit in the results and issues a FitFailedWarning while the search continues. Setting error_score to "raise" re-raises the exception as soon as a fit fails, stopping the search. Alternatively, a numeric value such as 0 assigns that score to every failed fit.
As a heuristic, use np.nan for most cases, "raise" for debugging, and a numeric value when a specific score is desired for errors.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.utils import shuffle
import numpy as np
from scipy.stats import randint

# Generate a synthetic classification dataset with missing values
X, y = make_classification(n_samples=100, n_features=10, random_state=42)
X[X < 0.1] = np.nan  # Introduce missing values

# Shuffle the dataset
X, y = shuffle(X, y, random_state=42)

# Define a parameter distribution for RandomForestClassifier
param_dist = {'n_estimators': randint(10, 100),
              'max_depth': [None, 5, 10],
              'min_samples_split': randint(2, 10),
              'max_features': randint(1, 10)}

# List of error_score values to test
error_scores = [np.nan, 'raise', 0]

for error_score in error_scores:
    print(f"Error score: {error_score}")
    try:
        # Create a RandomizedSearchCV object
        search = RandomizedSearchCV(estimator=RandomForestClassifier(random_state=42),
                                    param_distributions=param_dist,
                                    n_iter=10,
                                    cv=5,
                                    error_score=error_score,
                                    random_state=42)
        # Run the randomized search
        search.fit(X, y)
        print(f"Best score: {search.best_score_:.3f}")
    except ValueError as ve:
        print(f"Caught ValueError: {ve}")
    print()
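Running the example gives an output like the one below. In this run no fit failed (the "raise" setting finished without an exception), so all three settings report the same best score. This is consistent with a scikit-learn version whose RandomForestClassifier accepts NaN inputs (1.4 or later):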
Error score: nan
Best score: 0.960
Error score: raise
Best score: 0.960
Error score: 0
Best score: 0.960
The steps in this example are:
- Generate a synthetic classification dataset with missing values using make_classification() and introduce NaNs.
- Shuffle the dataset to distribute the missing values randomly.
- Define a parameter distribution dictionary param_dist for RandomForestClassifier hyperparameters.
- Iterate over different error_score values (np.nan, "raise", and 0).
- For each error_score value:
  - Create a RandomizedSearchCV object with the specified error_score.
  - Run the randomized search with search.fit(X, y) inside a try-except block.
  - If no error occurs, print the best score.
  - If a ValueError is raised (when error_score="raise"), catch and print the error message.