
Scikit-Learn Get RandomizedSearchCV "best_index_" Attribute

RandomizedSearchCV tunes hyperparameters by sampling a fixed number of candidate configurations (set by n_iter) from the parameter distributions you specify and scoring each one with cross-validation.

After fitting a random search, you can identify the best-performing hyperparameter configuration by its position in the results using the best_index_ attribute.

The best_index_ attribute is an integer index into the arrays stored in cv_results_. It points to the candidate with the highest mean cross-validated score, that is, the candidate whose rank_test_score is 1.

Accessing best_index_ is useful when you want to look up the best hyperparameter values, or that candidate's other entries in cv_results_ such as mean_test_score and std_test_score, without manually searching through the cv_results_ dictionary.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint

# Generate a random classification dataset
X, y = make_classification(n_samples=100, n_classes=2, n_informative=5, n_redundant=5, random_state=42)

# Set up a RandomForestClassifier
rf = RandomForestClassifier(random_state=42)

# Define hyperparameter distributions to sample from
param_dist = {
    'n_estimators': randint(5, 50),
    'max_depth': [3, 5, 10, None],
    'min_samples_split': randint(2, 10),
}

# Run random search with 5-fold cross-validation
random_search = RandomizedSearchCV(rf, param_distributions=param_dist, n_iter=10, cv=5, random_state=42)
random_search.fit(X, y)

# Access the best_index_ attribute
best_index = random_search.best_index_

# Retrieve the best hyperparameters
best_params = random_search.cv_results_['params'][best_index]

# Print the best hyperparameters
print("Best hyperparameters:")
print(best_params)

Running the example gives an output like:

Best hyperparameters:
{'max_depth': None, 'min_samples_split': 4, 'n_estimators': 26}
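The same index can be used to read that candidate's scores, not only its parameters. The sketch below repeats the setup from the example and prints a few cv_results_ entries at best_index_. The variable names search, results, and i are illustrative choices, and the printed values will depend on your scikit-learn version.

```python
import numpy as np
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Same synthetic dataset and search space as the main example
X, y = make_classification(n_samples=100, n_classes=2, n_informative=5, n_redundant=5, random_state=42)

param_dist = {
    "n_estimators": randint(5, 50),
    "max_depth": [3, 5, 10, None],
    "min_samples_split": randint(2, 10),
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=10,
    cv=5,
    random_state=42,
)
search.fit(X, y)

i = search.best_index_
results = search.cv_results_

# Every per-candidate array in cv_results_ can be indexed with best_index_
print("Mean CV score:", results["mean_test_score"][i])
print("Std of CV score:", results["std_test_score"][i])
print("Rank:", results["rank_test_score"][i])

# The index agrees with the convenience attributes on the fitted search
print(results["params"][i] == search.best_params_)
print(np.isclose(results["mean_test_score"][i], search.best_score_))
```

Both comparisons at the end should print True, since best_params_ and best_score_ describe the same candidate that best_index_ points to.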

The example follows these steps:

  1. Generate a synthetic classification dataset using make_classification.
  2. Configure a RandomForestClassifier and define the hyperparameter distributions to sample from.
  3. Run RandomizedSearchCV with the classifier, hyperparameter distributions, 10 iterations, and 5-fold cross-validation.
  4. After fitting, access the best_index_ attribute from the random_search object.
  5. Use best_index_ to retrieve the best hyperparameters from the cv_results_ dictionary.
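Step 5 can also be read in terms of ranks. The sketch below uses a reduced version of the same setup (fewer iterations and folds, chosen only to keep it quick) and checks that best_index_ matches the first position where rank_test_score is lowest. The names search, ranks, and first_rank_one are illustrative.

```python
import numpy as np
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=100, n_classes=2, n_informative=5, n_redundant=5, random_state=42)

param_dist = {
    "n_estimators": randint(5, 50),
    "max_depth": [3, 5, 10, None],
    "min_samples_split": randint(2, 10),
}

# Smaller search than the main example (assumption: 5 candidates, 3 folds)
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=5,
    cv=3,
    random_state=42,
)
search.fit(X, y)

# rank_test_score holds 1 for the best candidate; tied candidates share a rank
ranks = search.cv_results_["rank_test_score"]
first_rank_one = int(np.argmin(ranks))

print("Ranks per candidate:", ranks)
print("best_index_:", search.best_index_)
print("First position with the lowest rank:", first_rank_one)
```

If several candidates tie for the top score, this suggests the earliest one in cv_results_ is the one reported.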

With best_index_, you can go straight to the best-performing configuration in cv_results_ and reuse its hyperparameters for further model development and deployment.
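As one way to reuse the retrieved hyperparameters, the sketch below runs the search with refit=False, so no final model is fitted automatically, and then trains a fresh classifier from the parameters found at best_index_. The train/test split, the reduced search size, and the names final_model and test_accuracy are assumptions made for illustration, not part of the original example.

```python
from scipy.stats import randint
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = make_classification(n_samples=100, n_classes=2, n_informative=5, n_redundant=5, random_state=42)

# Hold out a test set so the final model is evaluated on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

rf = RandomForestClassifier(random_state=42)
param_dist = {
    "n_estimators": randint(5, 50),
    "max_depth": [3, 5, 10, None],
    "min_samples_split": randint(2, 10),
}

# refit=False skips the automatic refit; with a single metric,
# best_index_ is still set after fitting
search = RandomizedSearchCV(
    rf, param_distributions=param_dist, n_iter=5, cv=3, refit=False, random_state=42
)
search.fit(X_train, y_train)

# Look up the winning configuration by index
best_params = search.cv_results_["params"][search.best_index_]

# Build an unfitted copy of the estimator, apply the parameters, and train it
final_model = clone(rf).set_params(**best_params)
final_model.fit(X_train, y_train)

test_accuracy = final_model.score(X_test, y_test)
print("Best hyperparameters:", best_params)
print("Held-out accuracy:", test_accuracy)
```

With the default refit=True, the fitted model is already available as best_estimator_, so this manual step is mainly useful when you want control over how and on what data the final model is trained.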


