
Scikit-Learn Configure GridSearchCV "return_train_score" Parameter

The return_train_score parameter in scikit-learn's GridSearchCV controls whether training scores are computed and included in the cv_results_ attribute. By default it is False, so only validation scores (the test_score entries) are reported for each hyperparameter combination.

Setting return_train_score to True computes training scores in addition to validation scores. This adds some computational overhead, but it can reveal whether a model is overfitting (high training scores but low validation scores) or underfitting (low scores on both sets).

from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# generate a synthetic regression dataset
X, y = make_regression(n_samples=100, n_features=10, noise=0.1, random_state=42)

# create a pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('ridge', Ridge())
])

# define the parameter grid
param_grid = {'ridge__alpha': [0.1, 1.0, 10.0]}

# run GridSearchCV with default 'return_train_score' (False)
grid_search_default = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5)
grid_search_default.fit(X, y)

# run GridSearchCV with 'return_train_score' set to True
grid_search_train = GridSearchCV(estimator=pipeline, param_grid=param_grid, cv=5, return_train_score=True)
grid_search_train.fit(X, y)

# print the cv_results_ without train scores
print("CV results without train scores:")
print(grid_search_default.cv_results_)

# print the cv_results_ with train scores
print("CV results with train scores:")
print(grid_search_train.cv_results_)

Running the example gives an output like:

CV results without train scores:
{'mean_fit_time': array([0.00139108, 0.00100942, 0.00122395]), 'std_fit_time': array([2.89526146e-04, 5.24390811e-06, 2.30003954e-04]), 'mean_score_time': array([0.00055017, 0.00052509, 0.00051861]), 'std_score_time': array([2.32443320e-05, 4.34611893e-05, 2.53865524e-05]), 'param_ridge__alpha': masked_array(data=[0.1, 1.0, 10.0],
             mask=[False, False, False],
       fill_value=1e+20), 'params': [{'ridge__alpha': 0.1}, {'ridge__alpha': 1.0}, {'ridge__alpha': 10.0}], 'split0_test_score': array([0.99999748, 0.99976147, 0.9818852 ]), 'split1_test_score': array([0.99999799, 0.99983417, 0.98733181]), 'split2_test_score': array([0.99999749, 0.99986857, 0.99000378]), 'split3_test_score': array([0.99999874, 0.99986289, 0.98883627]), 'split4_test_score': array([0.99999648, 0.99974793, 0.98175641]), 'mean_test_score': array([0.99999764, 0.99981501, 0.9859627 ]), 'std_test_score': array([7.38456956e-07, 5.07835855e-05, 3.48657640e-03]), 'rank_test_score': array([1, 2, 3], dtype=int32)}
CV results with train scores:
{'mean_fit_time': array([0.00124483, 0.00109639, 0.00112467]), 'std_fit_time': array([4.01812467e-04, 7.52905488e-05, 2.40980140e-04]), 'mean_score_time': array([0.00057836, 0.0006794 , 0.00050402]), 'std_score_time': array([1.30767442e-04, 2.25248183e-04, 1.24772896e-05]), 'param_ridge__alpha': masked_array(data=[0.1, 1.0, 10.0],
             mask=[False, False, False],
       fill_value=1e+20), 'params': [{'ridge__alpha': 0.1}, {'ridge__alpha': 1.0}, {'ridge__alpha': 10.0}], 'split0_test_score': array([0.99999748, 0.99976147, 0.9818852 ]), 'split1_test_score': array([0.99999799, 0.99983417, 0.98733181]), 'split2_test_score': array([0.99999749, 0.99986857, 0.99000378]), 'split3_test_score': array([0.99999874, 0.99986289, 0.98883627]), 'split4_test_score': array([0.99999648, 0.99974793, 0.98175641]), 'mean_test_score': array([0.99999764, 0.99981501, 0.9859627 ]), 'std_test_score': array([7.38456956e-07, 5.07835855e-05, 3.48657640e-03]), 'rank_test_score': array([1, 2, 3], dtype=int32), 'split0_train_score': array([0.9999978 , 0.99981222, 0.98578377]), 'split1_train_score': array([0.99999828, 0.99985773, 0.98888046]), 'split2_train_score': array([0.99999855, 0.99987773, 0.9902186 ]), 'split3_train_score': array([0.99999836, 0.99986652, 0.9893865 ]), 'split4_train_score': array([0.99999859, 0.99988565, 0.99102202]), 'mean_train_score': array([0.99999832, 0.99985997, 0.98905827]), 'std_train_score': array([2.84199683e-07, 2.57041481e-05, 1.79245133e-03])}

The key steps in this example are:

  1. Generate a synthetic regression dataset using make_regression
  2. Create a Pipeline with StandardScaler and Ridge steps
  3. Define a parameter grid for the alpha parameter of Ridge
  4. Run GridSearchCV with default return_train_score (False) and print the cv_results_ attribute
  5. Run GridSearchCV with return_train_score set to True and print the cv_results_ attribute
  6. Compare the two cv_results_ outputs, noting the additional training score columns (mean_train_score, std_train_score, split0_train_score, etc.) when return_train_score is True

This example demonstrates how to use the return_train_score parameter in GridSearchCV to include training scores in the results, enabling evaluation of overfitting or underfitting in the model.
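To make that overfitting check concrete, the cv_results_ dictionary can be loaded into a pandas DataFrame and the train/validation gap inspected directly. This is a minimal sketch building on the pipeline above; it assumes pandas is installed, and the gap column is an illustrative name, not part of scikit-learn's output:

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# same setup as the example above
X, y = make_regression(n_samples=100, n_features=10, noise=0.1, random_state=42)
pipeline = Pipeline([('scaler', StandardScaler()), ('ridge', Ridge())])
param_grid = {'ridge__alpha': [0.1, 1.0, 10.0]}

grid_search = GridSearchCV(pipeline, param_grid, cv=5, return_train_score=True)
grid_search.fit(X, y)

# load cv_results_ into a DataFrame; each row is one hyperparameter combination
results = pd.DataFrame(grid_search.cv_results_)

# a large positive gap between train and validation scores suggests overfitting
results['gap'] = results['mean_train_score'] - results['mean_test_score']
print(results[['param_ridge__alpha', 'mean_train_score', 'mean_test_score', 'gap']])
```

With return_train_score left at its default of False, the mean_train_score column would be absent and the gap computation would raise a KeyError, which is why the parameter must be enabled for this kind of diagnostic.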



See Also