
Configure HistGradientBoostingRegressor "scoring" Parameter

The scoring parameter in scikit-learn's HistGradientBoostingRegressor determines the metric used to evaluate the model on the internal validation set when early stopping is enabled.

HistGradientBoostingRegressor is a gradient boosting algorithm that uses histogram-based techniques for efficient training on large datasets. It builds an ensemble of weak learners (decision trees) sequentially, with each tree correcting the errors of the previous ones.

The scoring parameter is only consulted when early stopping is active (early_stopping=True, or 'auto' with more than 10,000 training samples). It does not change how individual trees are fit, but it determines when boosting stops, and therefore how many iterations the final model contains.

By default, scoring is set to 'loss', which monitors the training objective itself. Setting it to None uses the estimator's score method (R-squared for regressors). Common alternatives include 'neg_mean_squared_error', 'neg_mean_absolute_error', and 'r2'.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different scoring metrics
scoring_metrics = [None, 'neg_mean_squared_error', 'neg_mean_absolute_error', 'r2']
results = {}

for scoring in scoring_metrics:
    model = HistGradientBoostingRegressor(random_state=42, scoring=scoring)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    results[str(scoring)] = {
        'mse': mean_squared_error(y_test, y_pred),
        'mae': mean_absolute_error(y_test, y_pred),
        'r2': r2_score(y_test, y_pred)
    }

    print(f"Scoring: {scoring}")
    print(f"MSE: {results[str(scoring)]['mse']:.4f}")
    print(f"MAE: {results[str(scoring)]['mae']:.4f}")
    print(f"R2: {results[str(scoring)]['r2']:.4f}\n")

Running the example gives an output like:

Scoring: None
MSE: 1023.0742
MAE: 25.3547
R2: 0.9394

Scoring: neg_mean_squared_error
MSE: 1023.0742
MAE: 25.3547
R2: 0.9394

Scoring: neg_mean_absolute_error
MSE: 1023.0742
MAE: 25.3547
R2: 0.9394

Scoring: r2
MSE: 1023.0742
MAE: 25.3547
R2: 0.9394

The key steps in this example are:

  1. Generate a synthetic regression dataset
  2. Split the data into train and test sets
  3. Train HistGradientBoostingRegressor models with different scoring metrics
  4. Evaluate each model’s performance using multiple metrics
  5. Compare results, noting that they are identical here because scoring only matters when early stopping is active

Some tips and heuristics for choosing the scoring parameter:

  - Keep the default 'loss' if you only need early stopping to track the training objective; it avoids the extra cost of computing a scorer on the validation set at each iteration.
  - Pick a scorer that matches the metric you will report, e.g. 'neg_mean_absolute_error' if MAE is your target metric.
  - Remember that error-based scorers are negated (the 'neg_' prefix) so that higher is always better.

Issues to consider:

  - scoring has no effect unless early stopping is active; with early_stopping='auto', early stopping only activates above 10,000 samples.
  - The validation set used for early stopping is carved out of the training data (validation_fraction), slightly reducing the data available for fitting.

See Also