
Configure HistGradientBoostingRegressor "random_state" Parameter

The random_state parameter in scikit-learn’s HistGradientBoostingRegressor controls the randomness of the model, ensuring reproducibility of results.

HistGradientBoostingRegressor is a gradient boosting algorithm that uses histogram-based decision trees. It’s designed for efficiency and performance, particularly on large datasets.

The random_state parameter seeds the random number generator used by the model. In HistGradientBoostingRegressor this randomness appears in a few specific places: the train/validation split drawn when early stopping is enabled, the subsample used to compute histogram bin thresholds on very large datasets, and feature subsampling when max_features is below 1.0 (available in recent scikit-learn versions).

By default, random_state is set to None, which means the model will use a different random seed each time it’s run. Setting it to an integer value ensures consistent results across multiple runs.

Common choices for random_state are arbitrary integers such as 42, 0, or 123. The specific value doesn’t matter; what matters is using the same value across runs so that results are reproducible.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different random_state values
random_state_values = [None, 0, 42, 123]
mse_scores = []

for rs in random_state_values:
    model = HistGradientBoostingRegressor(random_state=rs)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"random_state={rs}, MSE: {mse:.4f}")

# Train multiple times with random_state=None
mse_scores_none = []
for _ in range(3):
    model = HistGradientBoostingRegressor(random_state=None)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores_none.append(mse)
    print(f"random_state=None (repeated), MSE: {mse:.4f}")

Running the example gives an output like:

random_state=None, MSE: 3073.5886
random_state=0, MSE: 3073.5886
random_state=42, MSE: 3073.5886
random_state=123, MSE: 3073.5886
random_state=None (repeated), MSE: 3073.5886
random_state=None (repeated), MSE: 3073.5886
random_state=None (repeated), MSE: 3073.5886

The key steps in this example are:

  1. Generate a synthetic regression dataset
  2. Split the data into train and test sets
  3. Train HistGradientBoostingRegressor models with different random_state values
  4. Evaluate the mean squared error (MSE) of each model on the test set
  5. Show that with the default settings, results are identical even when random_state=None, because no randomness is actually used in training here

Some tips and heuristics for setting random_state:

  - Set random_state to a fixed integer (e.g., 42) whenever you need reproducible results, such as in tutorials, tests, or model comparisons.
  - The specific integer is arbitrary; using the same value across runs is what matters.
  - Leave random_state=None when you want to gauge how sensitive results are to randomness, for example by averaging scores over repeated runs.

Issues to consider:

  - With the default settings on small datasets, HistGradientBoostingRegressor uses no random subsampling, so random_state may have no visible effect at all.
  - random_state only controls randomness that actually occurs, such as the validation split when early_stopping is enabled.
  - Fixing random_state makes a single run reproducible, but it does not make a result representative; evaluate across multiple seeds or cross-validation folds before drawing conclusions.

See Also