
Configure GradientBoostingRegressor "random_state" Parameter

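The random_state parameter in scikit-learn’s GradientBoostingRegressor controls the randomness used during fitting. It sets the seed given to each regression tree at every boosting iteration and the random permutation of features at each split. It also drives the row subsampling when subsample < 1.0, and the split of a validation set when n_iter_no_change is set.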
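One of these sources of randomness is easy to observe directly. When subsample is below 1.0, each tree is fit on a random subset of rows drawn from the estimator's random_state. The sketch below uses subsample=0.5, an illustrative value rather than the default of 1.0, and a small synthetic dataset. It fits the same data with two different seeds and measures how far the resulting predictions drift apart.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Small synthetic dataset (sizes are illustrative)
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)

# With subsample < 1.0, each tree sees a random subset of rows chosen
# via random_state, so different seeds build different ensembles.
preds = {}
for seed in (0, 1):
    model = GradientBoostingRegressor(subsample=0.5, random_state=seed)
    model.fit(X, y)
    preds[seed] = model.predict(X)

max_diff = np.max(np.abs(preds[0] - preds[1]))
print(f"Largest prediction difference between seeds 0 and 1: {max_diff:.3f}")
```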

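Gradient boosting is a powerful ensemble learning method that builds an additive model in a forward stage-wise fashion. Each new weak learner (in GradientBoostingRegressor, a shallow regression tree) is fit to the negative gradient of the loss function with respect to the current ensemble's predictions.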
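The stage-wise procedure can be sketched by hand for squared-error loss, where the negative gradient is simply the residual. This is a simplified illustration and not scikit-learn's internal implementation. It uses DecisionTreeRegressor with its default criterion, whereas GradientBoostingRegressor defaults to friedman_mse. The stage count and learning rate are arbitrary choices. It does show where a random generator enters: one RandomState object seeds every tree.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)

n_stages = 50          # number of boosting iterations (illustrative)
learning_rate = 0.1    # shrinkage applied to each tree's contribution
rng = np.random.RandomState(42)  # one generator supplies every tree's randomness

# Stage 0: a constant model; the mean of y minimizes squared error.
prediction = np.full(y.shape, y.mean())
train_mse = [np.mean((y - prediction) ** 2)]

for _ in range(n_stages):
    # For (half) squared error, the negative gradient is the residual.
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=3, random_state=rng)
    tree.fit(X, residuals)
    # Add a shrunken copy of the new weak learner to the ensemble.
    prediction += learning_rate * tree.predict(X)
    train_mse.append(np.mean((y - prediction) ** 2))

print(f"Training MSE: start {train_mse[0]:.1f}, after {n_stages} stages {train_mse[-1]:.1f}")
```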

Setting random_state to a fixed integer makes results reproducible: the same data and settings then produce the same model on every run.
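To check reproducibility directly, the sketch below fits two fresh estimators with the same integer seed and compares their predictions. subsample=0.8 is an illustrative setting that makes the randomness more visible. The helper name fit_and_predict is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)

def fit_and_predict(seed):
    # Hypothetical helper: a fresh estimator each time, so no state carries over.
    model = GradientBoostingRegressor(subsample=0.8, random_state=seed)
    return model.fit(X, y).predict(X)

first = fit_and_predict(42)
second = fit_and_predict(42)
print("Identical predictions:", np.array_equal(first, second))
```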

The default value for random_state is None, which means the estimator draws from the global RandomState instance used by np.random, so results can differ from one run to the next.
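Because random_state=None falls back to NumPy's global generator, reseeding that generator with np.random.seed right before fitting also pins down the result. The sketch below demonstrates this. It is shown only to illustrate the behavior: relying on hidden global state is fragile, and an explicit integer seed is clearer. The helper name fit_with_global_seed and the subsample=0.8 setting are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)

def fit_with_global_seed(global_seed):
    # random_state=None makes the estimator draw from NumPy's global RandomState,
    # so reseeding that global generator right before fit fixes the outcome.
    np.random.seed(global_seed)
    model = GradientBoostingRegressor(subsample=0.8, random_state=None)
    return model.fit(X, y).predict(X)

run_a = fit_with_global_seed(0)
run_b = fit_with_global_seed(0)
print("Identical predictions:", np.array_equal(run_a, run_b))
```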

In practice, setting random_state to an integer value, such as 0 or 42, is common to make the results reproducible.
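random_state also accepts a np.random.RandomState instance. An integer seed and a freshly created RandomState with the same seed start from the same generator state, so they build the same model. Reusing one RandomState object across fits behaves differently: the second fit continues from wherever the first left the generator. The sketch below compares both cases. The helper name predict_with and subsample=0.8 are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)

def predict_with(random_state):
    # Hypothetical helper: fit a fresh estimator with the given random_state.
    model = GradientBoostingRegressor(subsample=0.8, random_state=random_state)
    return model.fit(X, y).predict(X)

# Integer seed vs. a fresh RandomState with the same seed: same starting state.
from_int = predict_with(42)
from_fresh_instance = predict_with(np.random.RandomState(42))

# One shared RandomState: the second fit starts where the first one stopped.
shared_rng = np.random.RandomState(42)
first_fit = predict_with(shared_rng)
second_fit = predict_with(shared_rng)

print("int vs fresh RandomState identical:", np.array_equal(from_int, from_fresh_instance))
print("shared RandomState fits identical:", np.array_equal(first_fit, second_fit))
```

For reproducible scripts, a plain integer is usually the simplest choice, since each estimator then starts from its own known state.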

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different random_state values
random_state_values = [None, 0, 42]
mse_values = []

for state in random_state_values:
    gbr = GradientBoostingRegressor(random_state=state)
    gbr.fit(X_train, y_train)
    y_pred = gbr.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_values.append(mse)
    print(f"random_state={state}, Mean Squared Error: {mse:.3f}")

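Running the example gives an output like the following. The random_state=None line will change from run to run, while the lines for integer seeds stay the same: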

random_state=None, Mean Squared Error: 1234.942
random_state=0, Mean Squared Error: 1245.666
random_state=42, Mean Squared Error: 1234.753

The key steps in this example are:

  1. Generate a synthetic regression dataset with make_regression.
  2. Split the data into train and test sets.
  3. Train GradientBoostingRegressor models with different random_state values.
  4. Evaluate the mean squared error of each model on the test set.
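The example compares only three seeds, so the differences in its output reflect seed-to-seed variation rather than a better or worse model. The sketch below reuses the same data setup and repeats steps 3 and 4 over ten seeds, an arbitrary and illustrative count. It then summarizes the spread of the test MSE, which gives a sense of how much a single seed's result can be trusted.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Same data setup as the main example
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

seeds = range(10)  # illustrative number of seeds
mse_values = []
for seed in seeds:
    gbr = GradientBoostingRegressor(random_state=seed)
    gbr.fit(X_train, y_train)
    mse_values.append(mean_squared_error(y_test, gbr.predict(X_test)))

mse_values = np.array(mse_values)
print(f"MSE over {len(mse_values)} seeds: mean {mse_values.mean():.3f}, std {mse_values.std():.3f}")
print(f"min {mse_values.min():.3f}, max {mse_values.max():.3f}")
```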

Some tips and heuristics for setting random_state:

Issues to consider:



See Also