
Configure GradientBoostingClassifier "random_state" Parameter

The random_state parameter in scikit-learn’s GradientBoostingClassifier controls the randomness of the model, making results reproducible.

Gradient Boosting is an ensemble method that sequentially trains weak learners (decision trees) so that each new learner corrects the mistakes of the previous ones. The random_state parameter seeds the random number generator used in the model’s random operations: subsampling of the training data (when subsample < 1.0), feature sampling (when max_features is set), and tie-breaking between equally good splits within each tree.

Setting random_state to a fixed value ensures that the same sequence of random operations is used each time the model is trained, producing identical results across runs. The seed does not systematically improve or hurt performance; different seeds simply yield slightly different models, and therefore slightly different scores.
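As an illustration of this guarantee, the short sketch below (dataset sizes chosen arbitrarily) trains two models with the same seed and confirms their predictions match exactly:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Small synthetic dataset for illustration
X, y = make_classification(n_samples=200, n_classes=2, random_state=0)

# Two models trained with the same fixed seed perform the same
# sequence of random operations, so their predictions match exactly
model_a = GradientBoostingClassifier(random_state=42).fit(X, y)
model_b = GradientBoostingClassifier(random_state=42).fit(X, y)
print(np.array_equal(model_a.predict(X), model_b.predict(X)))  # True
```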

The default value for random_state is None, meaning the randomness is not controlled and results may vary slightly across runs.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different random_state values
random_state_values = [None, 42, 123]

for rs in random_state_values:
    gb = GradientBoostingClassifier(random_state=rs)
    gb.fit(X_train, y_train)
    y_pred = gb.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"random_state={rs}, Accuracy: {accuracy:.3f}")
    print(f"First few predictions: {y_pred[:5]}\n")

Running the example gives an output like:

random_state=None, Accuracy: 0.910
First few predictions: [1 1 0 1 1]

random_state=42, Accuracy: 0.915
First few predictions: [1 1 0 1 1]

random_state=123, Accuracy: 0.910
First few predictions: [1 1 0 1 1]

The key steps in this example are:

  1. Generate a synthetic binary classification dataset
  2. Split the data into train and test sets
  3. Train GradientBoostingClassifier models with different random_state values
  4. Evaluate the accuracy of each model on the test set and compare the first few predictions
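To quantify how much the seed alone moves the score, one option is to sweep a handful of seeds and look at the spread of test accuracies. A minimal sketch (the seed range and n_estimators value are chosen arbitrarily):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train one model per seed and collect the test accuracies
scores = []
for seed in range(5):
    gb = GradientBoostingClassifier(n_estimators=50, random_state=seed)
    gb.fit(X_train, y_train)
    scores.append(gb.score(X_test, y_test))

print(f"accuracy mean={np.mean(scores):.3f}, std={np.std(scores):.3f}")
```

A small standard deviation indicates the model is not very sensitive to the seed on this dataset.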

Some tips and heuristics for setting random_state:

  1. Set random_state to a fixed integer (any value, e.g., 42) whenever you need reproducible results, such as in tutorials, experiments, or debugging sessions
  2. Keep random_state fixed when comparing hyperparameter settings, so that score differences reflect the hyperparameters rather than the seed
  3. Leave random_state as None (the default) in settings where run-to-run variation is acceptable

Issues to consider:

  1. Reproducibility holds only for the same data, the same parameters, and the same scikit-learn version; upgrading the library can change results even with a fixed seed
  2. Do not tune random_state as if it were a hyperparameter; variation across seeds is noise, not genuine improvement
  3. To estimate how sensitive your results are to the seed, train with several different seeds and inspect the spread of scores

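Reproducibility in cross-validation requires fixing the seed in both the estimator and the splitter. The sketch below (the run helper and parameter values are illustrative) verifies that two identical CV runs return the same scores:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score, KFold

X, y = make_classification(n_samples=300, n_classes=2, random_state=0)

def run():
    # Fix the seed in both the CV splitter and the estimator
    cv = KFold(n_splits=5, shuffle=True, random_state=7)
    model = GradientBoostingClassifier(n_estimators=50, random_state=7)
    return cross_val_score(model, X, y, cv=cv)

print(np.array_equal(run(), run()))  # True
```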
