
Configure HistGradientBoostingClassifier "random_state" Parameter

The random_state parameter in scikit-learn’s HistGradientBoostingClassifier controls the sources of randomness in training, such as the subsampling used when building feature histograms on very large datasets and the internal train/validation split used for early stopping.

Histogram-based gradient boosting is an efficient implementation of gradient boosting that uses binning to speed up the training process. It builds an ensemble of decision trees sequentially, with each tree correcting the errors of the previous ones.

The random_state parameter ensures reproducibility of the model’s results. When set to a specific integer, it guarantees that the same sequence of random numbers is generated, leading to consistent model behavior across different runs.

By default, random_state is None, in which case the global RandomState instance from numpy.random is used and results can vary between runs. Common practice is to set it to a fixed integer for reproducibility.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train models with different random_state values
random_states = [None, 42, 100, 200]
for rs in random_states:
    model = HistGradientBoostingClassifier(random_state=rs, max_iter=100)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"random_state={rs}, Accuracy: {accuracy:.4f}")

# Train multiple models with the same random_state
print("\nTraining multiple models with random_state=42:")
for _ in range(3):
    model = HistGradientBoostingClassifier(random_state=42, max_iter=100)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy: {accuracy:.4f}")

Running this example produces output similar to:

random_state=None, Accuracy: 0.9150
random_state=42, Accuracy: 0.9150
random_state=100, Accuracy: 0.9150
random_state=200, Accuracy: 0.9150

Training multiple models with random_state=42:
Accuracy: 0.9150
Accuracy: 0.9150
Accuracy: 0.9150

The key steps in this example are:

  1. Generate a synthetic binary classification dataset
  2. Split the data into train and test sets
  3. Train HistGradientBoostingClassifier models with different random_state values
  4. Evaluate the accuracy of each model on the test set
  5. Demonstrate reproducibility by training multiple models with the same random_state

Tips for using random_state:

  - Set random_state to a fixed integer (e.g. 42) whenever you need reproducible results, such as in tutorials, tests, or experiments you intend to compare.
  - Use the same value consistently across data generation, train/test splitting, and model training so the whole pipeline is reproducible.
  - Leave random_state as None when you want to gauge how sensitive your results are to random variation.

Considerations:

  - A fixed random_state guarantees reproducibility only for the same scikit-learn version, platform, and data; results may differ across versions.
  - Fixing the seed does not improve model quality; a model that performs well only for one particular seed is likely unstable.
  - In HistGradientBoostingClassifier, random_state mainly matters when early stopping is enabled or the dataset is large enough to trigger subsampling during binning; otherwise training is effectively deterministic.

