Configure BaggingRegressor "random_state" Parameter

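The random_state parameter in scikit-learn’s BaggingRegressor controls the random resampling of the training data. It determines which rows each base estimator is trained on and, if feature subsampling is enabled, which features that estimator sees. If the base estimator has its own random_state parameter, as the default DecisionTreeRegressor does, BaggingRegressor also uses random_state to generate a separate seed for each estimator in the ensemble.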
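You can see what random_state controls by inspecting a fitted ensemble. The sketch below uses the estimators_samples_ and estimators_features_ attributes of a fitted BaggingRegressor, which hold the row and column indices drawn for each base estimator. It turns on bootstrap_features so that the columns are resampled too. The dataset sizes and the seed 0 are arbitrary choices for illustration. Two models fitted with the same seed should draw the same indices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=200, n_features=8, noise=0.1, random_state=0)

# bootstrap_features=True also resamples columns, so random_state affects rows and features
model_a = BaggingRegressor(bootstrap_features=True, random_state=0).fit(X, y)
model_b = BaggingRegressor(bootstrap_features=True, random_state=0).fit(X, y)

# Compare the row indices drawn for each base estimator
same_samples = all(
    np.array_equal(a, b)
    for a, b in zip(model_a.estimators_samples_, model_b.estimators_samples_)
)
# Compare the feature indices drawn for each base estimator
same_features = all(
    np.array_equal(a, b)
    for a, b in zip(model_a.estimators_features_, model_b.estimators_features_)
)
print(f"Same bootstrap rows: {same_samples}, same feature subsets: {same_features}")
print("First rows drawn for estimator 0:", model_a.estimators_samples_[0][:10])
```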

Bagging (Bootstrap Aggregating) is an ensemble method that fits multiple instances of a base estimator, each on a different random subset of the training data, and combines their predictions. For regression, it combines them by averaging. The subsets are drawn at random, so fixing random_state makes this sampling reproducible, and therefore the fitted ensemble too.
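The following sketch shows where the randomness enters bagging. It is a hand-written re-implementation for illustration only, not scikit-learn's internal code. It draws bootstrap rows with replacement, fits one tree per sample, and averages the results. The helper manual_bagging_predict is hypothetical. The sketch also uses NumPy's default_rng, whereas scikit-learn uses the legacy RandomState internally.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor


def manual_bagging_predict(X_train, y_train, X_test, n_estimators=10, seed=None):
    """Hypothetical helper: bag DecisionTreeRegressors by hand and average them."""
    rng = np.random.default_rng(seed)  # every random draw below comes from this generator
    n_samples = X_train.shape[0]
    predictions = []
    for _ in range(n_estimators):
        # Bootstrap: draw n_samples row indices with replacement
        idx = rng.integers(0, n_samples, size=n_samples)
        # Give each tree its own seed, derived from the same generator
        tree = DecisionTreeRegressor(random_state=int(rng.integers(0, 2**31 - 1)))
        tree.fit(X_train[idx], y_train[idx])
        predictions.append(tree.predict(X_test))
    # Aggregate: average the per-tree predictions (the regression case)
    return np.mean(predictions, axis=0)


X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
X_train, X_test, y_train = X[:150], X[150:], y[:150]

p1 = manual_bagging_predict(X_train, y_train, X_test, seed=42)
p2 = manual_bagging_predict(X_train, y_train, X_test, seed=42)
p3 = manual_bagging_predict(X_train, y_train, X_test, seed=7)
print("Same seed, same predictions:", np.allclose(p1, p2))
print("Different seed, same predictions:", np.allclose(p1, p3))
```

The seed feeds a single generator that drives both the bootstrap draws and the per-tree seeds. Fixing that one value therefore fixes the whole ensemble.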

Setting random_state to a specific integer value ensures that the random number generation is consistent across different runs, which is crucial for reproducibility in machine learning experiments.

The default value of random_state is None. In that case, BaggingRegressor draws from the global RandomState instance behind np.random, so results generally change from one run to the next.
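Because None falls back to NumPy's global RandomState, reseeding that global generator with np.random.seed before each fit should reproduce a random_state=None model. This holds in the scikit-learn versions we are aware of. The sketch only illustrates what the default does. Passing an integer to random_state is the clearer way to get reproducible results. The helper fit_predict_with_global_seed is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=0.1, random_state=42)


def fit_predict_with_global_seed(seed):
    """Hypothetical helper: reseed the global RNG, then fit with random_state=None."""
    np.random.seed(seed)  # the global RandomState that random_state=None draws from
    model = BaggingRegressor(random_state=None)
    model.fit(X, y)
    return model.predict(X[:5])


pred_a = fit_predict_with_global_seed(0)
pred_b = fit_predict_with_global_seed(0)
print("Reproducible after reseeding np.random:", np.allclose(pred_a, pred_b))
```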

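In practice, random_state is often set to a fixed integer (e.g., 42) for reproducibility. It is left as None when run-to-run variation is acceptable or desired. The draws are still pseudorandom in that case; they are simply not seeded by you.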
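A quick way to see the difference between the two choices is to fit the same model twice with each setting and compare the predictions. This is a minimal sketch. The seed 7, the dataset, and the helper name predictions are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=0.1, random_state=42)


def predictions(random_state):
    """Hypothetical helper: fit a fresh BaggingRegressor and predict on 20 rows."""
    return BaggingRegressor(random_state=random_state).fit(X, y).predict(X[:20])


same_int = np.allclose(predictions(7), predictions(7))        # fixed seed
same_none = np.allclose(predictions(None), predictions(None))  # unseeded
print(f"random_state=7 reproducible: {same_int}")
print(f"random_state=None reproducible: {same_none}")
```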

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different random_state values (the None result changes on every run)
random_states = [None, 0, 42, 100]
mse_scores = []

for rs in random_states:
    br = BaggingRegressor(random_state=rs)
    br.fit(X_train, y_train)
    y_pred = br.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"random_state={rs}, MSE: {mse:.3f}")

# Check consistency with fixed random_state
br1 = BaggingRegressor(random_state=42)
br1.fit(X_train, y_train)
mse1 = mean_squared_error(y_test, br1.predict(X_test))

br2 = BaggingRegressor(random_state=42)
br2.fit(X_train, y_train)
mse2 = mean_squared_error(y_test, br2.predict(X_test))

print(f"MSE with fixed random_state (42): {mse1:.3f}, {mse2:.3f}")
print(f"Are MSEs equal? {np.isclose(mse1, mse2)}")

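Running the example gives output similar to the following. The random_state=None line changes from run to run, and exact values can differ between scikit-learn versions.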

random_state=None, MSE: 9040.013
random_state=0, MSE: 8920.858
random_state=42, MSE: 7486.481
random_state=100, MSE: 8630.965
MSE with fixed random_state (42): 7486.481, 7486.481
Are MSEs equal? True

The key steps in this example are:

Generate a synthetic regression dataset
Split the data into train and test sets
Train BaggingRegressor models with different random_state values
Evaluate the Mean Squared Error (MSE) of each model on the test set
Demonstrate consistency with a fixed random_state

Some tips for setting random_state:

Use a fixed integer value for reproducibility in experiments and debugging
Leave as None when run-to-run variation is acceptable or desired, for example in production
Document the random_state value used in experiments for future reference
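One lightweight way to follow the last tip is to store the seed alongside the result. The sketch below reads random_state back with get_params() and serializes it to JSON together with the test MSE. The record's field names are an arbitrary choice. In a real project the record would go to a file or an experiment tracker rather than to stdout.

```python
import json

from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

SEED = 42  # one seed reused for data generation, splitting, and bagging

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=SEED)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=SEED)

model = BaggingRegressor(random_state=SEED).fit(X_train, y_train)
mse = mean_squared_error(y_test, model.predict(X_test))

params = model.get_params()
# Keep only JSON-serializable settings; the base estimator object itself is left out
record = {
    "model": type(model).__name__,
    "random_state": params["random_state"],
    "n_estimators": params["n_estimators"],
    "test_mse": float(mse),
}
print(json.dumps(record, indent=2))
```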

Issues to consider:

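A single fixed random_state can hide how much performance depends on the random resampling
Different random_state values can lead to noticeably different model performance; in the output above, test MSE ranges from about 7486 to 9040
Reproducibility requires setting random_state for every random step in the pipeline (here, make_regression, train_test_split, and BaggingRegressor)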
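To check how much a single seed might be hiding, you can hold the data and the split fixed and vary only the bagging seed. This sketch reuses the dataset settings from the example above. The choice of 10 seeds is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

DATA_SEED = 42  # fixes data generation and the train/test split

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=DATA_SEED)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=DATA_SEED
)

scores = []
for seed in range(10):
    model = BaggingRegressor(random_state=seed)  # only the bagging seed varies
    model.fit(X_train, y_train)
    scores.append(mean_squared_error(y_test, model.predict(X_test)))

scores = np.array(scores)
print(
    f"MSE over {len(scores)} seeds: mean={scores.mean():.3f}, std={scores.std():.3f}, "
    f"min={scores.min():.3f}, max={scores.max():.3f}"
)
```

Reporting the mean and spread across seeds gives a fairer picture than one seed's score. Each individual run stays reproducible.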

See Also