Configure RandomForestRegressor "bootstrap" Parameter

The bootstrap parameter in scikit-learn’s RandomForestRegressor controls whether bootstrap samples are used when building trees.

Bootstrapping is a resampling technique where each tree is trained on a random sample of the training data, drawn with replacement. This introduces randomness and diversity into the ensemble, which can help reduce overfitting.
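To make "drawn with replacement" concrete, here is a minimal NumPy sketch of a single bootstrap sample. It is only an illustration: the variable names are made up for this example, and it is not the code scikit-learn runs internally. When the sample has as many rows as the dataset, roughly 63% of the distinct rows end up in it on average, and the rest are left out for that tree.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000  # illustrative dataset size

# Draw n_samples row indices with replacement: one bootstrap sample
bootstrap_idx = rng.integers(0, n_samples, size=n_samples)

# Some rows appear several times, others not at all
unique_fraction = np.unique(bootstrap_idx).size / n_samples
print(f"Fraction of distinct rows in the sample: {unique_fraction:.3f}")
```

Each tree in the forest would get its own independently drawn set of indices, which is where the diversity between trees comes from.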

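By default, bootstrap is set to True, meaning that each tree is trained on a bootstrap sample of the training data. Unless max_samples is set, that sample has the same number of rows as the training set, so some rows are repeated and others are left out. Setting bootstrap to False means that each tree is trained on the entire training dataset, in which case most of the remaining variation between trees comes from the random choice of candidate features at each split (the max_features parameter).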
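One way to see the effect of this setting is to look at how much the individual trees disagree with each other. The sketch below is an illustration under assumptions: the small dataset, the helper name tree_spread, and the choice of 20 trees are all made up for this example, and it assumes a scikit-learn version in which RandomForestRegressor considers all features at each split by default. With bootstrap=False the trees then see identical data and identical candidate features, so their predictions are typically much closer together.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Small synthetic dataset (sizes chosen only to keep the sketch fast)
X, y = make_regression(n_samples=300, n_features=10, n_informative=5,
                       noise=0.1, random_state=0)
X_fit, X_new = X[:200], X[200:]
y_fit = y[:200]

def tree_spread(bootstrap):
    """Mean standard deviation of per-tree predictions on X_new."""
    rf = RandomForestRegressor(n_estimators=20, bootstrap=bootstrap, random_state=0)
    rf.fit(X_fit, y_fit)
    # One row of predictions per fitted tree
    per_tree = np.stack([tree.predict(X_new) for tree in rf.estimators_])
    return per_tree.std(axis=0).mean()

spread_bootstrap = tree_spread(True)
spread_no_bootstrap = tree_spread(False)
print(f"Spread with bootstrap:    {spread_bootstrap:.3f}")
print(f"Spread without bootstrap: {spread_no_bootstrap:.3f}")
```

A small spread means averaging the trees adds little over a single tree, which is the main reason to keep some source of randomness in the ensemble.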

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       n_targets=1, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with bootstrap=True and bootstrap=False
rf_bootstrap = RandomForestRegressor(n_estimators=100, bootstrap=True, random_state=42)
rf_no_bootstrap = RandomForestRegressor(n_estimators=100, bootstrap=False, random_state=42)

rf_bootstrap.fit(X_train, y_train)
rf_no_bootstrap.fit(X_train, y_train)

# Evaluate performance
y_pred_bootstrap = rf_bootstrap.predict(X_test)
y_pred_no_bootstrap = rf_no_bootstrap.predict(X_test)

mse_bootstrap = mean_squared_error(y_test, y_pred_bootstrap)
mse_no_bootstrap = mean_squared_error(y_test, y_pred_no_bootstrap)

print(f"Bootstrap MSE: {mse_bootstrap:.3f}")
print(f"No Bootstrap MSE: {mse_no_bootstrap:.3f}")

Running the example gives an output like:

Bootstrap MSE: 208.093
No Bootstrap MSE: 458.141

The key steps in this example are:

  1. Generate a synthetic regression dataset with informative and noise features
  2. Split the data into train and test sets
  3. Train RandomForestRegressor models with bootstrap=True and bootstrap=False
  4. Evaluate and compare the mean squared error of the models on the test set
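A single train/test split can be noisy, so the comparison in the steps above can also be repeated with cross-validation. This is a sketch rather than part of the original example: the choice of 5 folds and 50 trees is an assumption made to keep the run short, and the cv_mse dictionary is just a name used here to collect the results.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Same synthetic dataset as in the example above
X, y = make_regression(n_samples=1000, n_features=10, n_informative=5,
                       n_targets=1, noise=0.1, random_state=42)

cv_mse = {}
for use_bootstrap in (True, False):
    model = RandomForestRegressor(n_estimators=50, bootstrap=use_bootstrap,
                                  random_state=42)
    # This scorer returns negative MSE, so flip the sign to get MSE
    scores = -cross_val_score(model, X, y, cv=5,
                              scoring="neg_mean_squared_error")
    cv_mse[use_bootstrap] = scores
    print(f"bootstrap={use_bootstrap}: mean MSE {scores.mean():.3f} "
          f"(std {scores.std():.3f})")
```

Comparing the mean and spread of the fold scores gives a steadier picture of which setting suits a given dataset than one split alone.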


