
Configure BaggingRegressor "max_features" Parameter

The max_features parameter in scikit-learn’s BaggingRegressor controls the number of features randomly selected for each base estimator.

BaggingRegressor is an ensemble method that fits multiple base regressors on random subsets of the original dataset and aggregates their predictions. The max_features parameter determines the size of the feature subset for each base estimator.

Setting max_features can help balance the trade-off between model diversity and individual estimator performance. Lower values increase diversity among the base estimators but may weaken each one individually, while higher values make each estimator stronger but more correlated with the others.

The default value for max_features is 1.0, meaning every base estimator sees all features. The parameter accepts either a float (a fraction of the features) or an int (an absolute number of features). Common values range from 0.5 to 1.0, depending on the dataset’s characteristics and the desired balance between diversity and individual estimator strength.
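A quick sketch of the float-versus-int semantics: a float is interpreted as a fraction of the total features, an int as an absolute count. The fitted model's estimators_features_ attribute records which feature indices each base estimator received, so we can verify that both forms draw the same number of features here.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=100, n_features=20, random_state=42)

# A float is interpreted as a fraction: 0.5 * 20 features = 10 features
bgr_frac = BaggingRegressor(max_features=0.5, random_state=42).fit(X, y)

# An int is interpreted as an absolute count of features
bgr_int = BaggingRegressor(max_features=10, random_state=42).fit(X, y)

# Both configurations draw 10 of the 20 features for each base estimator
print(len(bgr_frac.estimators_features_[0]))  # 10
print(len(bgr_int.estimators_features_[0]))   # 10
```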

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different max_features values
max_features_values = [0.5, 0.7, 0.9, 1.0]
mse_scores = []

for max_feat in max_features_values:
    bgr = BaggingRegressor(max_features=max_feat, random_state=42)
    bgr.fit(X_train, y_train)
    y_pred = bgr.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"max_features={max_feat}, MSE: {mse:.3f}")

# Find best max_features value
best_max_features = max_features_values[np.argmin(mse_scores)]
print(f"Best max_features value: {best_max_features}")

Running the example gives an output like:

max_features=0.5, MSE: 13034.131
max_features=0.7, MSE: 9486.080
max_features=0.9, MSE: 7813.246
max_features=1.0, MSE: 7486.481
Best max_features value: 1.0

The key steps in this example are:

  1. Generate a synthetic regression dataset with 20 features
  2. Split the data into train and test sets
  3. Train BaggingRegressor models with different max_features values
  4. Evaluate the mean squared error of each model on the test set
  5. Identify the best max_features value based on lowest MSE
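The manual loop above can also be expressed with scikit-learn's GridSearchCV, which adds cross-validation instead of a single train/test split. A minimal sketch (the candidate values and cv=3 are illustrative choices, not from the original example):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=20, noise=0.1, random_state=42)

# Search over the same candidate values, scored by (negated) MSE
param_grid = {"max_features": [0.5, 0.7, 0.9, 1.0]}
grid = GridSearchCV(
    BaggingRegressor(random_state=42),
    param_grid,
    cv=3,
    scoring="neg_mean_squared_error",
)
grid.fit(X, y)

# Best value according to 3-fold cross-validation
print(grid.best_params_)
```

Cross-validated scores are generally a more reliable basis for choosing max_features than a single held-out split, especially on small datasets.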

Some tips and heuristics for setting max_features:

  - Start from the default of 1.0 and lower it gradually; values between 0.5 and 1.0 are a reasonable search range.
  - Lower values tend to help more when features are highly correlated, since feature subsampling decorrelates the base estimators.
  - Tune max_features together with n_estimators: smaller feature subsets often need more estimators to compensate for weaker individual models.

Issues to consider:

  - Very low values can exclude informative features from many base estimators, hurting overall accuracy.
  - The best value is dataset-dependent; always confirm your choice with a held-out set or cross-validation.
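With the default bootstrap sampling, BaggingRegressor can also score candidate max_features values using out-of-bag samples (oob_score=True), avoiding a separate validation split. A hedged sketch, assuming the dataset and candidate values from the example above:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=500, n_features=20, noise=0.1, random_state=42)

# oob_score_ reports R^2 on the samples each estimator did not see during fitting
for max_feat in [0.5, 0.7, 0.9, 1.0]:
    bgr = BaggingRegressor(max_features=max_feat, oob_score=True,
                           n_estimators=50, random_state=42)
    bgr.fit(X, y)
    print(f"max_features={max_feat}, OOB R^2: {bgr.oob_score_:.3f}")
```

OOB scoring requires bootstrap=True (the default) and enough estimators that every sample is left out at least once.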



See Also