
Configure BaggingClassifier "n_estimators" Parameter

The n_estimators parameter in scikit-learn’s BaggingClassifier controls the number of base estimators in the ensemble.

Bagging (Bootstrap Aggregating) is an ensemble method that creates multiple subsets of the original dataset, trains a classifier on each subset, and combines their predictions. The n_estimators parameter determines how many such classifiers are created and combined.
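To make the mechanism concrete, here is a minimal from-scratch sketch of bootstrap aggregating (not scikit-learn's internal implementation), assuming binary 0/1 labels and a fixed ensemble of 10 trees:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

rng = np.random.default_rng(0)
trees = []
for _ in range(10):
    # Bootstrap: draw a subset of rows with replacement
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Combine predictions by majority vote across the 10 trees
votes = np.stack([t.predict(X) for t in trees])
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print(y_pred[:10])
```

Each tree sees a slightly different bootstrap sample, so their individual errors partially cancel when the votes are averaged.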

Increasing n_estimators generally improves model performance by reducing variance, but it also increases computational cost. There’s usually a point of diminishing returns where adding more estimators provides minimal benefit.

The default value for n_estimators in BaggingClassifier is 10.

In practice, values between 10 and 100 are commonly used, depending on the dataset size and complexity, as well as computational resources available.
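You can confirm the default by inspecting an unconfigured instance:

```python
from sklearn.ensemble import BaggingClassifier

clf = BaggingClassifier()
print(clf.n_estimators)  # default number of base estimators
```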

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different n_estimators values
n_estimators_values = [5, 10, 50, 100]
accuracies = []

for n in n_estimators_values:
    bagging = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=n, random_state=42)
    bagging.fit(X_train, y_train)
    y_pred = bagging.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"n_estimators={n}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

n_estimators=5, Accuracy: 0.805
n_estimators=10, Accuracy: 0.865
n_estimators=50, Accuracy: 0.875
n_estimators=100, Accuracy: 0.880

The key steps in this example are:

  1. Generate a synthetic classification dataset with informative and redundant features
  2. Split the data into train and test sets
  3. Create BaggingClassifier models with different n_estimators values
  4. Train each model and evaluate its accuracy on the test set
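Rather than looping manually, the same search can be run with cross-validation via GridSearchCV, which avoids tuning against a single test split. This is a sketch using the same candidate values as above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# 5-fold cross-validated search over candidate n_estimators values
param_grid = {"n_estimators": [5, 10, 50, 100]}
grid = GridSearchCV(BaggingClassifier(random_state=42), param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```

Note that GridSearchCV refits the best model on the full data, so `grid` can be used directly for prediction afterwards.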

Tips and heuristics for setting n_estimators:

  1. Start with the default of 10 and increase until accuracy on a validation set plateaus
  2. Values between 10 and 100 cover most practical cases; go higher only if validation scores are still improving
  3. Use n_jobs=-1 to train estimators in parallel and offset the cost of larger ensembles

Issues to consider:

  1. Increasing n_estimators does not by itself cause overfitting; the trade-off is computational, since training time and memory grow roughly linearly with the number of estimators
  2. Gains diminish quickly, as in the example above, where going from 50 to 100 estimators improves accuracy by only 0.005
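One way to choose n_estimators without sacrificing data for a test set is the out-of-bag (OOB) score, which evaluates each sample using only the estimators that did not see it during bootstrapping. A sketch:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# oob_score=True requires bootstrap sampling (the default)
for n in [20, 50, 100]:
    clf = BaggingClassifier(n_estimators=n, oob_score=True, random_state=42)
    clf.fit(X, y)
    print(f"n_estimators={n}, OOB score: {clf.oob_score_:.3f}")
```

The OOB score behaves like a built-in cross-validation estimate, so the smallest n where it levels off is a reasonable choice.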