
Configure BaggingClassifier "estimator" Parameter

The estimator parameter in scikit-learn’s BaggingClassifier determines the base model used in the ensemble.

BaggingClassifier is an ensemble method that creates multiple instances of a base estimator, trains each on random subsets of the data, and combines their predictions through voting. This approach helps reduce overfitting and improves generalization.

By default, BaggingClassifier uses DecisionTreeClassifier as its base estimator. However, you can specify any classifier that follows scikit-learn’s estimator API, such as LogisticRegression, SVC, or custom estimators.

The choice of base estimator can significantly impact the ensemble’s performance, bias-variance trade-off, and computational requirements.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=0, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base estimators
estimators = {
    'Default (DecisionTree)': None,
    'LogisticRegression': LogisticRegression(),
    'SVC': SVC()
}

# Train and evaluate BaggingClassifier with different base estimators
for name, estimator in estimators.items():
    bagging = BaggingClassifier(estimator=estimator, random_state=42)
    bagging.fit(X_train, y_train)
    y_pred = bagging.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    print(f"{name}:")
    print(f"  Accuracy: {accuracy:.3f}")
    print(f"  F1-score: {f1:.3f}")

Running the example gives an output like:

Default (DecisionTree):
  Accuracy: 0.800
  F1-score: 0.796
LogisticRegression:
  Accuracy: 0.770
  F1-score: 0.779
SVC:
  Accuracy: 0.945
  F1-score: 0.948

Key steps in this example:

  1. Generate a synthetic classification dataset with informative features
  2. Split the data into train and test sets
  3. Create BaggingClassifier instances with different base estimators
  4. Train each ensemble and evaluate its performance on the test set
  5. Compare accuracy and F1-scores for different base estimators

Tips for choosing and configuring base estimators:

  - Decision trees are high-variance learners, which makes them a natural fit for bagging; averaging over bootstrap samples cancels out much of that variance.
  - Stable, low-variance models such as LogisticRegression gain less from bagging, since their predictions change little across resampled training sets.
  - Hyperparameters of the base estimator can be tuned through the ensemble using the estimator__ prefix (e.g. estimator__max_depth).

Issues to consider:

  - Training time scales with n_estimators, so expensive base models such as SVC can make the ensemble slow on large datasets; n_jobs=-1 parallelizes fitting across cores.
  - If the base estimator implements predict_proba, BaggingClassifier averages predicted probabilities; otherwise it falls back to majority voting over hard predictions.
