
Configure VotingClassifier "estimators" Parameter

The estimators parameter in scikit-learn’s VotingClassifier defines the set of classifiers to be used in the ensemble.

VotingClassifier is an ensemble method that combines predictions from multiple base classifiers to make a final prediction. The estimators parameter is a list of tuples, where each tuple contains a string (the estimator name) and an estimator object.

This parameter allows you to specify which classifiers to include in the ensemble and how to identify them. The flexibility of this parameter enables you to combine diverse algorithms to potentially improve overall prediction accuracy.
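As a minimal illustration of the expected structure (a sketch on the iris dataset, with the arbitrary names 'lr' and 'dt'):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each tuple pairs a unique string name with an unfitted estimator object
estimators = [
    ('lr', LogisticRegression(max_iter=1000)),
    ('dt', DecisionTreeClassifier(random_state=42)),
]

vc = VotingClassifier(estimators=estimators)
vc.fit(X, y)

# After fitting, each fitted clone is reachable by its name
print(type(vc.named_estimators_['lr']).__name__)
```

The names matter beyond readability: they become parameter prefixes (e.g. lr__C) if you later tune the ensemble with grid search.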

The estimators parameter has no default value; it must be supplied when constructing the ensemble. In practice, you typically include 3-5 diverse classifiers, such as logistic regression, random forests, and support vector machines.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base classifiers
clf1 = LogisticRegression(random_state=42)
clf2 = RandomForestClassifier(n_estimators=100, random_state=42)
clf3 = SVC(probability=True, random_state=42)

# Create VotingClassifiers with different estimator combinations
estimator_combinations = [
    [('lr', clf1), ('rf', clf2)],
    [('lr', clf1), ('svm', clf3)],
    [('rf', clf2), ('svm', clf3)],
    [('lr', clf1), ('rf', clf2), ('svm', clf3)]
]

for i, estimators in enumerate(estimator_combinations, 1):
    vc = VotingClassifier(estimators=estimators, voting='soft')
    vc.fit(X_train, y_train)
    y_pred = vc.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Combination {i}: Estimators={[e[0] for e in estimators]}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

Combination 1: Estimators=['lr', 'rf'], Accuracy: 0.885
Combination 2: Estimators=['lr', 'svm'], Accuracy: 0.860
Combination 3: Estimators=['rf', 'svm'], Accuracy: 0.875
Combination 4: Estimators=['lr', 'rf', 'svm'], Accuracy: 0.880
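With voting='soft', as used above, the ensemble averages each estimator's predicted class probabilities and picks the class with the highest mean probability; this is also why SVC is constructed with probability=True. A minimal sketch verifying that behavior by hand on a fresh toy dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

vc = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)), ('nb', GaussianNB())],
    voting='soft',
).fit(X_train, y_train)

# Average the per-estimator class probabilities by hand...
avg_proba = np.mean([est.predict_proba(X_test) for est in vc.estimators_], axis=0)
# ...and take the most probable class for each sample
manual = vc.classes_[np.argmax(avg_proba, axis=1)]

# Should agree with the ensemble's own predictions
print((manual == vc.predict(X_test)).all())
```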

The key steps in this example are:

  1. Generate a synthetic binary classification dataset
  2. Split the data into train and test sets
  3. Define individual classifiers (LogisticRegression, RandomForestClassifier, SVC)
  4. Create VotingClassifier instances with different estimator combinations
  5. Train each VotingClassifier and evaluate its accuracy on the test set
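To judge whether the ensemble earns its keep, it also helps to score each base classifier on its own under the same split. A sketch mirroring the setup above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

base = [
    ('lr', LogisticRegression(random_state=42)),
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('svm', SVC(probability=True, random_state=42)),
]

# Score each base classifier individually
for name, clf in base:
    clf.fit(X_train, y_train)
    print(f"{name}: {clf.score(X_test, y_test):.3f}")

# Score the full soft-voting ensemble (estimators are cloned internally)
vc = VotingClassifier(estimators=base, voting='soft').fit(X_train, y_train)
print(f"ensemble: {vc.score(X_test, y_test):.3f}")
```

If the ensemble does not beat the strongest individual model, trying a different combination (as the loop above does) is usually the next step.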

Tips for configuring the estimators parameter:

  1. Give each estimator a short, unique name; names are also used as prefixes (e.g. lr__C) when tuning nested parameters with GridSearchCV, so they must not contain double underscores
  2. Combine diverse model families; voting helps most when the base classifiers make different kinds of errors
  3. For voting='soft', every estimator must implement predict_proba; SVC only exposes it when constructed with probability=True

Issues to consider:

  1. Training time and memory grow with each estimator added, since every one is fit on the full training data
  2. Soft voting assumes reasonably calibrated probabilities; poorly calibrated estimators can drag the ensemble below hard voting
  3. More estimators does not guarantee better accuracy, as the results above show: the two-model 'lr' + 'rf' combination edges out the three-model ensemble


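Two related knobs worth knowing when adjusting a configured ensemble: the weights parameter scales each estimator's vote, and any named estimator can be excluded by setting it to the string 'drop' via set_params. A hedged sketch on the breast cancer dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)

vc = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression(max_iter=5000)),
        ('rf', RandomForestClassifier(random_state=42)),
        ('nb', GaussianNB()),
    ],
    voting='soft',
    weights=[2, 1, 1],  # give logistic regression twice the influence
)

# Exclude 'nb' without rebuilding the estimators list
vc.set_params(nb='drop')
vc.fit(X, y)

# Only the non-dropped estimators are fitted
print(len(vc.estimators_))
```

Dropping by name is handy inside grid searches, where the presence or absence of an estimator can itself be a hyperparameter.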
See Also