The voting parameter in scikit-learn’s VotingClassifier determines how the ensemble combines predictions from its base classifiers. VotingClassifier is an ensemble method that combines multiple base classifiers to make predictions, and the voting parameter controls its decision-making strategy.
Two options are available for voting: “hard” and “soft”. “Hard” voting uses majority rule over the predicted class labels, while “soft” voting averages the predicted class probabilities and selects the class with the highest average. The default value for voting is “hard”. Both strategies are commonly used, with the choice depending on the specific problem and base classifiers.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create base classifiers
clf1 = LogisticRegression(random_state=42)
clf2 = DecisionTreeClassifier(random_state=42)
clf3 = SVC(probability=True, random_state=42)
# Create VotingClassifier instances
hard_voting = VotingClassifier(estimators=[('lr', clf1), ('dt', clf2), ('svc', clf3)],
                               voting='hard')
soft_voting = VotingClassifier(estimators=[('lr', clf1), ('dt', clf2), ('svc', clf3)],
                               voting='soft')
# Train and evaluate models
for clf, name in [(hard_voting, 'Hard Voting'), (soft_voting, 'Soft Voting')]:
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    print(f"{name}: Accuracy = {accuracy:.3f}, F1-score = {f1:.3f}")
Running the example gives an output like:
Hard Voting: Accuracy = 0.880, F1-score = 0.871
Soft Voting: Accuracy = 0.895, F1-score = 0.886
The key steps in this example are:
- Generate a synthetic classification dataset
- Split the data into train and test sets
- Create base classifiers (LogisticRegression, DecisionTreeClassifier, SVC)
- Create VotingClassifier instances with “hard” and “soft” voting
- Train the models and evaluate their performance using accuracy and F1-score
Tips for choosing between “hard” and “soft” voting:
- Use “hard” voting when base classifiers have similar performance or their probability estimates are unreliable
- Prefer “soft” voting when base classifiers output well-calibrated, reliable probability estimates
- “Soft” voting can provide more nuanced decisions but requires probability outputs
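Beyond choosing the voting strategy, VotingClassifier also accepts a weights parameter that lets the ensemble trust some base classifiers more than others. The sketch below compares uniform and weighted soft voting on the same synthetic data as the example above; the weights [2, 1, 2] are illustrative values, not tuned ones:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)

estimators = [('lr', LogisticRegression(random_state=42)),
              ('dt', DecisionTreeClassifier(random_state=42)),
              ('svc', SVC(probability=True, random_state=42))]

# Uniform soft voting vs. soft voting that down-weights the decision tree.
# weights=[2, 1, 2] is an illustrative, untuned choice.
uniform = VotingClassifier(estimators=estimators, voting='soft')
weighted = VotingClassifier(estimators=estimators, voting='soft',
                            weights=[2, 1, 2])

for name, clf in [('uniform', uniform), ('weighted', weighted)]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```

In practice the weights are worth tuning (for example with GridSearchCV) rather than set by hand, since a poor weighting can easily underperform the uniform ensemble.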
Considerations when using different voting methods:
- “Hard” voting is more interpretable but may lose information in close decisions
- “Soft” voting can be sensitive to poorly calibrated probability estimates
- Ensure all base classifiers support probability outputs when using “soft” voting
- The performance difference between methods can vary depending on the dataset and base classifiers
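Since “soft” voting requires every base estimator to expose predict_proba, a quick sanity check before building the ensemble is hasattr. In recent scikit-learn versions, SVC only exposes predict_proba when constructed with probability=True, and LinearSVC never does:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Candidate base classifiers: LinearSVC has no predict_proba at all,
# and SVC only exposes it when constructed with probability=True.
candidates = {
    'lr': LogisticRegression(),
    'dt': DecisionTreeClassifier(),
    'svc_default': SVC(),              # probability=False by default
    'linear_svc': LinearSVC(),
    'svc_proba': SVC(probability=True),
}

for name, clf in candidates.items():
    print(f"{name}: supports predict_proba = {hasattr(clf, 'predict_proba')}")
```

Estimators that fail this check must either be dropped from a “soft” ensemble, reconfigured (as with SVC's probability=True), or wrapped so that they produce probabilities.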