The voting parameter in scikit-learn’s VotingClassifier determines how the ensemble combines predictions from its base classifiers. VotingClassifier is an ensemble method that combines multiple base classifiers to make predictions, and the voting parameter controls its decision-making strategy.
Two options are available for voting: “hard” and “soft”. “Hard” voting uses majority rule over the predicted class labels, while “soft” voting averages the predicted class probabilities and selects the class with the highest average. The default value for voting is “hard”. Both strategies are commonly used, with the choice depending on the specific problem and base classifiers.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create base classifiers
clf1 = LogisticRegression(random_state=42)
clf2 = DecisionTreeClassifier(random_state=42)
clf3 = SVC(probability=True, random_state=42)
# Create VotingClassifier instances
hard_voting = VotingClassifier(estimators=[('lr', clf1), ('dt', clf2), ('svc', clf3)],
                               voting='hard')
soft_voting = VotingClassifier(estimators=[('lr', clf1), ('dt', clf2), ('svc', clf3)],
                               voting='soft')
# Train and evaluate models
for clf, name in [(hard_voting, 'Hard Voting'), (soft_voting, 'Soft Voting')]:
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    print(f"{name}: Accuracy = {accuracy:.3f}, F1-score = {f1:.3f}")
Running the example gives an output like:
Hard Voting: Accuracy = 0.880, F1-score = 0.871
Soft Voting: Accuracy = 0.895, F1-score = 0.886
The key steps in this example are:
- Generate a synthetic classification dataset
- Split the data into train and test sets
- Create base classifiers (LogisticRegression, DecisionTreeClassifier, SVC)
- Create VotingClassifier instances with “hard” and “soft” voting
- Train the models and evaluate their performance using accuracy and F1-score
Tips for choosing between “hard” and “soft” voting:
- Use “hard” voting when base classifiers have similar performance or their probability estimates are unreliable
- Prefer “soft” voting when base classifiers output well-calibrated, reliable probability estimates
- “Soft” voting can provide more nuanced decisions but requires probability outputs
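Beyond choosing the voting strategy, VotingClassifier also accepts a weights parameter that lets the ensemble trust some base classifiers more than others. The sketch below compares uniform and weighted soft voting on the same synthetic data as the example above; the weights [2, 1, 2] are illustrative values, not tuned ones:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)

estimators = [('lr', LogisticRegression(random_state=42)),
              ('dt', DecisionTreeClassifier(random_state=42)),
              ('svc', SVC(probability=True, random_state=42))]

# Uniform soft voting vs. soft voting that down-weights the decision tree.
# weights=[2, 1, 2] is an illustrative, untuned choice.
uniform = VotingClassifier(estimators=estimators, voting='soft')
weighted = VotingClassifier(estimators=estimators, voting='soft',
                            weights=[2, 1, 2])

for name, clf in [('uniform', uniform), ('weighted', weighted)]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```

In practice the weights are worth tuning (for example with GridSearchCV) rather than set by hand, since a poor weighting can easily underperform the uniform ensemble.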
Considerations when using different voting methods:
- “Hard” voting is more interpretable but may lose information in close decisions
- “Soft” voting can be sensitive to poorly calibrated probability estimates
- Ensure all base classifiers support probability outputs when using “soft” voting
- The performance difference between methods can vary depending on the dataset and base classifiers
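Since “soft” voting requires every base estimator to expose predict_proba, a quick sanity check before building the ensemble is hasattr. In recent scikit-learn versions, SVC only exposes predict_proba when constructed with probability=True, and LinearSVC never does:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Candidate base classifiers: LinearSVC has no predict_proba at all,
# and SVC only exposes it when constructed with probability=True.
candidates = {
    'lr': LogisticRegression(),
    'dt': DecisionTreeClassifier(),
    'svc_default': SVC(),              # probability=False by default
    'linear_svc': LinearSVC(),
    'svc_proba': SVC(probability=True),
}

for name, clf in candidates.items():
    print(f"{name}: supports predict_proba = {hasattr(clf, 'predict_proba')}")
```

Estimators that fail this check must either be dropped from a “soft” ensemble, reconfigured (as with SVC's probability=True), or wrapped so that they produce probabilities.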