
Configure VotingClassifier "voting" Parameter

The voting parameter in scikit-learn’s VotingClassifier determines how the ensemble combines predictions from its base classifiers.

VotingClassifier is an ensemble method that combines multiple base classifiers to make predictions. The voting parameter controls the decision-making strategy of the ensemble.

Two options are available for voting: “hard” and “soft”. With “hard” voting, the ensemble predicts the class label chosen by a majority of the base classifiers. With “soft” voting, it averages the predicted class probabilities across classifiers and selects the class with the highest average probability.

The default value for voting is “hard”. Both “hard” and “soft” voting are commonly used, with the choice depending on the specific problem and base classifiers.
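The difference between the two strategies can be sketched on a single sample. The probability values below are made up for illustration; note that the two strategies can disagree when one classifier is much more confident than the others:

```python
import numpy as np

# Hypothetical per-classifier probabilities for one sample, two classes
probas = np.array([
    [0.45, 0.55],  # classifier 1 leans slightly toward class 1
    [0.40, 0.60],  # classifier 2 leans toward class 1
    [0.90, 0.10],  # classifier 3 strongly favors class 0
])

# Hard voting: each classifier casts one vote for its most probable class
hard_votes = probas.argmax(axis=1)            # [1, 1, 0]
hard_pred = np.bincount(hard_votes).argmax()  # majority -> class 1

# Soft voting: average the probabilities, then take the argmax
soft_pred = probas.mean(axis=0).argmax()      # mean = [0.583, 0.417] -> class 0

print(hard_pred, soft_pred)  # 1 0
```

Hard voting ignores the third classifier's high confidence, while soft voting lets it outweigh the two weakly confident votes.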

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create base classifiers
clf1 = LogisticRegression(random_state=42)
clf2 = DecisionTreeClassifier(random_state=42)
clf3 = SVC(probability=True, random_state=42)  # probability=True enables predict_proba, required for soft voting

# Create VotingClassifier instances
hard_voting = VotingClassifier(estimators=[('lr', clf1), ('dt', clf2), ('svc', clf3)],
                               voting='hard')
soft_voting = VotingClassifier(estimators=[('lr', clf1), ('dt', clf2), ('svc', clf3)],
                               voting='soft')

# Train and evaluate models
for clf, name in [(hard_voting, 'Hard Voting'), (soft_voting, 'Soft Voting')]:
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    f1 = f1_score(y_test, y_pred)
    print(f"{name}: Accuracy = {accuracy:.3f}, F1-score = {f1:.3f}")

Running the example gives an output like:

Hard Voting: Accuracy = 0.880, F1-score = 0.871
Soft Voting: Accuracy = 0.895, F1-score = 0.886

The key steps in this example are:

  1. Generate a synthetic classification dataset
  2. Split the data into train and test sets
  3. Create base classifiers (LogisticRegression, DecisionTreeClassifier, SVC)
  4. Create VotingClassifier instances with “hard” and “soft” voting
  5. Train the models and evaluate their performance using accuracy and F1-score
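The soft-voting result above can be checked against its definition: with no weights, the ensemble's predict_proba is simply the mean of the base estimators' predicted probabilities. A minimal sketch, using a smaller synthetic dataset and two of the same base classifiers:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=42)

voting = VotingClassifier(
    estimators=[('lr', LogisticRegression(random_state=42)),
                ('dt', DecisionTreeClassifier(random_state=42))],
    voting='soft')
voting.fit(X, y)

# With weights=None, soft voting's probabilities are the unweighted mean
# of the fitted base estimators' predict_proba outputs
avg = np.mean([est.predict_proba(X) for est in voting.estimators_], axis=0)
print(np.allclose(voting.predict_proba(X), avg))  # True
```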

Tips for choosing between “hard” and “soft” voting:

  - Prefer “soft” voting when the base classifiers produce well-calibrated probability estimates, since averaging probabilities preserves each classifier’s confidence.
  - Prefer “hard” voting when one or more base classifiers do not expose predict_proba, or when their probability estimates are unreliable.
  - With an even number of classifiers, “hard” voting can produce ties; scikit-learn breaks ties by selecting the class that comes first in ascending sort order.

Considerations when using different voting methods:

  - “Soft” voting requires every base estimator to implement predict_proba; for example, SVC must be created with probability=True.
  - Enabling probabilities can add cost: SVC’s probability estimates are fit via an extra cross-validation step and may be poorly calibrated.
  - The weights parameter applies to both modes, weighting each classifier’s vote (“hard”) or its probabilities (“soft”).
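One such consideration is the weights parameter, which lets stronger base classifiers contribute more to the soft-voting average. A sketch assuming an arbitrary 2:1 weighting in favor of logistic regression (the weights here are illustrative, not tuned):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Give LogisticRegression twice the influence of the decision tree
weighted = VotingClassifier(
    estimators=[('lr', LogisticRegression(random_state=42)),
                ('dt', DecisionTreeClassifier(random_state=42))],
    voting='soft', weights=[2, 1])

scores = cross_val_score(weighted, X, y, cv=5)
print(f"Weighted soft voting CV accuracy: {scores.mean():.3f}")
```

In practice, weights can be tuned like any other hyperparameter, e.g. with GridSearchCV.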



See Also