
Scikit-Learn VotingClassifier Model

A VotingClassifier is an ensemble machine learning model that combines the predictions from multiple other models (base estimators) to improve predictive performance.

It is used for classification tasks; the analogous VotingRegressor is available for regression.

The key hyperparameters include estimators (list of base models), voting (type of voting strategy: ‘hard’ or ‘soft’), and weights (optional, weights assigned to the base models).
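
Before the full worked example below, here is a minimal sketch of these hyperparameters in isolation: a hard-voting ensemble with illustrative (not tuned) weights, so the logistic regression's vote counts twice as much as the decision tree's.

from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# hard voting: predict the weighted majority class across the base models
# the weights are illustrative only and must align with the estimators list
hard_ensemble = VotingClassifier(
    estimators=[('lr', LogisticRegression()), ('dt', DecisionTreeClassifier())],
    voting='hard',
    weights=[2, 1]
)

The full example below uses ‘soft’ voting instead, which averages the predicted class probabilities of the base models.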

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# generate a synthetic binary classification dataset
X, y = make_classification(n_samples=100, n_features=5, n_classes=2, random_state=1)

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# define the base models
model1 = LogisticRegression()
model2 = DecisionTreeClassifier()
model3 = SVC(probability=True)

# create the voting ensemble
ensemble = VotingClassifier(estimators=[('lr', model1), ('dt', model2), ('svc', model3)], voting='soft')

# fit the ensemble model
ensemble.fit(X_train, y_train)

# evaluate the model
yhat = ensemble.predict(X_test)
acc = accuracy_score(y_test, yhat)
print('Accuracy: %.3f' % acc)

# make a prediction
row = [[-1.10325445, -0.49821356, -0.05962247, -0.89224592, -0.70158632]]
yhat = ensemble.predict(row)
print('Predicted: %d' % yhat[0])

Running the example gives an output like:

Accuracy: 0.950
Predicted: 0

The steps are as follows:

  1. First, a synthetic binary classification dataset is generated using the make_classification() function. This creates a dataset with a specified number of samples (n_samples), input features (n_features), classes (n_classes), and a fixed random seed (random_state) for reproducibility. The dataset is split into training and test sets using train_test_split().

  2. Next, three base models are defined: LogisticRegression, DecisionTreeClassifier, and SVC with probability estimation enabled (probability=True), which is required for ‘soft’ voting.

  3. A VotingClassifier ensemble is created using the base models with ‘soft’ voting, which averages the class probabilities predicted by the base models (see the short sketch after this list). The model is then fit on the training data using the fit() method.

  4. The performance of the model is evaluated by comparing the predictions (yhat) to the actual values (y_test) using the accuracy score metric.

  5. A single prediction can be made by passing a new data sample to the predict() method.
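
To make the ‘soft’ voting step concrete, the short sketch below (reusing the fitted ensemble and X_test from the example above) checks that the ensemble's class probabilities are the average of the base models' predict_proba outputs, and that the predicted class is the one with the highest averaged probability. With no weights set, both checks are expected to print True.

import numpy as np

# collect the per-class probabilities from each fitted base model
probas = [est.predict_proba(X_test) for est in ensemble.estimators_]
avg_proba = np.mean(probas, axis=0)

# soft voting: the ensemble's probabilities are the (unweighted) average
print(np.allclose(avg_proba, ensemble.predict_proba(X_test)))

# the predicted class is the argmax of the averaged probabilities
print((avg_proba.argmax(axis=1) == ensemble.predict(X_test)).all())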

This example demonstrates how to use the VotingClassifier to create an ensemble model that leverages the strengths of multiple individual models, improving overall classification performance. The ensemble approach can be applied to various real-world classification problems, providing a robust solution for predictive modeling tasks.
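
One way to verify that the ensemble actually helps on a given dataset is to cross-validate each base model and the ensemble on the same data. The snippet below is an illustrative check (reusing model1, model2, model3, ensemble, X, and y from the example above); the exact scores will vary.

from sklearn.model_selection import cross_val_score

# compare each base model against the voting ensemble with 5-fold cross-validation
for name, model in [('lr', model1), ('dt', model2), ('svc', model3), ('voting', ensemble)]:
    scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
    print('%s: %.3f' % (name, scores.mean()))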



See Also