The `weights` parameter in scikit-learn's `VotingClassifier` lets you assign a different importance to each classifier in the ensemble.

`VotingClassifier` is an ensemble method that combines the predictions of multiple classifiers. It supports either hard voting (a majority vote over predicted class labels) or soft voting (an argmax over the averaged predicted class probabilities).
The `weights` parameter determines how much each classifier contributes to the final prediction: in hard voting, the weights scale the occurrences of the predicted class labels; in soft voting, they scale the predicted class probabilities before averaging. Either way, a higher weight gives that classifier's vote more influence.
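To make the arithmetic concrete, here is a minimal sketch of weighted soft voting for a single sample, using made-up probabilities rather than output from real models:

```python
import numpy as np

# Made-up predicted probabilities for one sample (columns: class 0, class 1)
proba = np.array([
    [0.40, 0.60],  # classifier 1
    [0.80, 0.20],  # classifier 2
    [0.35, 0.65],  # classifier 3
])

uniform = np.average(proba, axis=0)                      # equal importance
weighted = np.average(proba, axis=0, weights=[1, 1, 2])  # upweight classifier 3

print(uniform, "->", uniform.argmax())    # class 0 wins under equal weights
print(weighted, "->", weighted.argmax())  # class 1 wins once classifier 3 counts double
```

Note that upweighting a single classifier can flip the ensemble's decision, which is exactly the effect the `weights` parameter exists to control.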
By default, `weights` is `None`, which gives every classifier equal importance. Common configurations include uniform weights such as `[1, 1, 1]` (equivalent to `None`) or uneven weights such as `[2, 1, 1]` to emphasize particular classifiers. The example below compares several configurations on a synthetic dataset:
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Create base classifiers
clf1 = LogisticRegression(random_state=42)
clf2 = DecisionTreeClassifier(random_state=42)
clf3 = SVC(probability=True, random_state=42)

# Create VotingClassifier instances with different weights
weight_configs = [None, [1, 1, 1], [2, 1, 1], [1, 2, 1], [1, 1, 2]]

for weights in weight_configs:
    vc = VotingClassifier(
        estimators=[('lr', clf1), ('dt', clf2), ('svc', clf3)],
        voting='soft',
        weights=weights,
    )
    vc.fit(X_train, y_train)
    y_pred = vc.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Weights: {weights}, Accuracy: {accuracy:.3f}")
```
Running the example gives an output like:
```
Weights: None, Accuracy: 0.895
Weights: [1, 1, 1], Accuracy: 0.895
Weights: [2, 1, 1], Accuracy: 0.890
Weights: [1, 2, 1], Accuracy: 0.790
Weights: [1, 1, 2], Accuracy: 0.915
```
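One way to interpret these numbers is to score each base classifier on its own: upweighting the decision tree (`[1, 2, 1]`) hurts while upweighting the SVC (`[1, 1, 2]`) helps, which suggests the tree is the weakest of the three on this split. A quick check (exact accuracies will depend on your environment):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Same data and base classifiers as the example above
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Fit and score each base classifier individually
for name, clf in [('lr', LogisticRegression(random_state=42)),
                  ('dt', DecisionTreeClassifier(random_state=42)),
                  ('svc', SVC(probability=True, random_state=42))]:
    clf.fit(X_train, y_train)
    print(f"{name}: {accuracy_score(y_test, clf.predict(X_test)):.3f}")
```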
The key steps in this example are:
- Generate a synthetic classification dataset
- Split the data into train and test sets
- Create base classifiers (LogisticRegression, DecisionTreeClassifier, SVC)
- Create VotingClassifier instances with different weight configurations
- Train each VotingClassifier and evaluate its accuracy on the test set
Tips and heuristics for setting `weights`:
- Start with uniform weights and adjust based on individual classifier performance
- Assign higher weights to classifiers that perform better on your specific dataset
- Experiment with different weight combinations to find the optimal configuration
- Consider using cross-validation to determine the best weights (one approach is sketched after this list)
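As one possible implementation of the last two tips, here is a sketch that uses each classifier's mean cross-validated accuracy on the training set as its weight. Treating accuracy scores as weights is a heuristic, not the only reasonable choice:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

estimators = [('lr', LogisticRegression(random_state=42)),
              ('dt', DecisionTreeClassifier(random_state=42)),
              ('svc', SVC(probability=True, random_state=42))]

# Use each classifier's mean 5-fold CV accuracy on the training set as its weight
weights = [cross_val_score(clf, X_train, y_train, cv=5).mean()
           for _, clf in estimators]
print("CV-derived weights:", [round(w, 3) for w in weights])

vc = VotingClassifier(estimators=estimators, voting='soft', weights=weights)
vc.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, vc.predict(X_test)))
```

Alternatively, because `weights` is a constructor parameter, you can search over a handful of hand-picked weight vectors with `GridSearchCV` and `param_grid={'weights': [...]}`.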
Issues to consider:
- The optimal weights depend on the relative strengths of individual classifiers
- Overly high weights on a single classifier can negate the benefits of ensemble learning
- Weights should be non-negative values
- Only the relative proportions of the weights matter, not their absolute scale: the final prediction is the argmax of a weighted sum, which is unchanged when every weight is multiplied by the same positive constant (verified in the sketch below)
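A quick sketch confirming that last point, reusing the setup from the main example: proportional weight vectors such as `[1, 1, 2]` and `[2, 2, 4]` yield identical predictions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_redundant=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

preds = []
for weights in ([1, 1, 2], [2, 2, 4]):  # proportional weight vectors
    vc = VotingClassifier(
        estimators=[('lr', LogisticRegression(random_state=42)),
                    ('dt', DecisionTreeClassifier(random_state=42)),
                    ('svc', SVC(probability=True, random_state=42))],
        voting='soft',
        weights=weights,
    )
    vc.fit(X_train, y_train)
    preds.append(vc.predict(X_test))

print(np.array_equal(preds[0], preds[1]))  # True: only proportions matter
```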