The weights parameter in scikit-learn's VotingRegressor allows you to assign different importance to each base regressor in the ensemble. VotingRegressor combines predictions from multiple regressors to create a more robust model, and the weights parameter determines the contribution of each regressor to the final prediction.
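Concretely, the final prediction is a weighted average of the base regressors' predictions. The following minimal sketch uses made-up prediction values (not taken from the example further below) just to show the arithmetic:

import numpy as np

# Made-up predictions from three base regressors for one sample
preds = np.array([10.0, 12.0, 20.0])

# Equal weighting (the default, weights=None): a plain mean
print(np.average(preds))                     # 14.0

# Custom weights: the first regressor counts twice as much
print(np.average(preds, weights=[2, 1, 1]))  # 13.0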
By default, weights is set to None, which means all regressors have equal importance. Custom weights can be used to give more influence to better-performing or more reliable regressors.
Common configurations include equal weights (e.g., [1, 1, 1]), weights normalized according to each regressor's individual performance, or weights determined through cross-validation.
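One way to derive performance-based weights, sketched below under the assumption that each regressor's cross-validated R^2 score is a reasonable proxy for its reliability, is to normalize the scores into weights:

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)

estimators = {'rf': RandomForestRegressor(n_estimators=50, random_state=42),
              'lr': LinearRegression(),
              'svr': SVR(kernel='rbf')}

# Cross-validated R^2 per regressor, clipped at zero so weights stay non-negative
scores = {name: max(cross_val_score(est, X, y, cv=5).mean(), 0.0)
          for name, est in estimators.items()}
total = sum(scores.values())
weights = [scores[name] / total for name in estimators]
print(weights)  # pass these to VotingRegressor(..., weights=weights)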
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error
import numpy as np
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create base regressors
rf = RandomForestRegressor(n_estimators=100, random_state=42)
lr = LinearRegression()
svr = SVR(kernel='rbf')
# Create VotingRegressor instances with different weight configurations
vr_equal = VotingRegressor(estimators=[('rf', rf), ('lr', lr), ('svr', svr)])
vr_weighted = VotingRegressor(estimators=[('rf', rf), ('lr', lr), ('svr', svr)],
                              weights=[2, 1, 1])
# Train and evaluate models
models = [vr_equal, vr_weighted]
for model in models:
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print(f"{model.__class__.__name__} - Weights: {model.weights}, MSE: {mse:.4f}")
Running the example gives an output like:
VotingRegressor - Weights: None, MSE: 2571.5179
VotingRegressor - Weights: [2, 1, 1], MSE: 2423.3455
Key steps in this example:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Create base regressors (RandomForestRegressor, LinearRegression, SVR)
- Create VotingRegressor instances with different weight configurations
- Train the models and evaluate their performance using mean squared error
Tips and heuristics for setting weights:
- Start with equal weights and adjust based on individual regressor performance
- Use cross-validation to determine optimal weights (see the sketch after this list)
- Consider the strengths and weaknesses of each base regressor when assigning weights
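Because weights is an ordinary constructor parameter, it can be tuned like any other hyperparameter. The sketch below searches over a few hand-picked candidate weight vectors with GridSearchCV; the candidate list is illustrative, not prescriptive:

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)

vr = VotingRegressor(estimators=[
    ('rf', RandomForestRegressor(n_estimators=50, random_state=42)),
    ('lr', LinearRegression()),
    ('svr', SVR(kernel='rbf'))])

# Try a few candidate weight vectors with 5-fold cross-validation
param_grid = {'weights': [[1, 1, 1], [2, 1, 1], [1, 2, 1], [1, 1, 2]]}
grid = GridSearchCV(vr, param_grid, cv=5, scoring='neg_mean_squared_error')
grid.fit(X, y)
print(grid.best_params_)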
Issues to consider:
- Weights should be non-negative values
- Only the relative scale of weights matters (e.g., [1, 1, 2] is equivalent to [0.5, 0.5, 1]), because predictions are combined as a weighted average (verified in the sketch after this list)
- Overfitting can occur if weights are tuned too aggressively to the training data
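A quick check of the scale point, using smaller data and estimator settings than the main example purely for speed (the setup is illustrative):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

def fit_vr(weights):
    return VotingRegressor(
        estimators=[('rf', RandomForestRegressor(n_estimators=20, random_state=0)),
                    ('lr', LinearRegression()),
                    ('svr', SVR(kernel='rbf'))],
        weights=weights).fit(X, y)

# Multiplying every weight by the same constant leaves the weighted average unchanged
preds_a = fit_vr([1, 1, 2]).predict(X)
preds_b = fit_vr([0.5, 0.5, 1]).predict(X)
print(np.allclose(preds_a, preds_b))  # True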