The weights parameter in scikit-learn’s VotingRegressor allows you to assign different importance to each base regressor in the ensemble.
VotingRegressor combines the predictions of several regressors by averaging them, which often yields a more robust model; the weights parameter controls how much each regressor contributes to that average.
By default, weights is set to None, which means all regressors have equal importance. Custom weights can be used to give more influence to better-performing or more reliable regressors.
Common configurations include equal weights (e.g., [1, 1, 1]), normalized weights based on individual regressor performance, or weights determined through cross-validation.
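One performance-based scheme can be sketched as follows: score each base regressor with cross-validation, then use normalized inverse-MSE values as weights so that lower-error models contribute more. The inverse-MSE weighting and the normalization to sum to 1 are illustrative choices, not the only reasonable ones.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=42)
estimators = [('rf', RandomForestRegressor(n_estimators=50, random_state=42)),
              ('lr', LinearRegression())]

# cross_val_score returns negative MSE; flip the sign to get MSE per model.
mses = [-cross_val_score(est, X, y, cv=3,
                         scoring='neg_mean_squared_error').mean()
        for _, est in estimators]

# Weight each model by inverse MSE, normalized to sum to 1.
inv = np.array([1.0 / m for m in mses])
weights = (inv / inv.sum()).tolist()

vr = VotingRegressor(estimators=estimators, weights=weights)
vr.fit(X, y)
print([round(w, 3) for w in weights])
```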
```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create base regressors
rf = RandomForestRegressor(n_estimators=100, random_state=42)
lr = LinearRegression()
svr = SVR(kernel='rbf')

# Create VotingRegressor instances with different weight configurations
vr_equal = VotingRegressor(estimators=[('rf', rf), ('lr', lr), ('svr', svr)])
vr_weighted = VotingRegressor(estimators=[('rf', rf), ('lr', lr), ('svr', svr)],
                              weights=[2, 1, 1])

# Train and evaluate models
models = [vr_equal, vr_weighted]
for model in models:
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print(f"{model.__class__.__name__} - Weights: {model.weights}, MSE: {mse:.4f}")
```
Running the example gives an output like:

```
VotingRegressor - Weights: None, MSE: 2571.5179
VotingRegressor - Weights: [2, 1, 1], MSE: 2423.3455
```
Key steps in this example:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Create base regressors (RandomForestRegressor, LinearRegression, SVR)
- Create VotingRegressor instances with different weight configurations
- Train the models and evaluate their performance using mean squared error
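Under the hood, the ensemble prediction is the weighted arithmetic mean of the base models' predictions. This can be verified by recomputing it by hand from the fitted sub-estimators (a small sketch with two illustrative base models):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
estimators = [('rf', RandomForestRegressor(n_estimators=20, random_state=0)),
              ('lr', LinearRegression())]
weights = [2, 1]

vr = VotingRegressor(estimators=estimators, weights=weights).fit(X, y)

# Recompute the ensemble output as a weighted mean over the fitted base models.
base_preds = np.array([est.predict(X) for est in vr.estimators_])
manual = np.average(base_preds, axis=0, weights=weights)
print(np.allclose(vr.predict(X), manual))  # → True
```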
Tips and heuristics for setting weights:
- Start with equal weights and adjust based on individual regressor performance
- Use cross-validation to determine optimal weights
- Consider the strengths and weaknesses of each base regressor when assigning weights
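Because weights is an ordinary constructor parameter, cross-validated tuning can be done with GridSearchCV over a set of candidate weight vectors. The grid below is a small, hand-picked assumption for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=8, noise=0.1, random_state=42)

vr = VotingRegressor(estimators=[
    ('rf', RandomForestRegressor(n_estimators=50, random_state=42)),
    ('lr', LinearRegression())])

# Search a small grid of candidate weight vectors via 3-fold cross-validation.
param_grid = {'weights': [[1, 1], [2, 1], [1, 2], [3, 1]]}
search = GridSearchCV(vr, param_grid, cv=3,
                      scoring='neg_mean_squared_error')
search.fit(X, y)
print(search.best_params_['weights'])
```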
Issues to consider:
- Weights should be non-negative values
- Only the relative scale of weights matters, since predictions are normalized by the weight sum (e.g., [1, 1, 2] is equivalent to [0.5, 0.5, 1])
- Overfitting can occur if weights are tuned too aggressively to the training data
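The scale-equivalence point above is easy to check directly: two proportional weight vectors produce identical predictions. A minimal sketch with two deterministic base models:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression, Ridge

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=1)
estimators = [('lr', LinearRegression()), ('ridge', Ridge(alpha=1.0))]

# Two weight vectors that differ only by a constant factor of 2.
vr_a = VotingRegressor(estimators=estimators, weights=[1, 2]).fit(X, y)
vr_b = VotingRegressor(estimators=estimators, weights=[0.5, 1.0]).fit(X, y)

# Proportional weights yield identical ensemble predictions.
print(np.allclose(vr_a.predict(X), vr_b.predict(X)))  # → True
```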