The `learning_rate` parameter in scikit-learn's `AdaBoostRegressor` controls the contribution of each weak learner in the ensemble.
AdaBoost (Adaptive Boosting) is an ensemble method that combines multiple weak learners, typically shallow decision trees, to create a strong predictor. It does this by iteratively training weak learners and reweighting the training instances so that each new learner focuses on the examples the ensemble currently predicts poorly.
The `learning_rate` parameter shrinks the contribution of each weak learner. A lower learning rate requires more weak learners to achieve similar performance, but can often lead to better generalization.
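Internally, each fitted weak learner receives a weight that determines its say in the final prediction, and `learning_rate` scales the weight assigned at each boosting step. A minimal sketch that inspects the fitted `estimator_weights_` attribute (the dataset size and five-estimator ensemble are arbitrary choices for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

for lr in (1.0, 0.1):
    ada = AdaBoostRegressor(n_estimators=5, learning_rate=lr, random_state=0)
    ada.fit(X, y)
    # estimator_weights_ holds each weak learner's contribution to the
    # final prediction; learning_rate scales the weight set at each step
    print(f"learning_rate={lr}: {ada.estimator_weights_.round(3)}")
```

The exact numbers will vary with the data, but the per-learner weights under `learning_rate=0.1` should be visibly smaller than under the default.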
The default value for `learning_rate` is 1.0, which means no shrinkage is applied.
In practice, values between 0.01 and 1.0 are commonly used, with smaller values often improving generalization at the cost of needing more estimators and therefore more training time.
```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different learning_rate values
learning_rates = [0.01, 0.1, 0.5, 1.0]
mse_scores = []

for lr in learning_rates:
    ada = AdaBoostRegressor(learning_rate=lr, random_state=42)
    ada.fit(X_train, y_train)
    y_pred = ada.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"learning_rate={lr}, MSE: {mse:.3f}")
```
Running the example gives an output like:
```
learning_rate=0.01, MSE: 7445.258
learning_rate=0.1, MSE: 6247.773
learning_rate=0.5, MSE: 4073.158
learning_rate=1.0, MSE: 3767.910
```
The key steps in this example are:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Train `AdaBoostRegressor` models with different `learning_rate` values
- Evaluate the mean squared error of each model on the test set
Some tips and heuristics for setting `learning_rate`:
- Start with the default value of 1.0 and gradually decrease it
- Lower learning rates often lead to better generalization but require more estimators
- Use cross-validation to find the optimal learning rate for your specific dataset (see the sketch after this list)
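As a concrete illustration of the cross-validation tip, the sketch below uses `GridSearchCV` to tune `learning_rate` jointly with `n_estimators`, since the two parameters interact; the grid values are illustrative choices, not a recommendation:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Search learning_rate together with n_estimators, since lower learning
# rates typically need more estimators to reach the same performance
param_grid = {
    "learning_rate": [0.01, 0.1, 0.5, 1.0],
    "n_estimators": [50, 100, 200],
}
grid = GridSearchCV(
    AdaBoostRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=5,
)
grid.fit(X, y)

print("Best parameters:", grid.best_params_)
print(f"Best CV MSE: {-grid.best_score_:.3f}")
```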
Issues to consider:
- There’s a trade-off between model performance and training time
- The optimal learning rate depends on the complexity of the problem and the number of estimators
- Very low learning rates may require a large number of estimators to converge, as the sketch below illustrates
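One way to see this convergence behavior directly is with `staged_predict`, which yields the ensemble's predictions after each boosting round. A rough sketch (the particular learning rates and 200-estimator budget are arbitrary choices for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for lr in (0.01, 1.0):
    ada = AdaBoostRegressor(n_estimators=200, learning_rate=lr, random_state=42)
    ada.fit(X_train, y_train)
    # staged_predict yields test-set predictions after each boosting
    # iteration, showing how quickly the ensemble's error levels off
    mse_by_stage = [
        mean_squared_error(y_test, y_pred)
        for y_pred in ada.staged_predict(X_test)
    ]
    print(f"learning_rate={lr}: {len(mse_by_stage)} stages, "
          f"first-stage MSE={mse_by_stage[0]:.1f}, "
          f"final MSE={mse_by_stage[-1]:.1f}")
```

With the default learning rate the error should drop quickly in early stages, while at `learning_rate=0.01` it falls much more gradually across the full run.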