The learning_rate_init parameter in scikit-learn’s MLPRegressor controls the initial learning rate for weight updates during training.
MLPRegressor is a multi-layer perceptron regressor that uses backpropagation with gradient descent for optimization. It’s a versatile model capable of learning non-linear relationships in data.
The learning_rate_init parameter determines the step size at the beginning of training. A larger value can lead to faster initial learning but may overshoot optimal weights, while a smaller value provides more precise updates but may result in slower convergence.
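The trade-off described above can be seen on a toy problem. The following is a minimal sketch, not MLPRegressor's actual optimizer: it runs plain gradient descent on f(w) = w**2, and the helper name gradient_descent is hypothetical and used only for illustration.

```python
def gradient_descent(lr, w0=1.0, steps=20):
    """Minimize f(w) = w**2 with a fixed step size and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w   # derivative of w**2
        w -= lr * grad   # the learning rate scales every update
    return w

# Small, moderate, and overly large step sizes (illustrative values only)
for lr in (0.01, 0.1, 1.1):
    print(f"lr={lr}: final w = {gradient_descent(lr):.4g}")
```

With the smallest step size the weight is still far from the minimum at 0 after 20 steps, the moderate one gets close, and the largest one overshoots on every step and moves away from the minimum.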
The default value for learning_rate_init is 0.001. In practice, values between 0.0001 and 0.1 are commonly used, depending on the specific problem and dataset characteristics. The parameter only takes effect when solver is ‘sgd’ or ‘adam’.
from sklearn.neural_network import MLPRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different learning_rate_init values
learning_rates = [0.0001, 0.001, 0.01, 0.1]
mse_scores = []

for lr in learning_rates:
    mlp = MLPRegressor(hidden_layer_sizes=(100,), learning_rate_init=lr, max_iter=1000, random_state=42)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"learning_rate_init={lr}, MSE: {mse:.3f}")

# Find best learning rate
best_lr = learning_rates[np.argmin(mse_scores)]
print(f"Best learning_rate_init: {best_lr}")
Running the example gives an output like:
learning_rate_init=0.0001, MSE: 8040.666
learning_rate_init=0.001, MSE: 30.530
learning_rate_init=0.01, MSE: 2.109
learning_rate_init=0.1, MSE: 0.523
Best learning_rate_init: 0.1
The key steps in this example are:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Train MLPRegressor models with different learning_rate_init values
- Evaluate the mean squared error of each model on the test set
- Identify the best performing learning rate
Tips and heuristics for setting learning_rate_init:
- Start with the default value of 0.001 and adjust based on model performance
- If the loss oscillates, increases, or otherwise fails to decrease, try a smaller learning rate
- If the loss is decreasing very slowly, try a larger learning rate
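- Consider adaptive approaches: the ‘adam’ solver (the default) adapts step sizes per weight, and learning_rate=‘adaptive’ lowers the rate when training stalls, but only with solver=‘sgd’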
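One way to apply these heuristics is to look at the training loss directly. The sketch below assumes a small synthetic dataset and a deliberately short training budget (both chosen only for illustration) and compares the loss_curve_ attribute that MLPRegressor records per iteration for two learning rates. The last lines show how the ‘adaptive’ schedule would be configured with the ‘sgd’ solver; that model is not fitted here.

```python
import warnings

from sklearn.datasets import make_regression
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPRegressor

# Small illustrative dataset (sizes are arbitrary assumptions)
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)

curves = {}
with warnings.catch_warnings():
    # A short max_iter will likely stop before convergence; silence that warning
    warnings.simplefilter("ignore", ConvergenceWarning)
    for lr in (0.0001, 0.01):
        mlp = MLPRegressor(hidden_layer_sizes=(20,), learning_rate_init=lr,
                           max_iter=50, random_state=0)
        mlp.fit(X, y)
        curves[lr] = mlp.loss_curve_  # training loss after each iteration
        print(f"lr={lr}: first loss={curves[lr][0]:.1f}, last loss={curves[lr][-1]:.1f}")

# The 'adaptive' schedule applies only to the 'sgd' solver (configured, not fitted)
sgd_adaptive = MLPRegressor(solver="sgd", learning_rate="adaptive",
                            learning_rate_init=0.01)
```

A curve that barely moves suggests the rate is too low for the iteration budget, while a curve that jumps around or grows suggests it is too high.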
Issues to consider:
- The optimal learning rate depends on the scale and distribution of your features
- A learning rate that’s too high can cause the model to diverge
- A learning rate that’s too low can result in slow convergence or getting stuck in local minima
- The learning rate interacts with other parameters like max_iter and batch_size
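Because the suitable learning rate depends on feature scale, a common approach is to standardize the inputs before training so that a given learning_rate_init behaves more consistently across features. The following is a minimal sketch under assumed settings: the dataset, the inflated first feature, and the values for learning_rate_init, batch_size, and max_iter are illustrative choices, not recommendations.

```python
import warnings

from sklearn.datasets import make_regression
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
X[:, 0] *= 1000.0  # exaggerate the scale of one feature for illustration

# StandardScaler rescales each feature to zero mean and unit variance
# before the data reaches the network
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(20,), learning_rate_init=0.01,
                 batch_size=32, max_iter=200, random_state=0),
)

with warnings.catch_warnings():
    warnings.simplefilter("ignore", ConvergenceWarning)
    model.fit(X, y)

mlp = model.named_steps["mlpregressor"]
print(f"iterations run: {mlp.n_iter_}, final training loss: {mlp.loss_curve_[-1]:.3f}")
```

Smaller batches mean more weight updates per pass over the data, so changing batch_size or max_iter usually calls for revisiting learning_rate_init as well.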