The learning_rate parameter in scikit-learn’s MLPRegressor controls the step size at each iteration while moving toward a minimum of the loss function. MLPRegressor is a multi-layer perceptron regressor that optimizes the squared error using LBFGS or stochastic gradient descent, and the learning_rate parameter determines how quickly or slowly the model learns from the training data. A higher learning rate can lead to faster convergence but may overshoot the optimal solution, while a lower learning rate may require more iterations to converge but can find a more precise solution.
The default value for learning_rate is ‘constant’, which keeps the learning rate fixed at the value of learning_rate_init (0.001 by default) throughout training. The other options, ‘invscaling’ and ‘adaptive’, are learning rate schedules: ‘invscaling’ gradually decreases the rate, while ‘adaptive’ keeps it constant as long as the training loss keeps decreasing. In every case, the initial rate itself is set via the learning_rate_init parameter. Note that the schedule is only consulted when solver='sgd'; the default ‘adam’ solver ignores learning_rate entirely.
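For instance, a minimal sketch of configuring a schedule explicitly (the 0.01 initial rate is an illustrative value, not a recommendation):

from sklearn.neural_network import MLPRegressor

# 'constant' keeps the rate fixed at learning_rate_init for every update
mlp = MLPRegressor(
    solver="sgd",             # the learning_rate schedule only applies to SGD
    learning_rate="constant",
    learning_rate_init=0.01,  # default is 0.001
    max_iter=500,
    random_state=42,
)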
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different learning_rate values
learning_rates = ['constant', 'invscaling', 'adaptive']
mse_scores = []
for lr in learning_rates:
    # Note: with the default 'adam' solver, learning_rate is ignored
    mlp = MLPRegressor(hidden_layer_sizes=(100,), learning_rate=lr, max_iter=1000, random_state=42)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"learning_rate={lr}, MSE: {mse:.3f}")
Running the example gives an output like:
learning_rate=constant, MSE: 30.530
learning_rate=invscaling, MSE: 30.530
learning_rate=adaptive, MSE: 30.530
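All three settings produce an identical MSE here because the model is trained with the default ‘adam’ solver, which ignores the learning_rate parameter. To actually compare the schedules, the solver must be set to ‘sgd’. A variant of the loop that does this (exact scores will differ from those above and depend on the dataset):

for lr in learning_rates:
    mlp = MLPRegressor(
        hidden_layer_sizes=(100,),
        solver="sgd",              # schedules only take effect with SGD
        learning_rate=lr,
        learning_rate_init=0.001,  # same starting rate for each schedule
        max_iter=1000,
        random_state=42,
    )
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    print(f"learning_rate={lr}, MSE: {mean_squared_error(y_test, y_pred):.3f}")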
The key steps in this example are:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Train MLPRegressor models with different learning_rate values
- Evaluate the mean squared error of each model on the test set
Some tips and heuristics for setting learning_rate:
- Start with the default ‘constant’ schedule and experiment with ‘adaptive’ for automatic adjustment (both require solver='sgd' to take effect)
- If using a constant rate, try learning_rate_init values on a logarithmic scale (e.g., 0.1, 0.01, 0.001)
- Monitor the loss during training to detect if the learning rate is too high (unstable loss) or too low (slow convergence); see the sketch after this list
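One way to apply the last two tips, reusing X_train and y_train from the example above, is to sweep learning_rate_init on a logarithmic scale and plot each model’s loss_curve_ attribute, which records the training loss at every iteration (the grid of rates is illustrative):

import matplotlib.pyplot as plt

for lr_init in [0.1, 0.01, 0.001]:
    mlp = MLPRegressor(
        hidden_layer_sizes=(100,),
        solver="sgd",
        learning_rate="constant",
        learning_rate_init=lr_init,
        max_iter=1000,
        random_state=42,
    )
    mlp.fit(X_train, y_train)
    # loss_curve_ holds the training loss at each iteration
    plt.plot(mlp.loss_curve_, label=f"learning_rate_init={lr_init}")
plt.xlabel("Iteration")
plt.ylabel("Training loss")
plt.legend()
plt.show()

An unstable or diverging curve suggests the rate is too high; a nearly flat one suggests it is too low.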
Issues to consider:
- The optimal learning rate depends on the specific dataset and model architecture
- A learning rate that’s too high can cause the model to converge to a suboptimal solution or diverge
- A learning rate that’s too low can result in slow convergence or getting stuck in local minima
- The ‘adaptive’ option can be useful for finding a good learning rate automatically, but may not always be optimal
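Because the best setting is dataset-specific, a small grid search is a common way to choose it. A minimal sketch using GridSearchCV over both the schedule and the initial rate (the grid values are illustrative):

from sklearn.model_selection import GridSearchCV

param_grid = {
    "learning_rate": ["constant", "invscaling", "adaptive"],
    "learning_rate_init": [0.1, 0.01, 0.001],
}
search = GridSearchCV(
    MLPRegressor(hidden_layer_sizes=(100,), solver="sgd", max_iter=1000, random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X_train, y_train)
print(f"Best parameters: {search.best_params_}")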