The learning_rate_init parameter in scikit-learn’s MLPClassifier controls the step size at each iteration while moving toward a minimum of the loss function.

MLPClassifier implements a multi-layer perceptron (MLP) algorithm that trains using backpropagation. The learning rate determines how quickly the model adapts to the problem, with larger values resulting in faster initial learning.
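Conceptually, for the 'sgd' solver the learning rate scales each gradient step. The snippet below is a simplified sketch of a single update (momentum and scikit-learn's other internals are omitted; the variable names are illustrative only):

# Simplified sketch of one SGD-style update (not scikit-learn internals)
weight = 0.5           # current value of a single weight
gradient = 0.2         # gradient of the loss with respect to that weight
learning_rate = 0.001  # plays the role of learning_rate_init
weight = weight - learning_rate * gradient  # a smaller rate takes smaller steps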
The learning_rate_init parameter significantly affects both the speed of convergence and the quality of the final solution. A learning rate that’s too high may cause the model to converge too quickly to a suboptimal solution, while a rate that’s too low may result in slow learning or getting stuck in local minima.

The default value for learning_rate_init is 0.001.
In practice, values between 0.0001 and 0.1 are commonly used, often adjusted based on model performance and convergence behavior.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different learning_rate_init values
learning_rates = [0.0001, 0.001, 0.01, 0.1]
accuracies = []

for lr in learning_rates:
    mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000,
                        learning_rate_init=lr, random_state=42)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"learning_rate_init={lr}, Accuracy: {accuracy:.3f}, Iterations: {mlp.n_iter_}")
Running the example gives an output like:
learning_rate_init=0.0001, Accuracy: 0.870, Iterations: 1000
learning_rate_init=0.001, Accuracy: 0.870, Iterations: 505
learning_rate_init=0.01, Accuracy: 0.905, Iterations: 119
learning_rate_init=0.1, Accuracy: 0.865, Iterations: 70
Note how larger learning rates converge in far fewer iterations, while the largest value (0.1) gives up some accuracy: the speed/quality trade-off described above.

The key steps in this example are:

- Generate a synthetic multi-class classification dataset
- Split the data into train and test sets
- Train MLPClassifier models with different learning_rate_init values
- Evaluate the accuracy of each model on the test set
- Compare both accuracy and number of iterations needed for convergence
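To inspect convergence behavior more directly than n_iter_ alone, the fitted model's loss_curve_ attribute records the training loss at each iteration (available for the 'sgd' and 'adam' solvers). A small sketch, reusing the last mlp fitted in the loop above:

# loss_curve_ holds one training-loss value per iteration
# (here mlp is the last model from the loop, learning_rate_init=0.1)
print(len(mlp.loss_curve_))  # equals mlp.n_iter_
print(mlp.loss_curve_[:5])   # the first few losses show how fast training starts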
Some tips and heuristics for setting learning_rate_init:
- Start with the default value of 0.001 and adjust based on model performance
- If learning is slow or stuck, try increasing the learning rate
- If the model is unstable or overshooting, try decreasing the learning rate
- Consider the solvers with adaptive behavior: 'adam' (the default) adapts per-weight step sizes, and 'sgd' with learning_rate='adaptive' lowers the rate when training stalls (see the sketch below)
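A minimal sketch of those two options (the specific hyperparameter values are illustrative, not tuned recommendations):

from sklearn.neural_network import MLPClassifier

# 'adam' (the default solver) adapts per-weight step sizes,
# starting from learning_rate_init
mlp_adam = MLPClassifier(solver='adam', learning_rate_init=0.001,
                         max_iter=1000, random_state=42)

# With 'sgd', learning_rate='adaptive' keeps the rate constant while training
# loss keeps decreasing, then divides it by 5 when progress stalls
mlp_sgd = MLPClassifier(solver='sgd', learning_rate='adaptive',
                        learning_rate_init=0.01, max_iter=1000, random_state=42)

Either model is then fit as in the example above, e.g. mlp_adam.fit(X_train, y_train).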
Issues to consider:
- The optimal learning rate can vary greatly depending on the dataset and model architecture
- A learning rate that’s too high may cause the model to diverge or oscillate
- A learning rate that’s too low may result in slow convergence or getting trapped in local optima
- The learning rate often interacts with other hyperparameters, such as batch size and network architecture, so it is worth tuning them jointly (see the sketch below)
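One way to handle that interaction is a joint grid search. A minimal sketch with GridSearchCV, reusing X_train and y_train from the example above (the grid values here are assumptions for illustration):

from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Search learning_rate_init jointly with batch_size, since the two interact
param_grid = {
    'learning_rate_init': [0.0001, 0.001, 0.01, 0.1],
    'batch_size': [32, 128, 512],
}
search = GridSearchCV(
    MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=42),
    param_grid, cv=3, n_jobs=-1,
)
search.fit(X_train, y_train)  # X_train/y_train from the example above
print(search.best_params_)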