The power_t parameter in scikit-learn’s MLPRegressor controls the decay rate of the learning rate during training when using the ‘invscaling’ learning rate schedule.
A multi-layer perceptron (MLP) is a type of artificial neural network that learns a non-linear function approximator; MLPRegressor applies it to regression tasks. The power_t parameter affects how quickly the learning rate decreases over time, and it is only used when solver='sgd' and learning_rate='invscaling'.
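When using the ‘invscaling’ learning rate, the effective learning rate at each time step is calculated as learning_rate_init / (t ** power_t), where t is the current time step. In MLPRegressor, the time step is exposed as the fitted attribute t_, documented as the number of training samples seen by the solver, so it grows much faster than the epoch count. A higher power_t value results in a faster decay of the learning rate.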
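To make the schedule concrete, the following minimal sketch evaluates the formula for a few exponents. The helper name invscaling_rate and the chosen values of t are illustrative assumptions, not part of scikit-learn's API.

```python
def invscaling_rate(learning_rate_init, t, power_t):
    """Effective learning rate under inverse scaling (hypothetical helper)."""
    return learning_rate_init / (t ** power_t)

learning_rate_init = 0.01
time_steps = [1, 10, 100, 1000]  # illustrative time steps, chosen for readability

for power_t in (0.1, 0.5, 0.9):
    rates = [invscaling_rate(learning_rate_init, t, power_t) for t in time_steps]
    formatted = ", ".join(f"t={t}: {r:.6f}" for t, r in zip(time_steps, rates))
    print(f"power_t={power_t} -> {formatted}")
```

For instance, with power_t=0.5 the rate at t=100 is 0.01 / 10 = 0.001, while a larger exponent drives the rate down further at the same time step.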
The default value for power_t is 0.5. In practice, values between 0.1 and 1.0 are commonly used, depending on the specific problem and dataset characteristics.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error
import numpy as np
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different power_t values
power_t_values = [0.1, 0.5, 0.9]
mse_scores = []
for power_t in power_t_values:
    # Fit one model per power_t value, keeping all other settings fixed
    mlp = MLPRegressor(hidden_layer_sizes=(100,), activation='relu', solver='sgd',
                       learning_rate='invscaling', learning_rate_init=0.01,
                       power_t=power_t, max_iter=500, random_state=42)
    mlp.fit(X_train, y_train)
    # Evaluate on the held-out test set
    y_pred = mlp.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"power_t={power_t}, MSE: {mse:.3f}")
# Find best power_t value
best_power_t = power_t_values[np.argmin(mse_scores)]
print(f"Best power_t value: {best_power_t}")
Running the example gives an output like:
power_t=0.1, MSE: 0.223
power_t=0.5, MSE: 6.409
power_t=0.9, MSE: 23538.663
Best power_t value: 0.1
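The large gap between these scores is easier to see from the schedule itself. The sketch below is plain arithmetic, not a call into scikit-learn. It assumes that the time step t advances by the number of training samples after every epoch (800 here, from the 80/20 split of 1000 samples) and that the rate is refreshed once per epoch. Both are assumptions about the implementation that are worth checking against your installed version, and the helper name rate_after is hypothetical.

```python
n_train = 800               # 80% of the 1000 samples generated in the example
learning_rate_init = 0.01   # same initial rate as in the example

def rate_after(epochs, power_t):
    # Assumption: t equals the number of training samples seen so far
    t = epochs * n_train
    return learning_rate_init / (t ** power_t)

for power_t in (0.1, 0.5, 0.9):
    summary = ", ".join(
        f"epoch {e}: {rate_after(e, power_t):.2e}" for e in (1, 10, 100)
    )
    print(f"power_t={power_t} -> {summary}")
```

Under these assumptions, power_t=0.1 roughly halves the rate after the first epoch, whereas power_t=0.9 cuts it by a factor of several hundred, which would leave the model little room to learn in the remaining epochs.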
The key steps in this example are:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Train MLPRegressor models with different power_t values
- Evaluate the mean squared error of each model on the test set
- Identify the best power_t value based on lowest MSE
Some tips and heuristics for setting power_t:
- Start with the default value of 0.5 and adjust based on model performance
- Lower values (e.g., 0.1) result in slower learning rate decay, which can be useful for complex problems
- Higher values (e.g., 0.9) cause faster decay, potentially beneficial for simpler problems or when overfitting occurs
Issues to consider:
- The optimal power_t value depends on the dataset complexity and other hyperparameters
- Too low a value may result in slow convergence, while too high a value may cause premature convergence because the learning rate becomes too small for further progress
- The effect of power_t interacts with other parameters like learning_rate_init and max_iter
- Experiment with different values and use cross-validation to find the best power_t for your specific problem
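One way to run such a cross-validated search is with GridSearchCV. The sketch below is a minimal illustration rather than a recommended configuration: the grid values, the 3-fold split, the reduced max_iter, and the choice to search learning_rate_init alongside power_t are all assumptions made to keep the example small.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Synthetic dataset, mirroring the earlier example
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

base_model = MLPRegressor(hidden_layer_sizes=(100,), activation='relu', solver='sgd',
                          learning_rate='invscaling', max_iter=200, random_state=42)

# Search power_t jointly with learning_rate_init, since the two interact
# (illustrative grid values, not tuned recommendations)
param_grid = {
    'power_t': [0.1, 0.5, 0.9],
    'learning_rate_init': [0.001, 0.01],
}

search = GridSearchCV(base_model, param_grid, cv=3,
                      scoring='neg_mean_squared_error')
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated MSE:", -search.best_score_)
```

Because scikit-learn scorers follow a higher-is-better convention, the MSE is reported negated, so the sign is flipped before printing. Some fits may emit a ConvergenceWarning when max_iter is reached before the optimizer converges.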