Configure MLPRegressor "power_t" Parameter

The power_t parameter in scikit-learn’s MLPRegressor controls how quickly the learning rate decays during training. It is only used when solver='sgd' and learning_rate='invscaling'; with any other solver or learning rate schedule it has no effect.

A multi-layer perceptron (MLP) is a feedforward artificial neural network that learns a non-linear mapping from inputs to targets; MLPRegressor applies it to regression tasks. The power_t parameter affects how quickly the learning rate decreases over time.
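MLPRegressor defaults to solver='adam' and learning_rate='constant', so setting power_t on its own changes nothing: both of the other parameters must be set as well. A minimal check using get_params() (power_t=0.25 is an arbitrary illustrative value):

```python
from sklearn.neural_network import MLPRegressor

# Default configuration: power_t is stored, but it has no effect here because
# the default solver is 'adam' and the default schedule is 'constant'.
default_params = MLPRegressor().get_params()
print(default_params["solver"], default_params["learning_rate"], default_params["power_t"])

# power_t only influences training when both of these are set.
mlp = MLPRegressor(solver="sgd", learning_rate="invscaling",
                   learning_rate_init=0.01, power_t=0.25)
params = mlp.get_params()
print(params["solver"], params["learning_rate"], params["power_t"])
```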

With the 'invscaling' schedule, the effective learning rate at time step t is learning_rate_init / (t ** power_t). A higher power_t therefore makes the learning rate decay faster.
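The schedule can be tabulated directly from this formula. A minimal sketch in plain Python; effective_learning_rate is a hypothetical helper that mirrors the documented formula, not a scikit-learn function:

```python
def effective_learning_rate(learning_rate_init, t, power_t):
    """Inverse-scaling schedule: learning_rate_init / t ** power_t, for t >= 1."""
    return learning_rate_init / t ** power_t


learning_rate_init = 0.01
time_steps = (1, 10, 100, 1000)

# Print the rate at a few time steps for a low, the default, and a high power_t.
for power_t in (0.1, 0.5, 0.9):
    rates = [effective_learning_rate(learning_rate_init, t, power_t) for t in time_steps]
    print(f"power_t={power_t}: " + ", ".join(f"{r:.2e}" for r in rates))
```

At t=1000 the rate has only about halved for power_t=0.1, but it is roughly 500 times smaller than learning_rate_init for power_t=0.9.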

The default value for power_t is 0.5. In practice, values between 0.1 and 1.0 are commonly used, depending on the specific problem and dataset characteristics.
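How much the choice within that range matters can be worked out from the formula: the rate falls to 1/k of learning_rate_init when t ** power_t = k, that is, at t = k ** (1 / power_t). A small sketch; the helper name steps_to_shrink is illustrative, not part of scikit-learn:

```python
def steps_to_shrink(factor, power_t):
    """Time step t at which learning_rate_init / t ** power_t == learning_rate_init / factor."""
    return factor ** (1.0 / power_t)


# When does the learning rate become 10x smaller than learning_rate_init?
for power_t in (0.1, 0.25, 0.5, 0.75, 1.0):
    print(f"power_t={power_t}: 10x smaller at t of about {steps_to_shrink(10, power_t):,.0f}")
```

Across the 0.1 to 1.0 range, the time step at which the rate has fallen tenfold spans from 10 (power_t=1.0) to 10**10 (power_t=0.1), so modest changes in power_t produce very different schedules.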

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different power_t values
power_t_values = [0.1, 0.5, 0.9]
mse_scores = []

for power_t in power_t_values:
    mlp = MLPRegressor(hidden_layer_sizes=(100,), activation='relu', solver='sgd',
                       learning_rate='invscaling', learning_rate_init=0.01,
                       power_t=power_t, max_iter=500, random_state=42)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"power_t={power_t}, MSE: {mse:.3f}")

# Find best power_t value
best_power_t = power_t_values[np.argmin(mse_scores)]
print(f"Best power_t value: {best_power_t}")

Running the example gives an output like:

power_t=0.1, MSE: 0.223
power_t=0.5, MSE: 6.409
power_t=0.9, MSE: 23538.663
Best power_t value: 0.1
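The large gap between these scores is consistent with how quickly the step size shrinks. The sketch below assumes, based on scikit-learn's SGD optimizer source, that the rate is recomputed once at the end of each epoch as learning_rate_init / (t + 1) ** power_t, where t is the number of training samples seen so far (the fitted model's t_ attribute); check this against the version you use. It plugs in the example's settings: learning_rate_init=0.01 and 800 training samples (80% of 1000).

```python
# Assumption: scikit-learn's SGD optimizer sets the rate at the end of each epoch
# to learning_rate_init / (t + 1) ** power_t, with t = training samples seen so far.
learning_rate_init = 0.01
n_train = 800  # 80% of the 1000 generated samples


def rate_after_epoch(epoch, power_t):
    """Learning rate in effect after `epoch` full passes over the training set."""
    t = epoch * n_train  # samples seen so far
    return learning_rate_init / (t + 1) ** power_t


for power_t in (0.1, 0.5, 0.9):
    rates = [rate_after_epoch(e, power_t) for e in (1, 10, 100)]
    print(f"power_t={power_t}: " + ", ".join(f"{r:.1e}" for r in rates))
```

Under this assumption, after the first epoch the power_t=0.9 run is already taking steps roughly 200 times smaller than the power_t=0.1 run, so it may stop making progress long before it fits the data.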

The key steps in this example are:

Generate a synthetic regression dataset
Split the data into train and test sets
Train MLPRegressor models with different power_t values
Evaluate the mean squared error of each model on the test set
Identify the best power_t value as the one with the lowest MSE

Some tips and heuristics for setting power_t:

Start with the default value of 0.5 and adjust based on model performance
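Lower values (e.g., 0.1) slow the decay and keep the learning rate high for longer, which can help on complex problems that need more training progress
Higher values (e.g., 0.9) speed up the decay, which can stabilize late training or limit overfitting, but can also stall learning early, as the power_t=0.9 result in the example above shows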

Issues to consider:

The optimal power_t value depends on the dataset complexity and other hyperparameters
Too low a value keeps the learning rate large, which can make updates noisy and slow to settle, while too high a value shrinks it so quickly that training can stall before the model fits the data
The effect of power_t interacts with other parameters like learning_rate_init and max_iter
Experiment with different values and use cross-validation to find the best power_t for your specific problem
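Cross-validated tuning can be sketched with GridSearchCV, searching power_t together with learning_rate_init because the two interact. A minimal sketch, assuming a smaller dataset, a reduced max_iter=200, and 3-fold cross-validation to keep it fast; the grid values are illustrative, not recommendations:

```python
import warnings

from sklearn.datasets import make_regression
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Some short runs may hit max_iter; hide those warnings to keep the output readable.
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Smaller dataset than the example above, to keep the search quick.
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)

# power_t only has an effect with solver='sgd' and learning_rate='invscaling'.
mlp = MLPRegressor(hidden_layer_sizes=(100,), solver="sgd",
                   learning_rate="invscaling", max_iter=200, random_state=42)

# Tune power_t jointly with learning_rate_init, since the two interact.
param_grid = {
    "power_t": [0.1, 0.5, 0.9],
    "learning_rate_init": [0.001, 0.01],
}
search = GridSearchCV(mlp, param_grid, cv=3, scoring="neg_mean_squared_error")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best CV MSE: {-search.best_score_:.3f}")
```

GridSearchCV maximizes its score, so the MSE is passed as "neg_mean_squared_error" and negated again for printing.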

See Also