The power_t parameter in scikit-learn's MLPClassifier controls the exponent for inverse scaling of the learning rate.
Multi-layer Perceptron (MLP) is a type of artificial neural network that uses backpropagation for training. The power_t parameter determines how quickly the learning rate decays during training when using the 'invscaling' learning rate schedule.
When using 'invscaling', which only takes effect when solver='sgd', the effective learning rate is calculated as learning_rate_init / (t ** power_t), where t is the current time step. A higher value of power_t results in a faster decay of the learning rate.
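To make this concrete, here is a minimal sketch of the schedule, assuming learning_rate_init=0.001 (scikit-learn's default) and the default power_t of 0.5:
# Effective learning rate under the 'invscaling' schedule
# (learning_rate_init=0.001 is scikit-learn's default; the t values are illustrative)
learning_rate_init = 0.001
power_t = 0.5
for t in [1, 10, 100, 1000]:
    effective_lr = learning_rate_init / (t ** power_t)
    print(f"t={t}, effective learning rate: {effective_lr:.6f}")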
The default value for power_t is 0.5. In practice, values between 0.1 and 1.0 are commonly used, depending on the specific problem and dataset characteristics.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different power_t values
power_t_values = [0.1, 0.5, 0.9]
accuracies = []
for power_t in power_t_values:
    # power_t is only used when solver='sgd' and learning_rate='invscaling'
    mlp = MLPClassifier(hidden_layer_sizes=(100,), solver='sgd',
                        learning_rate='invscaling', power_t=power_t,
                        max_iter=1000, random_state=42)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"power_t={power_t}, Accuracy: {accuracy:.3f}")
Running the example gives an output like the following (exact accuracies will vary with the data and convergence):
power_t=0.1, Accuracy: 0.945
power_t=0.5, Accuracy: 0.945
power_t=0.9, Accuracy: 0.945
The key steps in this example are:
- Generate a synthetic classification dataset with informative and redundant features
- Split the data into train and test sets
- Train MLPClassifier models with different power_t values
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting power_t:
- Start with the default value of 0.5 and adjust based on model performance
- Lower values (e.g., 0.1) result in slower learning rate decay, which may help with complex datasets
- Higher values (e.g., 0.9) cause faster decay, potentially beneficial for simpler problems or when overfitting occurs
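To see how these heuristics play out numerically, the short sketch below (again assuming learning_rate_init=0.001) prints the effective learning rate at a few time steps for each value:
# Compare decay speed for low, default, and high power_t values
learning_rate_init = 0.001
for p in [0.1, 0.5, 0.9]:
    rates = [learning_rate_init / (t ** p) for t in [1, 10, 100, 1000]]
    print(f"power_t={p}: " + ", ".join(f"{r:.6f}" for r in rates))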
Issues to consider:
- The optimal power_t value depends on the dataset complexity and the initial learning rate
- Too low a value may result in slow convergence, while too high a value might cause premature convergence
- The effect of power_t is closely tied to other parameters like learning_rate_init and max_iter
- Always use cross-validation to find the best power_t value for your specific problem, as in the sketch below
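As a starting point, a minimal cross-validation sketch using GridSearchCV might look like this; the grid of candidate values is illustrative, and X_train and y_train come from the example above:
from sklearn.model_selection import GridSearchCV
# Search over a hypothetical grid of power_t values with 5-fold cross-validation
param_grid = {'power_t': [0.1, 0.25, 0.5, 0.75, 0.9]}
mlp = MLPClassifier(hidden_layer_sizes=(100,), solver='sgd',
                    learning_rate='invscaling', max_iter=1000, random_state=42)
grid = GridSearchCV(mlp, param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)
print(f"Best power_t: {grid.best_params_['power_t']}")
print(f"Best cross-validated accuracy: {grid.best_score_:.3f}")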