The power_t parameter in scikit-learn's MLPClassifier controls the exponent for inverse scaling of the learning rate.
Multi-layer Perceptron (MLP) is a type of artificial neural network that uses backpropagation for training. The power_t parameter determines how quickly the learning rate decays during training when using the 'invscaling' learning rate schedule.
When using 'invscaling', which only takes effect when solver='sgd', the effective learning rate is calculated as learning_rate_init / (t ** power_t), where t is the current time step. A higher value of power_t results in a faster decay of the learning rate.
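To make this concrete, here is a minimal sketch of the schedule, assuming learning_rate_init=0.001 (scikit-learn's default) and the default power_t of 0.5:
# Effective learning rate under the 'invscaling' schedule
# (learning_rate_init=0.001 is scikit-learn's default; the t values are illustrative)
learning_rate_init = 0.001
power_t = 0.5
for t in [1, 10, 100, 1000]:
    effective_lr = learning_rate_init / (t ** power_t)
    print(f"t={t}, effective learning rate: {effective_lr:.6f}")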
The default value for power_t is 0.5. In practice, values between 0.1 and 1.0 are commonly used, depending on the specific problem and dataset characteristics.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different power_t values
power_t_values = [0.1, 0.5, 0.9]
accuracies = []
for power_t in power_t_values:
    # power_t is only used when solver='sgd' and learning_rate='invscaling'
    mlp = MLPClassifier(hidden_layer_sizes=(100,), solver='sgd',
                        learning_rate='invscaling', power_t=power_t,
                        max_iter=1000, random_state=42)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"power_t={power_t}, Accuracy: {accuracy:.3f}")
Running the example gives an output like the following (exact accuracies will vary with the data and convergence):
power_t=0.1, Accuracy: 0.945
power_t=0.5, Accuracy: 0.945
power_t=0.9, Accuracy: 0.945
The key steps in this example are:
- Generate a synthetic classification dataset with informative and redundant features
- Split the data into train and test sets
- Train MLPClassifier models with different power_t values
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting power_t:
- Start with the default value of 0.5 and adjust based on model performance
- Lower values (e.g., 0.1) result in slower learning rate decay, which may help with complex datasets
- Higher values (e.g., 0.9) cause faster decay, potentially beneficial for simpler problems or when overfitting occurs
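To see how these heuristics play out numerically, the short sketch below (again assuming learning_rate_init=0.001) prints the effective learning rate at a few time steps for each value:
# Compare decay speed for low, default, and high power_t values
learning_rate_init = 0.001
for p in [0.1, 0.5, 0.9]:
    rates = [learning_rate_init / (t ** p) for t in [1, 10, 100, 1000]]
    print(f"power_t={p}: " + ", ".join(f"{r:.6f}" for r in rates))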
Issues to consider:
- The optimal power_t value depends on the dataset complexity and the initial learning rate
- Too low a value may result in slow convergence, while too high a value might cause premature convergence
- The effect of power_t is closely tied to other parameters like learning_rate_init and max_iter
- Always use cross-validation to find the best power_t value for your specific problem, as in the sketch below
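As a starting point, a minimal cross-validation sketch using GridSearchCV might look like this; the grid of candidate values is illustrative, and X_train and y_train come from the example above:
from sklearn.model_selection import GridSearchCV
# Search over a hypothetical grid of power_t values with 5-fold cross-validation
param_grid = {'power_t': [0.1, 0.25, 0.5, 0.75, 0.9]}
mlp = MLPClassifier(hidden_layer_sizes=(100,), solver='sgd',
                    learning_rate='invscaling', max_iter=1000, random_state=42)
grid = GridSearchCV(mlp, param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)
print(f"Best power_t: {grid.best_params_['power_t']}")
print(f"Best cross-validated accuracy: {grid.best_score_:.3f}")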