The alpha parameter in scikit-learn’s MLPClassifier controls the strength of L2 regularization applied to the model’s weights.
MLPClassifier is a multi-layer perceptron neural network model used for classification tasks. It learns non-linear decision boundaries from the training data.
The alpha parameter adds a penalty term to the loss function, discouraging large weights and helping to prevent overfitting. Larger values of alpha result in stronger regularization.
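For reference, the penalized objective can be written schematically as:

loss_total = loss_data + (alpha / (2 * n_samples)) * sum(||W_l||^2)

where the W_l are the network's weight matrices. The 1/n_samples scaling reflects scikit-learn's implementation; take this as a sketch of the objective rather than the exact code path.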
The default value for alpha is 0.0001. In practice, values are often tuned in the range of 1e-5 to 1.0, depending on the specific problem and dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different alpha values
alpha_values = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
accuracies = []
for alpha in alpha_values:
    mlp = MLPClassifier(hidden_layer_sizes=(100,), alpha=alpha, max_iter=1000, random_state=42)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"alpha={alpha:.5f}, Accuracy: {accuracy:.3f}")
Running the example gives an output like:
alpha=0.00001, Accuracy: 0.945
alpha=0.00010, Accuracy: 0.945
alpha=0.00100, Accuracy: 0.945
alpha=0.01000, Accuracy: 0.945
alpha=0.10000, Accuracy: 0.955
alpha=1.00000, Accuracy: 0.945
The key steps in this example are:
- Generate a synthetic classification dataset with informative and redundant features
- Split the data into train and test sets
- Train MLPClassifier models with different alpha values
- Evaluate the accuracy of each model on the test set
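The single train/test split above can be noisy. To run the same sweep with cross-validation, scikit-learn's validation_curve can score each alpha across folds. This is a sketch: the alpha range and plotting details are illustrative choices, not part of the original example.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.neural_network import MLPClassifier

# Same synthetic dataset as the main example
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# 5-fold cross-validated sweep over alpha on a log scale
alpha_range = np.logspace(-5, 0, 6)
train_scores, val_scores = validation_curve(
    MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=42),
    X, y, param_name="alpha", param_range=alpha_range, cv=5)

# Plot mean train and validation accuracy for each alpha
plt.semilogx(alpha_range, train_scores.mean(axis=1), label="train")
plt.semilogx(alpha_range, val_scores.mean(axis=1), label="cross-validation")
plt.xlabel("alpha")
plt.ylabel("accuracy")
plt.legend()
plt.show()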
Some tips and heuristics for setting alpha:
- Start with the default value of 0.0001 and adjust based on model performance
- Use smaller alpha values for complex datasets with many features
- Increase alpha if the model shows signs of overfitting (high training accuracy, low test accuracy), as sketched below
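As a quick check for that overfitting signal, compare training and test accuracy directly. A minimal sketch, reusing the dataset and split from the main example; the 0.05 gap threshold is an arbitrary illustration, not a general rule.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Same synthetic dataset and split as the main example
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

mlp = MLPClassifier(hidden_layer_sizes=(100,), alpha=1e-5,
                    max_iter=1000, random_state=42)
mlp.fit(X_train, y_train)

# score() returns accuracy for classifiers
train_acc = mlp.score(X_train, y_train)
test_acc = mlp.score(X_test, y_test)
print(f"train accuracy={train_acc:.3f}, test accuracy={test_acc:.3f}")

# A large train/test gap suggests overfitting; a larger alpha may help.
# The 0.05 threshold is arbitrary and only for illustration.
if train_acc - test_acc > 0.05:
    print("Possible overfitting: consider increasing alpha")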
Issues to consider:
- The optimal alpha value depends on the size and complexity of the dataset
- Too small alpha values may lead to overfitting, while too large values can cause underfitting
- alpha interacts with other hyperparameters like the learning rate and network architecture
- Cross-validation can help find the best alpha value for your specific problem; see the sketch below
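As a concrete illustration of the cross-validation point, alpha can be tuned with GridSearchCV. A minimal sketch: the grid mirrors the alpha_values list above, and cv=5 with accuracy scoring are illustrative choices.

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Same synthetic dataset as the main example
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# 5-fold cross-validated search over candidate alpha values
param_grid = {"alpha": [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]}
search = GridSearchCV(
    MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=42),
    param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("Best alpha:", search.best_params_["alpha"])
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")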