The activation parameter in scikit-learn’s MLPClassifier determines the activation function for the hidden layers of the neural network.
Activation functions introduce non-linearity to the model, allowing it to learn complex patterns in the data. The choice of activation function can significantly impact the model’s performance and training dynamics.
Scikit-learn’s MLPClassifier supports four activation functions: ‘identity’, ‘logistic’, ‘tanh’, and ‘relu’. Each has different properties and is suitable for different types of problems.
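To make the differences concrete, here is a minimal NumPy sketch (illustrative only, not part of scikit-learn’s API) that applies each of the four functions to the same pre-activation values:

import numpy as np

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])  # example pre-activation values

print("identity:", z)                       # f(z) = z
print("logistic:", 1 / (1 + np.exp(-z)))    # f(z) = 1 / (1 + exp(-z)), output in (0, 1)
print("tanh:    ", np.tanh(z))              # f(z) = tanh(z), output in (-1, 1)
print("relu:    ", np.maximum(0, z))        # f(z) = max(0, z)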
The default value for activation is ‘relu’ (Rectified Linear Unit). In practice, ‘relu’ and ‘tanh’ are commonly used for most problems, while ‘logistic’ is sometimes used for binary classification tasks.
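You can confirm the default by inspecting the parameters of a freshly constructed estimator:

from sklearn.neural_network import MLPClassifier

# The default activation for the hidden layers is 'relu'
print(MLPClassifier().get_params()['activation'])  # relu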
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different activation functions
activation_functions = ['identity', 'logistic', 'tanh', 'relu']
accuracies = []
for activation in activation_functions:
    mlp = MLPClassifier(hidden_layer_sizes=(100, 50), activation=activation,
                        max_iter=1000, random_state=42)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"Activation: {activation}, Accuracy: {accuracy:.3f}")
Running the example gives an output like:
Activation: identity, Accuracy: 0.670
Activation: logistic, Accuracy: 0.870
Activation: tanh, Accuracy: 0.860
Activation: relu, Accuracy: 0.895
The key steps in this example are:
- Generate a synthetic multi-class classification dataset
- Split the data into train and test sets
- Train MLPClassifier models with different activation functions
- Evaluate the accuracy of each model on the test set
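The same comparison can also be framed as a hyperparameter search instead of a manual loop. A minimal sketch using GridSearchCV over the activation parameter, reusing the synthetic dataset from the example above (this may take a little while to run):

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Cross-validated search over the activation parameter only
param_grid = {'activation': ['identity', 'logistic', 'tanh', 'relu']}
grid = GridSearchCV(MLPClassifier(hidden_layer_sizes=(100, 50), max_iter=1000,
                                  random_state=42),
                    param_grid, cv=3, scoring='accuracy')
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)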
Some tips for choosing activation functions (note that in MLPClassifier the activation parameter only affects the hidden layers; the output activation is chosen automatically for the task):
- ReLU is often a good default choice for hidden layers
- Tanh can work well for shallow networks, and in general network design it is sometimes used as an output activation for bounded regression targets
- Logistic (sigmoid) is the standard output activation for binary classification, which MLPClassifier applies automatically
- Identity gives a purely linear activation, which reduces the whole network to a linear model regardless of depth (see the sketch after this list)
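To illustrate the last point: with ‘identity’ activation the network only composes linear transformations, so its decision function stays linear no matter how many hidden layers it has. A rough sketch of the comparison against plain logistic regression, using the same kind of synthetic data as above:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# A deep network with 'identity' activation is still a linear model end to end
linear_mlp = MLPClassifier(hidden_layer_sizes=(100, 50), activation='identity',
                           max_iter=1000, random_state=42)
linear_mlp.fit(X_train, y_train)

logreg = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("identity MLP:       ", accuracy_score(y_test, linear_mlp.predict(X_test)))
print("logistic regression:", accuracy_score(y_test, logreg.predict(X_test)))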
Issues to consider:
- ReLU can suffer from “dying ReLU” problem where neurons become inactive
- Tanh and logistic functions can suffer from vanishing gradients in deep networks
- The choice of activation function interacts with weight initialization and the learning rate
- Different activation functions may require different scaling of the input data (see the pipeline sketch after this list)
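For the scaling point in particular, a common pattern is to wrap the classifier in a Pipeline with StandardScaler so that every activation function sees standardized inputs. A minimal sketch, again assuming the synthetic data from the earlier example:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Standardize features before they reach the hidden layers
for activation in ['tanh', 'relu']:
    pipe = make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(100, 50),
                                       activation=activation,
                                       max_iter=1000, random_state=42))
    score = cross_val_score(pipe, X, y, cv=3).mean()
    print(f"{activation}: {score:.3f}")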