The activation parameter in scikit-learn's MLPRegressor determines the non-linear function applied to the outputs of the hidden layers of the neural network.
Activation functions introduce non-linearity into the network, allowing it to learn complex patterns. The choice of activation function can significantly impact the model’s performance and training dynamics.
Scikit-learn's MLPRegressor offers four activation functions: 'identity', 'logistic', 'tanh', and 'relu'. Each has different properties and is suitable for different types of problems.
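To build intuition for how these four functions differ, here is a small NumPy sketch (purely illustrative, not part of scikit-learn's API) that evaluates each activation on a few sample inputs:

import numpy as np

# Illustrative definitions of the four activations (for intuition only)
x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

identity = x                        # f(x) = x, no non-linearity
logistic = 1 / (1 + np.exp(-x))     # sigmoid, squashes values into (0, 1)
tanh = np.tanh(x)                   # squashes values into (-1, 1)
relu = np.maximum(0, x)             # zero for negative inputs, linear for positive

print(identity, logistic, tanh, relu, sep="\n")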
The default value for activation is 'relu' (Rectified Linear Unit). Common choices include 'relu' for general-purpose use, 'tanh' for inputs normalized to a symmetric range, and 'logistic' (the sigmoid), which is more commonly associated with binary classification than with regression.
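Leaving activation unset is equivalent to passing 'relu' explicitly; the snippet below simply contrasts the default with an explicit choice (the hidden layer size and random state are illustrative values):

from sklearn.neural_network import MLPRegressor

# Default: activation='relu'
mlp_default = MLPRegressor(hidden_layer_sizes=(100,), random_state=42)

# Explicitly choosing a different activation
mlp_tanh = MLPRegressor(hidden_layer_sizes=(100,), activation='tanh', random_state=42)

print(mlp_default.activation)  # 'relu'
print(mlp_tanh.activation)     # 'tanh'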
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different activation functions
activations = ['identity', 'logistic', 'tanh', 'relu']
mse_scores = []
for activation in activations:
    mlp = MLPRegressor(hidden_layer_sizes=(100,), activation=activation,
                       max_iter=1000, random_state=42)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"Activation: {activation}, MSE: {mse:.4f}")
Running the example gives an output like:
Activation: identity, MSE: 0.0104
Activation: logistic, MSE: 1578.4868
Activation: tanh, MSE: 199.3409
Activation: relu, MSE: 30.5298
Note that 'identity' achieves the lowest error here because make_regression produces a (noisy) linear target, so a purely linear network fits it almost perfectly; on genuinely non-linear data the non-linear activations typically perform better.
The key steps in this example are:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Train MLPRegressor models with different activation functions
- Evaluate the mean squared error of each model on the test set
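If you want to pick the best activation programmatically, a small follow-up like this works (it assumes the activations and mse_scores lists from the example above are still in scope):

import numpy as np

# Pick the activation with the lowest test MSE from the loop above
best_idx = int(np.argmin(mse_scores))
print(f"Best activation: {activations[best_idx]} (MSE: {mse_scores[best_idx]:.4f})")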
Some tips for choosing activation functions:
- Use 'relu' as a default choice for most problems
- Consider 'tanh' for inputs normalized between -1 and 1 (see the scaling sketch after this list)
- 'logistic' can be useful for binary classification tasks
- 'identity' is rarely used but can be helpful for debugging, since it reduces the network to a linear model
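For the 'tanh' tip above, a common way to get inputs into a suitable range is to standardize the features first. A minimal sketch, assuming the X_train/X_test split from the earlier example is still available:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

# Standardize features before a 'tanh' network (the scaler is fitted on training data only)
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(100,), activation='tanh',
                 max_iter=1000, random_state=42),
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # R^2 on the held-out test set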
Issues to consider:
- Different activation functions may require different learning rates (see the tuning sketch after this list)
- Some functions (like 'relu') can suffer from the "dying neurons" problem, where units get stuck outputting zero and stop learning
- The choice of activation function can affect the network's ability to approximate certain functions
- Performance can vary depending on the specific problem and dataset characteristics
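Because different activations can favor different learning rates (first point above), one option is to search over both jointly. The sketch below uses GridSearchCV with an illustrative parameter grid, reusing X_train and y_train from the earlier example:

from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Search over activation and initial learning rate together (grid values are illustrative)
param_grid = {
    'activation': ['relu', 'tanh'],
    'learning_rate_init': [0.0005, 0.001, 0.005],
}
search = GridSearchCV(
    MLPRegressor(hidden_layer_sizes=(100,), max_iter=1000, random_state=42),
    param_grid,
    scoring='neg_mean_squared_error',
    cv=3,
)
search.fit(X_train, y_train)
print(search.best_params_)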