The activation parameter in scikit-learn's MLPRegressor determines the non-linear function applied to the outputs of the hidden layers of the neural network.
Activation functions introduce non-linearity into the network, allowing it to learn complex patterns. The choice of activation function can significantly impact the model’s performance and training dynamics.
Scikit-learn's MLPRegressor offers four activation functions: 'identity', 'logistic', 'tanh', and 'relu'. Each has different properties and is suitable for different types of problems.
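To build intuition for how these four functions differ, here is a small NumPy sketch (purely illustrative, not part of scikit-learn's API) that evaluates each activation on a few sample inputs:

import numpy as np

# Illustrative definitions of the four activations (for intuition only)
x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

identity = x                        # f(x) = x, no non-linearity
logistic = 1 / (1 + np.exp(-x))     # sigmoid, squashes values into (0, 1)
tanh = np.tanh(x)                   # squashes values into (-1, 1)
relu = np.maximum(0, x)             # zero for negative inputs, linear for positive

print(identity, logistic, tanh, relu, sep="\n")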
The default value for activation is 'relu' (Rectified Linear Unit). Common choices include 'relu' for general-purpose use, 'tanh' for inputs normalized to a symmetric range, and 'logistic' (the sigmoid), which is more commonly associated with binary classification than with regression.
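Leaving activation unset is equivalent to passing 'relu' explicitly; the snippet below simply contrasts the default with an explicit choice (the hidden layer size and random state are illustrative values):

from sklearn.neural_network import MLPRegressor

# Default: activation='relu'
mlp_default = MLPRegressor(hidden_layer_sizes=(100,), random_state=42)

# Explicitly choosing a different activation
mlp_tanh = MLPRegressor(hidden_layer_sizes=(100,), activation='tanh', random_state=42)

print(mlp_default.activation)  # 'relu'
print(mlp_tanh.activation)     # 'tanh'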
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different activation functions
activations = ['identity', 'logistic', 'tanh', 'relu']
mse_scores = []
for activation in activations:
    mlp = MLPRegressor(hidden_layer_sizes=(100,), activation=activation,
                       max_iter=1000, random_state=42)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"Activation: {activation}, MSE: {mse:.4f}")
Running the example gives an output like:
Activation: identity, MSE: 0.0104
Activation: logistic, MSE: 1578.4868
Activation: tanh, MSE: 199.3409
Activation: relu, MSE: 30.5298
Note that 'identity' achieves the lowest error here because make_regression produces a (noisy) linear target, so a purely linear network fits it almost perfectly; on genuinely non-linear data the non-linear activations typically perform better.
The key steps in this example are:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Train MLPRegressor models with different activation functions
- Evaluate the mean squared error of each model on the test set
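If you want to pick the best activation programmatically, a small follow-up like this works (it assumes the activations and mse_scores lists from the example above are still in scope):

import numpy as np

# Pick the activation with the lowest test MSE from the loop above
best_idx = int(np.argmin(mse_scores))
print(f"Best activation: {activations[best_idx]} (MSE: {mse_scores[best_idx]:.4f})")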
Some tips for choosing activation functions:
- Use 'relu' as a default choice for most problems
- Consider 'tanh' for inputs normalized between -1 and 1 (see the scaling sketch after this list)
- 'logistic' can be useful for binary classification tasks
- 'identity' is rarely used but can be helpful for debugging, since it reduces the network to a linear model
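For the 'tanh' tip above, a common way to get inputs into a suitable range is to standardize the features first. A minimal sketch, assuming the X_train/X_test split from the earlier example is still available:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor

# Standardize features before a 'tanh' network (the scaler is fitted on training data only)
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(100,), activation='tanh',
                 max_iter=1000, random_state=42),
)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # R^2 on the held-out test set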
Issues to consider:
- Different activation functions may require different learning rates (see the tuning sketch after this list)
- Some functions (like 'relu') can suffer from the "dying neurons" problem, where units get stuck outputting zero and stop learning
- The choice of activation function can affect the network's ability to approximate certain functions
- Performance can vary depending on the specific problem and dataset characteristics
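Because different activations can favor different learning rates (first point above), one option is to search over both jointly. The sketch below uses GridSearchCV with an illustrative parameter grid, reusing X_train and y_train from the earlier example:

from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Search over activation and initial learning rate together (grid values are illustrative)
param_grid = {
    'activation': ['relu', 'tanh'],
    'learning_rate_init': [0.0005, 0.001, 0.005],
}
search = GridSearchCV(
    MLPRegressor(hidden_layer_sizes=(100,), max_iter=1000, random_state=42),
    param_grid,
    scoring='neg_mean_squared_error',
    cv=3,
)
search.fit(X_train, y_train)
print(search.best_params_)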