The hidden_layer_sizes parameter in scikit-learn’s MLPRegressor defines the architecture of the neural network by specifying the number of hidden layers and the number of neurons in each layer.
Multi-layer Perceptron (MLP) is a feedforward neural network model that maps input data to a set of outputs. The hidden_layer_sizes parameter determines the network’s capacity and its ability to model complex relationships in the data.
This parameter accepts a tuple where each element represents the number of neurons in a hidden layer. The length of the tuple defines the number of hidden layers in the network.
The default value for hidden_layer_sizes is (100,), which creates a single hidden layer with 100 neurons. Common configurations include (50, 50) for two hidden layers with 50 neurons each, or (100, 50, 25) for three hidden layers with decreasing neuron counts.
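To see how the tuple maps onto the fitted network, here is a minimal sketch that inspects the n_layers_ and coefs_ attributes of a fitted MLPRegressor (the dataset and architecture are arbitrary choices for illustration):
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor
X, y = make_regression(n_samples=100, n_features=10, random_state=42)
mlp = MLPRegressor(hidden_layer_sizes=(100, 50, 25), max_iter=2000, random_state=42)
mlp.fit(X, y)
# n_layers_ counts the input and output layers too: 3 hidden layers -> 5 total
print(mlp.n_layers_)
# coefs_ holds one weight matrix per layer-to-layer connection:
# (10, 100), (100, 50), (50, 25), (25, 1)
print([w.shape for w in mlp.coefs_])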
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error
import numpy as np
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different hidden_layer_sizes
layer_sizes = [(100,), (50,50), (100,50,25)]
mse_scores = []
for layers in layer_sizes:
    mlp = MLPRegressor(hidden_layer_sizes=layers, max_iter=1000, random_state=42)
    mlp.fit(X_train, y_train)
    y_pred = mlp.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"hidden_layer_sizes={layers}, MSE: {mse:.3f}")
# Find best configuration
best_config = layer_sizes[np.argmin(mse_scores)]
print(f"Best configuration: {best_config}")
Running the example gives an output like:
hidden_layer_sizes=(100,), MSE: 30.530
hidden_layer_sizes=(50, 50), MSE: 11.140
hidden_layer_sizes=(100, 50, 25), MSE: 5.085
Best configuration: (100, 50, 25)
The key steps in this example are:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Train MLPRegressor models with different hidden_layer_sizes configurations
- Evaluate the mean squared error (MSE) of each model on the test set
- Identify the best performing configuration
Some tips and heuristics for setting hidden_layer_sizes:
- Start with a simple architecture and gradually increase complexity
- Consider using a pyramid structure with decreasing neuron counts in deeper layers
- Experiment with both shallow (1-2 layers) and deep (3+ layers) architectures
- Use cross-validation to find the optimal configuration for your specific dataset (see the sketch after this list)
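One common way to cross-validate over architectures is a grid search with hidden_layer_sizes as the tuned parameter. This is a sketch; the candidate tuples and synthetic dataset are illustrative choices, not recommendations:
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)
# Candidate architectures to compare via 5-fold cross-validation
param_grid = {"hidden_layer_sizes": [(50,), (100,), (50, 50), (100, 50, 25)]}
grid = GridSearchCV(
    MLPRegressor(max_iter=1000, random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)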
Issues to consider:
- Larger networks have more capacity but are prone to overfitting and longer training times
- The optimal architecture depends on the complexity of the underlying data relationships
- Too few neurons may lead to underfitting, while too many can cause overfitting
- Consider using regularization techniques (e.g., the alpha parameter) with larger networks, as sketched below
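As a minimal sketch of that last point, alpha sets the L2 penalty in MLPRegressor (its default is 0.0001); the specific values compared here are arbitrary:
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)
for alpha in [0.0001, 0.01, 1.0]:
    # Higher alpha applies a stronger L2 penalty on the network weights
    mlp = MLPRegressor(hidden_layer_sizes=(100, 50, 25), alpha=alpha,
                       max_iter=1000, random_state=42)
    scores = cross_val_score(mlp, X, y, scoring="neg_mean_squared_error", cv=3)
    print(f"alpha={alpha}: mean MSE={-scores.mean():.3f}")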