The solver parameter in scikit-learn's MLPClassifier determines the algorithm used to optimize the neural network weights during training.
MLPClassifier is a multi-layer perceptron neural network for classification tasks. It learns non-linear decision boundaries by adjusting weights between interconnected neurons organized in layers.
The solver parameter affects both the training speed and the quality of the final model. Different solvers are better suited for different types of problems and dataset sizes.
The default value for solver is 'adam'. Common alternatives include 'sgd' (stochastic gradient descent) and 'lbfgs' (Limited-memory BFGS).
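For example, a non-default solver is selected at construction time. This is a minimal sketch; the hidden_layer_sizes value shown is just the library default, included for illustration:
from sklearn.neural_network import MLPClassifier

# Choose the optimizer explicitly; 'lbfgs' is a full-batch quasi-Newton method
clf = MLPClassifier(solver='lbfgs', hidden_layer_sizes=(100,), random_state=42)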
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
import time
# Generate synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different solvers
solvers = ['adam', 'sgd', 'lbfgs']
results = []
for solver in solvers:
    start_time = time.time()
    mlp = MLPClassifier(solver=solver, random_state=42, max_iter=1000)
    mlp.fit(X_train, y_train)
    train_time = time.time() - start_time
    y_pred = mlp.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    results.append((solver, accuracy, train_time))
    print(f"Solver: {solver}, Accuracy: {accuracy:.3f}, Training Time: {train_time:.2f} seconds")
Running the example gives an output like:
Solver: adam, Accuracy: 0.933, Training Time: 11.90 seconds
Solver: sgd, Accuracy: 0.936, Training Time: 19.35 seconds
Solver: lbfgs, Accuracy: 0.923, Training Time: 9.13 seconds
The key steps in this example are:
- Generate a synthetic multi-class classification dataset
- Split the data into train and test sets
- Train MLPClassifier models with different solver options
- Measure training time and evaluate accuracy for each solver
Some tips and heuristics for setting solver:
- Use ‘adam’ for large datasets or when training with mini-batches
- Try 'lbfgs' for smaller datasets (less than a few thousand samples)
- Use 'sgd' if you need fine control over learning rate schedules (see the sketch after this list)
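As an illustration of that last point, here is a minimal sketch of configuring 'sgd' with an adaptive learning-rate schedule. The specific hyperparameter values are arbitrary choices for demonstration, not tuned recommendations:
from sklearn.neural_network import MLPClassifier

# 'sgd' exposes schedule-related options that 'adam' and 'lbfgs' do not use.
# learning_rate='adaptive' keeps the rate at learning_rate_init while training
# loss keeps improving, then divides it by 5 when progress stalls.
mlp_sgd = MLPClassifier(
    solver='sgd',
    learning_rate='adaptive',  # schedule: 'constant', 'invscaling', or 'adaptive'
    learning_rate_init=0.01,   # starting step size (illustrative value)
    momentum=0.9,              # classical momentum term
    random_state=42,
    max_iter=1000,
)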
Issues to consider:
- 'adam' and 'sgd' support early stopping, while 'lbfgs' does not
- 'lbfgs' may converge faster and perform better for small datasets
- 'sgd' is sensitive to feature scaling and may require more tuning (see the sketch after this list)
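The scaling and early-stopping points can be addressed together. This minimal sketch standardizes features with StandardScaler in a pipeline and enables early stopping; the hyperparameter values are illustrative:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

# Standardized features help gradient-based solvers like 'sgd' and 'adam'.
# early_stopping=True holds out a validation fraction and stops training when
# the validation score stops improving; it applies to 'sgd' and 'adam' only.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(
        solver='sgd',
        early_stopping=True,      # uses an internal validation split
        validation_fraction=0.1,  # 10% of training data held out
        n_iter_no_change=10,      # patience, in epochs
        random_state=42,
        max_iter=1000,
    ),
)
# The pipeline is then fit and used exactly like a bare MLPClassifier,
# e.g. model.fit(X_train, y_train) followed by model.predict(X_test).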