The penalty parameter in scikit-learn's SGDRegressor controls the type of regularization applied to the model during training.
Stochastic Gradient Descent (SGD) is an optimization method used for fitting linear models. The SGDRegressor class implements regularized linear regression models trained using SGD.
The penalty parameter determines the type of regularization used to prevent overfitting. It can be set to 'l2' (L2 regularization), 'l1' (L1 regularization), 'elasticnet' (a combination of L1 and L2), or None (no regularization).
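As a rough illustration of what each option adds to the loss, the sketch below computes the regularization term for a small weight vector with NumPy. The helper penalty_term is hypothetical (it is not part of scikit-learn), and the formulas are the commonly documented forms, assumed here: half the squared L2 norm for 'l2', the sum of absolute values for 'l1', and a mix of the two weighted by l1_ratio for 'elasticnet'. During training this term is scaled by alpha.

```python
import numpy as np

def penalty_term(w, penalty, l1_ratio=0.15):
    """Hypothetical helper: regularization term R(w), before scaling by alpha."""
    w = np.asarray(w, dtype=float)
    l2 = 0.5 * np.sum(w ** 2)      # half the squared L2 norm
    l1 = np.sum(np.abs(w))         # L1 norm
    if penalty == "l2":
        return l2
    if penalty == "l1":
        return l1
    if penalty == "elasticnet":
        # Assumed form: convex combination of the L2 and L1 terms
        return (1 - l1_ratio) * l2 + l1_ratio * l1
    if penalty is None:
        return 0.0
    raise ValueError(f"Unknown penalty: {penalty!r}")

w = [3.0, -4.0, 0.0]
for p in ["l2", "l1", "elasticnet", None]:
    print(f"penalty={p}, R(w)={penalty_term(w, p):.3f}")
```

The L2 term grows quadratically with each weight, so it penalizes large coefficients most heavily, while the L1 term grows linearly and penalizes small and large weights at the same rate.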
The default value for penalty is 'l2'. Common choices include 'l2' for general-purpose regularization, 'l1' for feature selection, and 'elasticnet' for a balance between the two.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
import numpy as np
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different penalty values
penalties = ['l2', 'l1', 'elasticnet', None]
mse_scores = []
for penalty in penalties:
    # Fit a model with the current penalty and score it on the held-out test set
    sgd = SGDRegressor(penalty=penalty, random_state=42)
    sgd.fit(X_train, y_train)
    y_pred = sgd.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"penalty={penalty}, Mean Squared Error: {mse:.3f}")
# Find best penalty
best_penalty = penalties[np.argmin(mse_scores)]
print(f"Best penalty: {best_penalty}")
Running the example gives an output like:
penalty=l2, Mean Squared Error: 0.012
penalty=l1, Mean Squared Error: 0.011
penalty=elasticnet, Mean Squared Error: 0.012
penalty=None, Mean Squared Error: 0.011
Best penalty: l1
The key steps in this example are:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Train SGDRegressor models with different penalty values
- Evaluate the mean squared error of each model on the test set
- Identify the best performing penalty type
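When the scores are as close as in the output above, a single train/test split may not separate the penalties reliably. One possible variation, sketched here as an assumption rather than part of the original example, is to compare them with 5-fold cross-validation using cross_val_score. The dictionary name cv_mse is illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import cross_val_score

# Same synthetic dataset as in the main example
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

cv_mse = {}
for penalty in ["l2", "l1", "elasticnet", None]:
    model = SGDRegressor(penalty=penalty, random_state=42)
    # scikit-learn reports negated MSE so that higher is always better
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    cv_mse[penalty] = -scores.mean()
    print(f"penalty={penalty}, mean CV MSE: {cv_mse[penalty]:.3f} (std {scores.std():.3f})")
```

If the spread across folds is larger than the gap between penalties, the ranking is probably not meaningful for this dataset.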
Tips for choosing the appropriate penalty type:
- Use 'l2' for general-purpose regularization and to prevent large coefficient values
- Choose 'l1' for feature selection, as it promotes sparsity in the coefficient values
- Opt for 'elasticnet' to combine the benefits of both L1 and L2 regularization
- Use None when you want to disable regularization entirely
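To see the sparsity effect mentioned in the tips, the following sketch counts how many coefficients end up exactly zero under 'l2' and 'l1'. The dataset settings (n_informative=5) and the relatively strong alpha=0.1 are illustrative assumptions chosen to make the difference visible. The exact counts depend on alpha, the data, and the scikit-learn version.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor

# 20 features, of which only 5 influence the target (illustrative choice)
X, y = make_regression(
    n_samples=1000, n_features=20, n_informative=5, noise=0.1, random_state=42
)

zero_counts = {}
for penalty in ["l2", "l1"]:
    # alpha=0.1 is an assumed, deliberately strong setting for this demo
    model = SGDRegressor(penalty=penalty, alpha=0.1, random_state=42)
    model.fit(X, y)
    zero_counts[penalty] = int(np.sum(model.coef_ == 0))
    print(f"penalty={penalty}: {zero_counts[penalty]} of {model.coef_.size} coefficients are exactly zero")
```

Features whose coefficients are zero under 'l1' are candidates for removal, which is why this penalty is often used as a feature selection step.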
Considerations when using different penalties:
- L2 penalty tends to shrink all coefficients towards zero, but not exactly to zero
- L1 penalty can set some coefficients exactly to zero, effectively performing feature selection
- Elastic net combines L1 and L2 penalties, offering a balance between feature selection and coefficient shrinkage; the mix is set by the l1_ratio parameter
- The strength of the regularization is controlled by the alpha parameter, which should be tuned in conjunction with the penalty type
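One way to tune alpha together with the penalty type is a grid search with cross-validation. The sketch below uses GridSearchCV. The candidate alpha values are arbitrary assumptions for illustration and should be adapted to the problem at hand.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Search over penalty type and regularization strength jointly
param_grid = {
    "penalty": ["l2", "l1", "elasticnet"],
    "alpha": [1e-5, 1e-4, 1e-3, 1e-2],  # illustrative candidates
}
search = GridSearchCV(
    SGDRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best CV MSE: {-search.best_score_:.3f}")
```

For 'elasticnet', the l1_ratio parameter (0.15 by default) could be added to the grid as well, at the cost of a larger search.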