The `positive` parameter in scikit-learn's `Lasso` class constrains the coefficients to be non-negative when set to `True`. This is useful when you have prior knowledge that the coefficients should be positive.
Lasso (Least Absolute Shrinkage and Selection Operator) is a linear regression technique that performs both feature selection and regularization. It adds a penalty term to the loss function, shrinking some coefficients and setting others to zero.
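To make the penalty concrete, here is a minimal sketch of the objective scikit-learn's `Lasso` minimizes, `(1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1` (the helper name `lasso_objective` and the toy data are illustrative, not part of the library):

```python
import numpy as np

def lasso_objective(X, y, w, alpha):
    # Scikit-learn's documented Lasso objective:
    # (1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1
    n = X.shape[0]
    residual = y - X @ w
    return (residual @ residual) / (2 * n) + alpha * np.abs(w).sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.5, 0.0, -2.0])
y = X @ w_true  # noise-free target built from known weights

# The L1 penalty makes sparse weight vectors cheaper: zeroing the
# irrelevant middle coefficient removes its penalty contribution
# without hurting the fit.
dense = np.array([1.5, 0.3, -2.0])
sparse = np.array([1.5, 0.0, -2.0])
print(lasso_objective(X, y, dense, alpha=0.1) > lasso_objective(X, y, sparse, alpha=0.1))  # True
```

This is why Lasso drives some coefficients exactly to zero: the L1 term rewards sparsity directly.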
By default, `positive` is set to `False`, allowing coefficients to be either positive or negative. When set to `True`, it forces all coefficients to be non-negative.

In practice, `positive=True` is used when there is a strong prior belief that the features should have a non-negative impact on the target variable.
```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
import numpy as np

# Generate synthetic features (the y from make_regression is replaced below)
X, _ = make_regression(n_samples=100, n_features=5, random_state=42)

# Build the target from known coefficients, some of them negative
np.random.seed(42)
coef = np.array([2, -1, 0, -0.5, 0])
y = np.dot(X, coef) + 0.1 * np.random.normal(size=X.shape[0])

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit Lasso with default (positive=False)
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
print(f"Default Lasso Coefficients: {lasso.coef_}")
print(f"Default Lasso R^2: {r2_score(y_test, lasso.predict(X_test)):.3f}")

# Fit Lasso with positive=True
lasso_pos = Lasso(alpha=0.1, positive=True)
lasso_pos.fit(X_train, y_train)
print(f"Positive Lasso Coefficients: {lasso_pos.coef_}")
print(f"Positive Lasso R^2: {r2_score(y_test, lasso_pos.predict(X_test)):.3f}")
```
Running the example gives an output like:
```
Default Lasso Coefficients: [ 1.90402002 -0.90527933 -0.         -0.44395165  0.        ]
Default Lasso R^2: 0.995
Positive Lasso Coefficients: [2.10419903 0.         0.         0.         0.        ]
Positive Lasso R^2: 0.864
```
The key steps in this example are:
- Generate a synthetic regression dataset with some positive and negative coefficients
- Split the data into train and test sets
- Fit a default `Lasso` model (allowing negative coefficients)
- Fit a `Lasso` model with `positive=True` (constraining coefficients to be non-negative)
- Compare the coefficients and R^2 scores of the two models
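The effect of the constraint can be verified directly on the fitted coefficients. A minimal sketch on a fresh synthetic dataset (the data and the true weight vector here are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
# True effects with mixed signs
y = X @ np.array([2.0, -1.5, 0.0, 0.5, -0.5])

lasso_pos = Lasso(alpha=0.1, positive=True)
lasso_pos.fit(X, y)
# The constraint clips what would be negative coefficients at zero
print((lasso_pos.coef_ >= 0).all())  # True
```

Features whose true effect is negative end up with a coefficient of exactly zero, since zero is the closest feasible value under the constraint.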
Some tips and heuristics for using `positive`:

- Set `positive=True` when you have strong domain knowledge that the features should have non-negative effects
- Using `positive=True` can improve interpretability by eliminating counterintuitive negative coefficients
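The constraint also combines with hyperparameter tuning: `LassoCV` accepts the same `positive` argument, so `alpha` can be selected by cross-validation while keeping the non-negativity constraint. A short sketch (the dataset is synthetic; `make_regression` draws non-negative ground-truth coefficients by default, so the constraint is appropriate here):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=42)

# Tune alpha by 5-fold cross-validation under the non-negativity constraint
model = LassoCV(cv=5, positive=True, random_state=42)
model.fit(X, y)

print(model.alpha_ > 0)           # True: a positive alpha was selected
print((model.coef_ >= 0).all())   # True: the constraint still holds
```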
Issues to consider:
- Constraining coefficients to be non-negative may reduce predictive performance if the true coefficients are negative
- Only use `positive=True` when you have a strong prior belief about the sign of the coefficients
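Finally, if you want the sign constraint without the L1 shrinkage, `LinearRegression` also accepts `positive=True` (added in scikit-learn 0.24), which solves a non-negative least squares problem. A brief comparison sketch on synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Lasso

X, y = make_regression(n_samples=100, n_features=5, random_state=42)

# Non-negative least squares: the same sign constraint, no shrinkage
nnls = LinearRegression(positive=True).fit(X, y)
# Non-negative Lasso: sign constraint plus L1 shrinkage
nn_lasso = Lasso(alpha=1.0, positive=True).fit(X, y)

print((nnls.coef_ >= 0).all())      # True
print((nn_lasso.coef_ >= 0).all())  # True
```

Choose `LinearRegression(positive=True)` when you want the constraint but no regularization, and `Lasso(positive=True)` when you also want sparsity.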