The positive parameter in scikit-learn’s LinearRegression constrains the model coefficients to be non-negative.
This is useful when domain knowledge suggests that the features should have a positive relationship with the target variable, and negative coefficients would not be meaningful or interpretable.
By default, positive is set to False, allowing coefficients to be either positive or negative.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# Generate synthetic dataset
X, y = make_regression(n_samples=100, n_features=5, n_informative=3, noise=10, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different 'positive' values
lr_unconstrained = LinearRegression(positive=False)
lr_unconstrained.fit(X_train, y_train)
lr_constrained = LinearRegression(positive=True)
lr_constrained.fit(X_train, y_train)
# Evaluate models
y_pred_unconstrained = lr_unconstrained.predict(X_test)
y_pred_constrained = lr_constrained.predict(X_test)
mse_unconstrained = mean_squared_error(y_test, y_pred_unconstrained)
mse_constrained = mean_squared_error(y_test, y_pred_constrained)
print(f"Unconstrained MSE: {mse_unconstrained:.2f}")
print(f"Constrained MSE: {mse_constrained:.2f}")
print("\nUnconstrained Coefficients:")
print(lr_unconstrained.coef_)
print("\nConstrained Coefficients:")
print(lr_constrained.coef_)
Running the example gives an output like:
Unconstrained MSE: 127.72
Constrained MSE: 119.50
Unconstrained Coefficients:
[57.20237595 35.281705 -0.73589378 63.00489133 -1.19808211]
Constrained Coefficients:
[57.21270229 35.18974713 0. 63.15067422 0. ]
The key steps in this example are:
- Generate a synthetic regression dataset with some positively correlated features
- Split the data into train and test sets
- Train LinearRegression models with positive=False and positive=True
- Evaluate the mean squared error (MSE) of each model on the test set
- Print the model coefficients to show the effect of the positive parameter
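The exact zeros in the constrained coefficients are what a non-negative least squares (NNLS) solve would produce. The sketch below checks that numerically against scipy.optimize.nnls. It assumes that LinearRegression with positive=True solves an NNLS problem on dense input, and it sets fit_intercept=False (not used in the example above) so that no centering step gets in the way of the comparison. Treat it as an illustration of the idea rather than a description of the library internals.

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Same synthetic data settings as the example above
X, y = make_regression(n_samples=100, n_features=5, n_informative=3,
                       noise=10, random_state=42)

# fit_intercept=False is an assumption made here to keep the comparison direct
lr = LinearRegression(positive=True, fit_intercept=False).fit(X, y)

# Non-negative least squares: minimize ||Xw - y||_2 subject to w >= 0
w_nnls, _ = nnls(X, y)

print("LinearRegression(positive=True):", lr.coef_)
print("scipy.optimize.nnls:            ", w_nnls)
print("Match:", np.allclose(lr.coef_, w_nnls))
```

If the two coefficient vectors agree, the constrained model can be read as ordinary least squares restricted to the region where every coefficient is zero or greater.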
Some tips and heuristics for using positive:
- Use positive when domain knowledge suggests coefficients should be non-negative
- positive can improve interpretability but may reduce model flexibility and performance
- If using positive, scale features to a similar range to avoid one feature dominating
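One possible way to follow the scaling tip is to chain a scaler and the constrained model in a pipeline. The sketch below uses StandardScaler, which is an assumption on my part since the tips do not name a scaler, and it inflates one column by a hypothetical factor of 1000 purely to mimic features measured on very different scales.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=100, n_features=5, n_informative=3,
                       noise=10, random_state=42)

# Inflate one column to mimic mixed feature scales (illustrative only)
X_mixed = X.copy()
X_mixed[:, 0] *= 1000.0

# Scale first, then fit with the non-negativity constraint
model = make_pipeline(StandardScaler(), LinearRegression(positive=True))
model.fit(X_mixed, y)

# make_pipeline names each step after its lowercased class name
coef = model.named_steps["linearregression"].coef_
print(coef)
```

Standard scaling divides each feature by a positive number, so it does not flip the direction of any relationship. The printed coefficients are expressed per standard deviation of each feature, which makes their magnitudes easier to compare.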
Issues to consider:
- Constraining coefficients may lead to poorer fit if the true relationship has negative coefficients
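The issue above can be illustrated with a small sketch in which the data is built so that one feature has a negative effect. The ground-truth coefficients (3.0 and -2.0), the noise level, and the sample size are all hypothetical values chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

# Hypothetical ground truth: the second feature has a negative effect
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

free = LinearRegression().fit(X, y)
constrained = LinearRegression(positive=True).fit(X, y)

# Compare coefficients and training R^2 for both models
print("Unconstrained coef:", free.coef_, "R^2:", free.score(X, y))
print("Constrained coef:  ", constrained.coef_, "R^2:", constrained.score(X, y))
```

The constrained model cannot represent the negative effect, so it should pin that coefficient at zero and report a lower R^2 on the same data than the unconstrained model.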