The `positive` parameter in scikit-learn’s `Ridge` class controls whether the coefficients are constrained to be non-negative.
Ridge Regression is a linear regression technique that adds L2 regularization to ordinary least squares. This regularization helps to prevent overfitting and can handle multicollinearity among the features.
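For reference, `Ridge` minimizes the following objective (this is the formulation given in the scikit-learn documentation, with $w$ the coefficient vector and $\alpha$ the regularization strength):

```latex
\min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2
```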
By default, `positive` is set to `False`, which allows the coefficients to be either positive or negative. Setting `positive=True` constrains all the coefficients to be non-negative.
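In terms of the objective above, `positive=True` adds a non-negativity constraint on every coefficient:

```latex
\min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2 \quad \text{subject to } w_j \ge 0 \text{ for all } j
```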
Constraining coefficients to be non-negative can be useful when you have prior knowledge that the relationships between the features and the target variable are positive. It can also aid in the interpretability of the model.
```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

# Generate synthetic dataset
X, y = make_regression(n_samples=100, n_features=5, n_informative=3, random_state=42, noise=0.5, bias=1.5)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit Ridge with default 'positive=False'
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
y_pred = ridge.predict(X_test)
print("Ridge with positive=False:")
print(f"Coefficients: {ridge.coef_}")
print(f"R-squared: {r2_score(y_test, y_pred):.3f}")

# Fit Ridge with 'positive=True'
ridge_pos = Ridge(alpha=1.0, positive=True)
ridge_pos.fit(X_train, y_train)
y_pred_pos = ridge_pos.predict(X_test)
print("\nRidge with positive=True:")
print(f"Coefficients: {ridge_pos.coef_}")
print(f"R-squared: {r2_score(y_test, y_pred_pos):.3f}")
```
Running the example gives an output like:
```
Ridge with positive=False:
Coefficients: [56.21929632 35.20087519 -0.07415078 63.36560971 -0.10233578]
R-squared: 1.000

Ridge with positive=True:
Coefficients: [56.22030096 35.19292776 0. 63.37819969 0. ]
R-squared: 1.000
```

Note that the two coefficients that are slightly negative under the default become exactly zero under `positive=True`: the non-negativity constraint is active at the boundary for those (uninformative) features, while the three informative coefficients are essentially unchanged.
The key steps in this example are:
- Generate a synthetic regression dataset with positive coefficients
- Split the data into train and test sets
- Fit a `Ridge` model with the default `positive=False`
- Fit a `Ridge` model with `positive=True`
- Compare the coefficients and R-squared scores of the two models
Some tips and heuristics for using `positive`:
- Set `positive=True` when you expect or want to enforce positive relationships based on domain knowledge
- Using `positive=True` can aid in the interpretability of the model by aligning with prior expectations
- Constraining coefficients to be non-negative may reduce performance if some true relationships are negative, as the sketch below illustrates
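The last tip is easy to demonstrate. The output above shows that the first feature is informative (its coefficient is around 56), so flipping its sign gives it a genuinely negative relationship with the target. Here is a minimal sketch reusing the synthetic setup from the main example: the unconstrained fit recovers the negative coefficient, while `positive=True` forces it to zero and the test R-squared drops sharply.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Same synthetic setup as the main example, but flip the sign of the
# first feature so its true relationship with the target is negative
X, y = make_regression(n_samples=100, n_features=5, n_informative=3, random_state=42, noise=0.5, bias=1.5)
X[:, 0] = -X[:, 0]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for positive in (False, True):
    model = Ridge(alpha=1.0, positive=positive).fit(X_train, y_train)
    r2 = r2_score(y_test, model.predict(X_test))
    print(f"positive={positive}: coefficients={np.round(model.coef_, 2)}, R-squared={r2:.3f}")
```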
Issues to consider:
- Constraining coefficients can increase bias if the assumptions about the relationships are incorrect
- Setting `positive=True` has no impact if all the coefficients are already non-negative with `positive=False`
- With `positive=True`, only the `lbfgs` solver is supported; the default `solver='auto'` selects it automatically (see the sketch after this list)
- The `positive` parameter is not applicable for classification problems, only regression
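On the solver point, one practical detail: in scikit-learn 1.0 and later, `positive=True` is only supported by the `lbfgs` solver. The default `solver='auto'` picks it for you, but explicitly requesting another solver raises a `ValueError`. A minimal sketch (the random data here is only to make the snippet self-contained):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X, y = rng.normal(size=(20, 3)), rng.normal(size=20)

# The default solver='auto' selects lbfgs when positive=True, so this works
Ridge(alpha=1.0, positive=True).fit(X, y)

# Explicitly requesting a solver without non-negativity support fails
try:
    Ridge(alpha=1.0, positive=True, solver="cholesky").fit(X, y)
except ValueError as err:
    print(err)
```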