IsotonicRegression is a non-parametric regression algorithm that fits a monotonic (by default, non-decreasing) function to data. It is particularly useful for calibrating the probability estimates of a classifier. The key hyperparameter of IsotonicRegression is the increasing constraint, a boolean that defaults to True (fit a non-decreasing function) and can be set to False (fit a non-increasing function) or 'auto' (infer the direction from the data). The algorithm is appropriate for regression problems where a monotonic relationship between the input and output variables is assumed.
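As a small sketch of the increasing constraint, the snippet below (using a tiny toy dataset invented for illustration) fits with increasing=False to force a non-increasing step function:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# toy data with a roughly decreasing trend (invented for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.0, 8.5, 9.0, 4.0, 2.0])

# force a non-increasing fit; violating neighbours are pooled (averaged)
model = IsotonicRegression(increasing=False)
yhat = model.fit_transform(x, y)
print(yhat)  # fitted values never increase from left to right
```

The pair 8.5, 9.0 violates the non-increasing constraint, so the fit pools them into their mean; every other point is already consistent and is left unchanged.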
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import mean_squared_error
# generate a regression dataset with a positive trend
X, y = make_regression(n_samples=100, n_features=1, noise=10.0, random_state=1)
# cube the target: unlike squaring, cubing preserves the monotonic
# relationship while making the trend more pronounced
y = y**3
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create model (IsotonicRegression accepts a single input feature)
model = IsotonicRegression()
# fit model
model.fit(X_train, y_train)
# evaluate model
yhat = model.predict(X_test)
mse = mean_squared_error(y_test, yhat)
print('MSE: %.3f' % mse)
# make a prediction for a new sample
row = [[0.3]]
yhat = model.predict(row)
print('Prediction: %.3f' % yhat[0])
Running the example prints the mean squared error of the model on the test set, followed by the prediction for the new sample.
The steps are as follows:
1. A synthetic regression dataset with a monotonically increasing relationship between the input and output variables is generated using make_regression(). The target is then cubed to create a more pronounced monotonic trend, and the dataset is split into training and test sets.
2. An IsotonicRegression model is instantiated with default hyperparameters and fit on the training data using the fit() method.
3. The performance of the model is evaluated by making predictions on the test set (yhat) and comparing them to the actual values (y_test) using the mean squared error metric.
4. A single prediction is made by passing a new data sample to the predict() method.
This example demonstrates how to use IsotonicRegression
for modeling data with a monotonic relationship. The model learns the non-decreasing function that best fits the training data and can then be used to make predictions on new data points.
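One caveat when predicting on new data points: by default, inputs that fall outside the range seen during training map to NaN. This behaviour is controlled by the out_of_bounds parameter, sketched below on a tiny toy dataset invented for illustration:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# toy monotonic data; the training inputs span [0, 3]
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 4.0, 9.0])

# default out_of_bounds='nan': inputs outside [0, 3] map to NaN
model = IsotonicRegression().fit(x, y)
print(model.predict([5.0]))  # [nan]

# out_of_bounds='clip': out-of-range inputs are clipped to the boundary values
clipped = IsotonicRegression(out_of_bounds='clip').fit(x, y)
print(clipped.predict([5.0]))  # [9.]
```

Setting out_of_bounds='clip' is often the practical choice when the model must return a number for any input.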
The model is particularly useful when calibrating the probability estimates of a classifier, as it can help to ensure that the probabilities are monotonically related to the true likelihood of an event occurring.
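In practice, isotonic calibration is usually applied through scikit-learn's CalibratedClassifierCV wrapper rather than by fitting IsotonicRegression directly. A minimal sketch, assuming a synthetic binary classification problem and a naive Bayes base classifier chosen purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.calibration import CalibratedClassifierCV
from sklearn.naive_bayes import GaussianNB

# synthetic binary classification problem (for illustration only)
X, y = make_classification(n_samples=1000, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# wrap the base classifier so its probabilities are calibrated
# with isotonic regression via cross-validation
calibrated = CalibratedClassifierCV(GaussianNB(), method='isotonic', cv=5)
calibrated.fit(X_train, y_train)
probs = calibrated.predict_proba(X_test)
print(probs[:3])
```

With method='isotonic', the wrapper fits an isotonic regression from the base classifier's scores to the observed outcomes on held-out folds, so the calibrated probabilities are a monotonic transform of the original scores.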