GammaRegressor is a generalized linear model in scikit-learn for modeling continuous, strictly positive target values that are assumed to follow a Gamma distribution. It suits right-skewed targets such as insurance claim amounts or nonzero rainfall amounts.
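The key hyperparameter of GammaRegressor is alpha, the constant that multiplies the L2 penalty term and so controls the regularization strength (default 1.0; alpha=0 disables the penalty). It is not a parameter of the Gamma distribution itself. Other settings such as fit_intercept, max_iter, and tol control the intercept and the optimizer. The model uses a log link, so its predictions are always positive, and it is appropriate for regression problems with a strictly positive, Gamma-distributed target variable.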
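As a minimal sketch of what alpha does, the snippet below fits the model at three penalty strengths and prints the fitted coefficient. The synthetic data, the true coefficients (0.5 and 0.8), the Gamma shape of 2.0, and the alpha grid are all assumptions chosen for illustration; they are not part of the example that follows.

```python
import numpy as np
from sklearn.linear_model import GammaRegressor

# assumed synthetic data: a Gamma-distributed target whose mean depends on one feature
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))
mu = np.exp(0.5 + 0.8 * X[:, 0])          # log link: log(mean) is linear in X
y = rng.gamma(shape=2.0, scale=mu / 2.0)  # Gamma with mean = shape * scale = mu

# alpha is the L2 penalty strength; larger values shrink the coefficient toward zero
coefs = {}
for alpha in [0.0, 1.0, 10.0]:
    model = GammaRegressor(alpha=alpha, max_iter=1000).fit(X, y)
    coefs[alpha] = model.coef_[0]
    print('alpha=%.1f coef=%.3f' % (alpha, coefs[alpha]))
```

With alpha=0 the fit is an unpenalized Gamma GLM, so the coefficient should land near the value used to generate the data; larger alpha values pull it toward zero. In practice alpha would typically be chosen by cross-validation rather than fixed by hand.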
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import GammaRegressor
from sklearn.metrics import mean_absolute_error
import numpy as np
# generate a regression dataset, then transform the target to be strictly positive and right-skewed
X, y = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=1)
y = np.exp((y + abs(y.min())) / 200)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create model
model = GammaRegressor()
# fit model
model.fit(X_train, y_train)
# evaluate model
yhat = model.predict(X_test)
mae = mean_absolute_error(y_test, yhat)
print('MAE: %.3f' % mae)
# make a prediction
row = [[0.59332206]]
yhat = model.predict(row)
print('Predicted: %.3f' % yhat[0])
Running the example gives an output like:
MAE: 0.415
Predicted: 2.921
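MAE is easy to read, but it is not the quantity the model optimizes. As a hedged sketch, the same fit can also be scored with the mean Gamma deviance from sklearn.metrics and with the estimator's own score() method, which for this model reports D², the fraction of deviance explained. The data setup below simply repeats the example above; the variable names dev and d2 are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import GammaRegressor
from sklearn.metrics import mean_absolute_error, mean_gamma_deviance
from sklearn.model_selection import train_test_split

# same synthetic data and split as in the example above
X, y = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=1)
y = np.exp((y + abs(y.min())) / 200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

model = GammaRegressor().fit(X_train, y_train)
yhat = model.predict(X_test)

# MAE alongside two deviance-based measures
mae = mean_absolute_error(y_test, yhat)
dev = mean_gamma_deviance(y_test, yhat)  # lower is better, 0 is a perfect fit
d2 = model.score(X_test, y_test)         # D^2: higher is better, at most 1
print('MAE: %.3f' % mae)
print('Mean Gamma deviance: %.3f' % dev)
print('D^2: %.3f' % d2)
```

Both deviance-based measures require strictly positive targets and predictions, which holds here because the target is an exponential and the model predicts through a log link.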
The steps are as follows:
1. A synthetic regression dataset is generated using make_regression(). The target variable is then shifted and passed through the exponential function so that it is strictly positive and right-skewed, as a Gamma model requires. This mimics a Gamma-like target; it does not draw samples from a Gamma distribution.
2. The dataset is split into training and test sets using train_test_split().
3. A GammaRegressor model is instantiated with default hyperparameters and fit on the training data using fit().
4. The model's performance is evaluated on the test set using the mean absolute error metric.
5. A single prediction is made by passing a new data sample to predict().
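To make the prediction step less of a black box, here is a small sketch that reproduces predict() by hand. GammaRegressor uses a log link, so the predicted mean is the exponential of the linear predictor built from coef_ and intercept_. For brevity this sketch fits on the full dataset rather than the training split, and the names manual and auto are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import GammaRegressor

# same synthetic data as in the example above (fit on all rows for brevity)
X, y = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=1)
y = np.exp((y + abs(y.min())) / 200)
model = GammaRegressor().fit(X, y)

row = np.array([[0.59332206]])

# log link: predicted mean = exp(x . coef_ + intercept_)
manual = np.exp(row @ model.coef_ + model.intercept_)
auto = model.predict(row)
print('Manual: %.3f' % manual[0])
print('predict(): %.3f' % auto[0])
```

The two values should agree to floating-point precision. Because of the exponential, predictions are always positive, and each coefficient acts multiplicatively on the predicted mean.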
This example demonstrates how to use the GammaRegressor model on a dataset with a strictly positive, right-skewed target variable. With default hyperparameters, the model takes only a few lines to fit, evaluate, and use for prediction.