Scikit-Learn TweedieRegressor Model

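TweedieRegressor fits a generalized linear model (GLM) whose target is assumed to follow a Tweedie distribution. This family includes the normal, Poisson, and gamma distributions as special cases.

The distribution is selected by the power parameter: power=0 gives the normal distribution, power=1 Poisson, 1 < power < 2 compound Poisson-gamma, power=2 gamma, and power=3 inverse Gaussian (no Tweedie distribution exists for 0 < power < 1). This lets a single estimator adapt to many kinds of regression problems.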
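To make the role of power concrete, here is a small sketch on simulated data (the coefficients w_true are hypothetical). It assumes scikit-learn's documented equivalences: TweedieRegressor(power=1, link="log") optimizes the same objective as PoissonRegressor, and TweedieRegressor(power=0, link="identity", alpha=0) reduces to ordinary least squares. With a tight tolerance, the fitted coefficients should agree closely.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, PoissonRegressor, TweedieRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
w_true = np.array([0.4, -0.2, 0.1])  # hypothetical coefficients for the simulation

# power=1 with a log link: the Poisson special case
y_counts = rng.poisson(np.exp(0.3 + X @ w_true))
pois = PoissonRegressor(alpha=0.1, tol=1e-8, max_iter=1000).fit(X, y_counts)
tweedie_pois = TweedieRegressor(
    power=1, link="log", alpha=0.1, tol=1e-8, max_iter=1000
).fit(X, y_counts)

# power=0 with an identity link and no penalty: the normal case (least squares)
y_cont = 0.3 + X @ w_true + rng.normal(scale=0.5, size=500)
ols = LinearRegression().fit(X, y_cont)
tweedie_norm = TweedieRegressor(
    power=0, link="identity", alpha=0.0, tol=1e-8, max_iter=1000
).fit(X, y_cont)

print("Poisson:", pois.coef_, tweedie_pois.coef_)
print("Normal: ", ols.coef_, tweedie_norm.coef_)
```

The point is not that you should prefer TweedieRegressor over the specialized estimators, only that power picks which member of the family is being fitted.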

Key hyperparameters of TweedieRegressor include:

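power: Sets the Tweedie power, which determines the distribution family (default 0.0, the normal distribution).
alpha: Strength of the L2 penalty on the coefficients; alpha=0 disables regularization (default 1.0).
max_iter: Maximum number of solver iterations (default 100).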
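A quick way to see what alpha does is to refit on the same data with increasing penalty strength and watch the size of the coefficient vector. The sketch below uses simulated count data with hypothetical coefficients; since the L2 penalty does not apply to the intercept, the norm of coef_ should shrink (never grow) as alpha increases. It also prints n_iter_, which is bounded by max_iter.

```python
import numpy as np
from sklearn.linear_model import TweedieRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
# hypothetical log-linear signal; y is a non-negative count, valid for power=1.5
y = rng.poisson(np.exp(0.5 + X @ np.array([0.6, -0.4, 0.3, 0.0])))

defaults = TweedieRegressor().get_params()
print({k: defaults[k] for k in ("power", "alpha", "max_iter")})

alphas = [0.001, 0.01, 0.1, 1.0, 10.0]
coef_norms = []
for alpha in alphas:
    model = TweedieRegressor(power=1.5, alpha=alpha, max_iter=1000, tol=1e-8)
    model.fit(X, y)
    coef_norms.append(np.linalg.norm(model.coef_))
    print(f"alpha={alpha:<6} ||coef||={coef_norms[-1]:.4f} n_iter={model.n_iter_}")
```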

The algorithm is well suited to non-negative targets whose variance grows with the mean, such as insurance claim amounts (many exact zeros plus a right-skewed positive part) and energy consumption.
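As an illustration of the insurance-claims case, the sketch below simulates claim-like data: a Poisson number of claims per policy (with hypothetical log-linear frequency coefficients) and a Gamma-distributed amount per claim, so many policies have exactly zero total cost. This is a simulation chosen to resemble Tweedie data with 1 < power < 2, not a real dataset. The model is scored with mean_tweedie_deviance rather than MSE and compared against a constant prediction.

```python
import numpy as np
from sklearn.linear_model import TweedieRegressor
from sklearn.metrics import mean_tweedie_deviance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 3))

# claim frequency: Poisson with a log-linear mean (hypothetical coefficients)
freq = rng.poisson(np.exp(0.2 + X @ np.array([0.5, -0.3, 0.2])))
# claim severity: each claim ~ Gamma(shape=2, scale=50), and a sum of k such
# independent draws is Gamma(shape=2k, scale=50); policies with no claims stay 0
y = np.zeros(n)
has_claim = freq > 0
y[has_claim] = rng.gamma(shape=2.0 * freq[has_claim], scale=50.0)
print(f"share of exact zeros: {np.mean(y == 0):.2f}")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = TweedieRegressor(power=1.5, alpha=0.001, max_iter=1000).fit(X_train, y_train)

y_pred = model.predict(X_test)  # log link, so every prediction is positive
baseline = np.full_like(y_test, y_train.mean())
dev_model = mean_tweedie_deviance(y_test, y_pred, power=1.5)
dev_base = mean_tweedie_deviance(y_test, baseline, power=1.5)
print(f"Tweedie deviance, model:    {dev_model:.3f}")
print(f"Tweedie deviance, baseline: {dev_base:.3f}")
```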

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import TweedieRegressor
from sklearn.metrics import mean_squared_error

# generate regression dataset
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)
y = abs(y.round().astype(int))  # round and take absolute values: power=1.5 requires a non-negative target

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# create model
model = TweedieRegressor(power=1.5, alpha=0.5, max_iter=1000)

# fit model
model.fit(X_train, y_train)

# evaluate model
yhat = model.predict(X_test)
mse = mean_squared_error(y_test, yhat)
print('Mean Squared Error: %.3f' % mse)

# make a prediction
row = [[-0.79415228, -0.18416172, 0.45477481, 0.57280825, -0.14451754]]
yhat = model.predict(row)
print('Predicted: %.3f' % yhat[0])

Running the example gives an output like:

Mean Squared Error: 2816.652
Predicted: 57.215

The steps are as follows:

1. A synthetic regression dataset is generated with make_regression(), using a small amount of noise and a fixed random seed for reproducibility. The target is rounded and made non-negative, because power=1.5 does not allow negative values. The data is then split into training and test sets with train_test_split().
2. A TweedieRegressor is created with power=1.5 (compound Poisson-gamma), alpha=0.5, and max_iter=1000, and fit on the training data with fit().
3. The model's performance is measured on the test set with mean squared error (MSE), comparing the predictions (yhat) with the actual values (y_test).
4. A single prediction is made by passing a new data sample to predict().
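The last step can be unpacked a little. This sketch assumes scikit-learn's documented behaviour that link="auto" selects a log link whenever power > 0, so predict() should return exp(intercept + row · coef). For brevity it fits on all 100 samples rather than only the training split, so its numbers will not match the output shown earlier.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import TweedieRegressor

X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)
y = np.abs(y.round().astype(int))  # non-negative target, as in the example
model = TweedieRegressor(power=1.5, alpha=0.5, max_iter=1000).fit(X, y)

row = np.array([[-0.79415228, -0.18416172, 0.45477481, 0.57280825, -0.14451754]])
# with power > 0 and link="auto" the link is log, so the mean prediction is
# the exponential of the linear predictor
linear_predictor = row @ model.coef_ + model.intercept_
manual = np.exp(linear_predictor)
print("predict():", model.predict(row)[0], " manual:", manual[0])
```

This also explains why predictions from a positive-power model are always strictly positive.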

This example shows how to set up, fit, and use a TweedieRegressor in scikit-learn, with the power parameter choosing the distribution that best matches the target.

See Also