`TweedieRegressor` is a versatile scikit-learn estimator for fitting generalized linear models with a Tweedie distribution, a family that includes the normal, Poisson, and gamma distributions.
A single `power` parameter selects the distribution, which makes the estimator adaptable to different types of regression problems.
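As a quick check of how `power` selects the family, the sketch below fits `TweedieRegressor(power=1, link="log")` and scikit-learn's `PoissonRegressor` to the same synthetic count data. Both minimize the same penalized Poisson deviance, so their coefficients should agree up to solver tolerance. The data-generating weights and `alpha=0.1` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor, TweedieRegressor

# Synthetic count data: y ~ Poisson(exp(X @ w_true)); w_true is made up
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
w_true = np.array([0.3, -0.2, 0.1])
y = rng.poisson(np.exp(X @ w_true))

# power=1 with a log link corresponds to the Poisson family
tweedie = TweedieRegressor(power=1, link="log", alpha=0.1).fit(X, y)
poisson = PoissonRegressor(alpha=0.1).fit(X, y)

print("TweedieRegressor(power=1):", np.round(tweedie.coef_, 4))
print("PoissonRegressor:         ", np.round(poisson.coef_, 4))
```

Swapping `power` for 0 (normal) or 2 (gamma, which needs strictly positive targets) changes the assumed variance of the target without changing the estimator.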
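Key hyperparameters of `TweedieRegressor` include:

- `power`: Determines the distribution family (0 for normal, 1 for Poisson, between 1 and 2 for compound Poisson-gamma, 2 for gamma, 3 for inverse Gaussian).
- `alpha`: Regularization strength, the constant that multiplies the L2 penalty on the coefficients (`alpha=0` means no penalty).
- `max_iter`: Maximum number of iterations of the solver.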
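A minimal sketch of how `alpha` and `max_iter` behave, assuming synthetic Poisson counts with made-up weights: increasing `alpha` shrinks the coefficient vector, and a deliberately tiny `max_iter` stops the default lbfgs solver early, which scikit-learn reports as a `ConvergenceWarning` rather than an error.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import TweedieRegressor

# Synthetic non-negative counts (weights 0.5 and -0.3 are arbitrary)
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = rng.poisson(np.exp(0.5 * X[:, 0] - 0.3 * X[:, 1]))

# Larger alpha -> stronger L2 shrinkage -> smaller coefficient norm
norms = []
for alpha in [0.001, 0.1, 10.0]:
    model = TweedieRegressor(power=1.5, alpha=alpha, max_iter=1000).fit(X, y)
    norms.append(np.linalg.norm(model.coef_))
    print(f"alpha={alpha:<6} ||coef|| = {norms[-1]:.4f}")

# max_iter=1 is far too few iterations; record any warnings the fit emits
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    TweedieRegressor(power=1.5, alpha=0.001, max_iter=1).fit(X, y)
warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
print("ConvergenceWarning raised:", warned)
```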
The algorithm is appropriate for regression problems whose targets are not normally distributed, such as insurance claim amounts (non-negative, right-skewed, and often exactly zero) and energy consumption.
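To make the insurance case concrete, the sketch below simulates hypothetical policy data in which the number of claims is Poisson and each claim size is gamma distributed, so the total amount per policy is compound Poisson-gamma with many exact zeros. All distribution parameters are invented for illustration; a `power` between 1 and 2 is the Tweedie family suited to this kind of target.

```python
import numpy as np
from sklearn.linear_model import TweedieRegressor

rng = np.random.default_rng(42)
n = 5000
X = rng.normal(size=(n, 2))

# Hypothetical claim frequency: Poisson count per policy
n_claims = rng.poisson(np.exp(-1.0 + 0.4 * X[:, 0]))
# Total amount = sum of n_claims iid Gamma(shape=2, scale=500) severities.
# A sum of k such draws is Gamma(2k, 500); policies with no claims get 0.
amount = np.where(n_claims > 0,
                  rng.gamma(2.0 * np.maximum(n_claims, 1), 500.0),
                  0.0)

zero_share = np.mean(amount == 0)
print(f"share of exact zeros: {zero_share:.2f}")

# 1 < power < 2 accepts y == 0 exactly; the log link keeps predictions > 0
model = TweedieRegressor(power=1.5, link="log", alpha=0.0, max_iter=1000)
model.fit(X, amount)
print("coef:", np.round(model.coef_, 3))  # the simulation uses 0.4 and 0.0
print("min prediction:", model.predict(X).min())
```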
The example below fits a `TweedieRegressor` to a synthetic regression dataset, evaluates it, and makes a single prediction:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import TweedieRegressor
from sklearn.metrics import mean_squared_error
# generate regression dataset
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)
y = abs(y.round().astype(int)) # round to integers and make non-negative (required for power=1.5)
# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# create model (power=1.5: compound Poisson-gamma)
model = TweedieRegressor(power=1.5, alpha=0.5, max_iter=1000)
# fit model
model.fit(X_train, y_train)
# evaluate model
yhat = model.predict(X_test)
mse = mean_squared_error(y_test, yhat)
print('Mean Squared Error: %.3f' % mse)
# make a prediction
row = [[-0.79415228, -0.18416172, 0.45477481, 0.57280825, -0.14451754]]
yhat = model.predict(row)
print('Predicted: %.3f' % yhat[0])
```
Running the example gives an output like:

```
Mean Squared Error: 2816.652
Predicted: 57.215
```
The steps are as follows:

- A synthetic regression dataset is generated using the `make_regression()` function with specified noise and a random seed for reproducibility. The target is rounded and made non-negative with `abs()`, because a `power` of 1.5 requires `y >= 0`. The dataset is then split into training and test sets using `train_test_split()`.
- A `TweedieRegressor` model is instantiated with specific `power`, `alpha`, and `max_iter` parameters and fit on the training data using the `fit()` method.
- The model's performance is evaluated using the mean squared error (MSE) metric by comparing the predictions (`yhat`) to the actual values (`y_test`).
- A single prediction is made by passing a new data sample to the `predict()` method.
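The sketch below re-creates the example's data and model to show two details of the prediction and evaluation steps. Assuming the default `link="auto"`, a `power` of 1.5 selects a log link, so `predict()` should equal `exp(X @ coef_ + intercept_)`. It also reports scikit-learn's `mean_tweedie_deviance` with the same `power` alongside MSE, as one option for a metric that matches the model's assumed distribution.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import TweedieRegressor
from sklearn.metrics import mean_squared_error, mean_tweedie_deviance
from sklearn.model_selection import train_test_split

# Same data and model as in the example above
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)
y = abs(y.round().astype(int))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
model = TweedieRegressor(power=1.5, alpha=0.5, max_iter=1000).fit(X_train, y_train)

# Log link: the linear predictor is exponentiated to give the mean
manual = np.exp(X_test @ model.coef_ + model.intercept_)
yhat = model.predict(X_test)
print("manual log-link prediction matches predict():", np.allclose(manual, yhat))

# Deviance with the model's own power, reported next to MSE
mse = mean_squared_error(y_test, yhat)
dev = mean_tweedie_deviance(y_test, yhat, power=1.5)
print(f"MSE: {mse:.3f}")
print(f"Mean Tweedie deviance (power=1.5): {dev:.3f}")
```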
This example illustrates how to set up and use a `TweedieRegressor` for regression tasks with different target distributions, where the `power` parameter lets a single scikit-learn estimator cover several generalized linear model families.