`d2_tweedie_score()` is a metric for evaluating regression models whose targets follow a Tweedie distribution, such as counts or positive continuous data that may include a probability mass at zero. It computes D², the fraction of Tweedie deviance explained by the model. D² generalizes R² (the fraction of variance explained) to these distributions and shows how much better the model fits the observed data than a trivial baseline.
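For reference, the scikit-learn documentation defines D² as one minus the ratio of the model's mean Tweedie deviance to that of a constant model predicting the mean of the observed targets. The unweighted case is written out below, with the Poisson deviance (power p = 1) as the concrete example. The notation (dev, ȳ, n) is ours, not the original article's.

```latex
$$
D^2(y, \hat{y}) = 1 - \frac{\operatorname{dev}(y, \hat{y})}{\operatorname{dev}(y, \bar{y})},
\qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
$$

$$
\operatorname{dev}_{p=1}(y, \hat{y}) = \frac{1}{n}\sum_{i=1}^{n} 2\left(y_i \log\frac{y_i}{\hat{y}_i} - y_i + \hat{y}_i\right)
$$
```

The term y log(y/ŷ) is taken as 0 when y = 0, which is why the Poisson case tolerates zero targets.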
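The `d2_tweedie_score()` function compares the deviance of the model's predictions with the deviance of a baseline that always predicts the mean of the observed data. The best possible score is 1.0, and a score of 0 means the model does no better than that constant baseline. The score is not bounded below: a model that does worse than the mean receives a negative score.

The `power` parameter selects the distribution:

- `power=0`: normal
- `power=1`: Poisson
- `1 < power < 2`: compound Poisson-gamma
- `power=2`: gamma

This makes the metric particularly useful for count, frequency, and severity regression problems.

The data must be valid for the chosen power. For `1 <= power < 2` the targets must be non-negative, and for `power >= 2` they must be strictly positive. In both cases the predictions must be strictly positive. Values of `power` strictly between 0 and 1 are not allowed.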
```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import PoissonRegressor
from sklearn.metrics import d2_tweedie_score

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Shift the targets so the smallest value is 1 (power=1 requires non-negative targets)
y = y - y.min() + 1

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Poisson Regressor (alpha is the L2 regularization strength)
reg = PoissonRegressor(alpha=0.1)
reg.fit(X_train, y_train)

# Predict on test set
y_pred = reg.predict(X_test)

# Calculate d2_tweedie_score with power=1 (Poisson deviance)
d2_score = d2_tweedie_score(y_test, y_pred, power=1)
print(f"D² Tweedie Score: {d2_score:.2f}")
```
Running the example gives an output like:
```
D² Tweedie Score: 0.92
```
1. Generate a synthetic regression dataset with `make_regression()` and shift the target values so the smallest one is 1. Strictly positive targets satisfy the requirements of the Poisson deviance.
2. Split the dataset into training and testing sets with `train_test_split()`, reserving 20% of the data for testing.
3. Train a `PoissonRegressor` on the training set with `alpha=0.1`, the L2 regularization strength.
4. Use the trained regressor to predict target values for the test set.
5. Evaluate the model's performance with `d2_tweedie_score()` using `power=1`, which corresponds to the Poisson distribution.
This example demonstrates how to use scikit-learn's `d2_tweedie_score()` to evaluate a regression model. The metric measures the fraction of Tweedie deviance explained, the counterpart of explained variance for data that follows a Tweedie distribution.