Scikit-Learn d2_tweedie_score() Metric

d2_tweedie_score() is a metric for evaluating regression models whose targets follow a Tweedie distribution, such as non-negative data that may include a probability mass at zero. It measures the fraction of Tweedie deviance explained by the model, a generalization of R² in which the squared error is replaced by the Tweedie deviance.

The d2_tweedie_score() function compares the deviance of the model's predictions with the deviance of a baseline that always predicts the mean of the observed data. The best possible score is 1.0, a model that always predicts the mean scores 0.0, and a model that does worse than this baseline gets a negative score. The power parameter selects the distribution: 0 for normal (where the score equals R²), 1 for Poisson, 2 for gamma, and values between 1 and 2 for compound Poisson-gamma. The metric is a poor fit for targets that do not follow a Tweedie distribution, and for power values of 1 or above it raises an error on negative targets or non-positive predictions.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import PoissonRegressor
from sklearn.metrics import d2_tweedie_score

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Shift targets to be strictly positive (Poisson deviance needs y_true >= 0 and y_pred > 0)
y = y - y.min() + 1

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Poisson Regressor
reg = PoissonRegressor(alpha=0.1)
reg.fit(X_train, y_train)

# Predict on test set
y_pred = reg.predict(X_test)

# Calculate d2_tweedie_score
d2_score = d2_tweedie_score(y_test, y_pred, power=1)
print(f"D² Tweedie Score: {d2_score:.2f}")

Running the example gives an output like:

D² Tweedie Score: 0.92

The steps are as follows:

  1. Generate a synthetic regression dataset using make_regression() and shift the target values to be positive, since the Poisson deviance requires non-negative targets.

  2. Split the dataset into training and testing sets using train_test_split(), reserving 20% of the data for testing.

  3. Train a PoissonRegressor on the training set, specifying an alpha value of 0.1.

  4. Use the trained regressor to predict target values on the test set.

  5. Evaluate the model’s performance using d2_tweedie_score() with power=1, which corresponds to the Poisson distribution.

This example demonstrates how to use the d2_tweedie_score() function in scikit-learn to evaluate a regression model, giving the fraction of deviance explained for data that follows a Tweedie distribution.


