
Scikit-Learn mean_gamma_deviance() Metric

The mean_gamma_deviance() metric measures the mean gamma deviance between predicted and actual values, primarily used for regression tasks with strictly positive, continuous targets. It quantifies how well predictions match targets whose variance grows with the square of the mean, as in gamma-distributed data such as insurance claim amounts or rainfall totals.

Lower values indicate better model performance, with 0 being a perfect fit.

The metric is suitable for regression problems but not for classification tasks. Its main limitations are that both the true and predicted values must be strictly positive, and that the resulting deviance scores are harder to interpret than more familiar metrics such as mean squared error.
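Under the hood, for strictly positive true values and predictions, the per-sample gamma deviance is 2 * (log(y_pred / y_true) + y_true / y_pred - 1), and mean_gamma_deviance() averages this over all samples (it is equivalent to mean_tweedie_deviance() with power=2). A minimal sketch with made-up values, shown here purely for illustration, reproduces the computation by hand:

import numpy as np
from sklearn.metrics import mean_gamma_deviance

# Made-up strictly positive values, purely for illustration
y_true = np.array([2.0, 0.5, 1.0, 4.0])
y_pred = np.array([1.5, 0.9, 1.2, 3.0])

# Per-sample gamma deviance: 2 * (log(y_pred / y_true) + y_true / y_pred - 1)
manual = np.mean(2 * (np.log(y_pred / y_true) + y_true / y_pred - 1))

print(manual)                               # manual computation
print(mean_gamma_deviance(y_true, y_pred))  # matches the manual value

The full worked example below applies the metric to a trained regression model.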

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_gamma_deviance
import numpy as np

# Generate synthetic dataset with strictly positive targets
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
y = np.abs(y)  # gamma deviance is only defined for positive values

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Gradient Boosting Regressor
model = GradientBoostingRegressor(random_state=42)
model.fit(X_train, y_train)

# Predict on test set
y_pred = model.predict(X_test)

# Calculate mean gamma deviance
gamma_deviance = mean_gamma_deviance(y_test, y_pred)
print(f"Mean Gamma Deviance: {gamma_deviance:.2f}")

Running the example gives an output like:

Mean Gamma Deviance: 0.81

The steps are as follows:

  1. Generate a synthetic regression dataset using make_regression().
  2. Split the dataset into training and test sets using train_test_split().
  3. Train a GradientBoostingRegressor on the training set.
  4. Use the trained model to make predictions on the test set with predict().
  5. Calculate the mean gamma deviance between the predicted and actual values using mean_gamma_deviance().

First, we generate a synthetic regression dataset using the make_regression() function from scikit-learn. This function creates a dataset with 1000 samples and 20 features, adding some noise to simulate real-world data. We then take the absolute value of the targets because gamma deviance is only defined for strictly positive values.
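Because the metric requires positive values, a quick sanity check such as the one below (not part of the original listing) can confirm the transformed targets are strictly positive before training:

# Illustrative sanity check: gamma deviance needs strictly positive targets
print(y.min() > 0)  # should print True after taking the absolute value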

Next, we split the dataset into training and test sets using the train_test_split() function. This step is crucial for evaluating the performance of our model on unseen data. We use 80% of the data for training and reserve 20% for testing.
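As a quick check (again, not part of the original listing), the 80/20 split can be confirmed by inspecting the resulting array shapes:

# Illustrative check of the split sizes: 800 training and 200 test samples
print(X_train.shape, X_test.shape)  # expected: (800, 20) (200, 20)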

With our data prepared, we train a Gradient Boosting Regressor using the GradientBoostingRegressor class from scikit-learn. The fit() method is called on the model object, passing in the training features (X_train) and labels (y_train) to learn the underlying patterns in the data.
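The example relies on the default hyperparameters. As a variation (the values below are arbitrary illustrations, not tuned settings), GradientBoostingRegressor exposes parameters such as n_estimators, learning_rate, and max_depth:

from sklearn.ensemble import GradientBoostingRegressor

# Illustrative variation with explicit hyperparameters (arbitrary values)
model = GradientBoostingRegressor(
    n_estimators=200,    # number of boosting stages
    learning_rate=0.05,  # shrinkage applied to each tree's contribution
    max_depth=3,         # depth of the individual regression trees
    random_state=42,
)
model.fit(X_train, y_train)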

After training, we use the trained model to make predictions on the test set by calling the predict() method with X_test. This generates predicted values for each sample in the test set.

Finally, we evaluate the mean gamma deviance of our model using the mean_gamma_deviance() function. This function takes the true values (y_test) and the predicted values (y_pred) as input and calculates the mean gamma deviance. The resulting score is printed, giving us a quantitative measure of our model’s performance.
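The metric is also registered as a scorer, so as a variation you could evaluate the model with cross-validation. scikit-learn exposes it as "neg_mean_gamma_deviance" because its scorers treat larger values as better; note that every fold's predictions must be strictly positive for the scorer to be valid:

from sklearn.model_selection import cross_val_score

# Illustrative: 5-fold cross-validation with the negated gamma deviance scorer
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_gamma_deviance")
print(f"Mean Gamma Deviance (CV): {-scores.mean():.2f}")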

This example demonstrates how to use the mean_gamma_deviance() function from scikit-learn to evaluate the performance of a regression model.



See Also