
Scikit-Learn mean_absolute_percentage_error() Metric

Mean Absolute Percentage Error (MAPE) measures the average absolute percentage difference between actual and predicted values. It tells us how far our predictions are, on average, from the actual values in percentage terms.

The mean_absolute_percentage_error() function in scikit-learn calculates MAPE by taking the absolute difference between actual and predicted values, dividing by the absolute actual values, and averaging the results. It returns a non-negative float expressed as a fraction rather than a percentage (0.05 means 5%). Lower values indicate better performance, and 0.0 is a perfect score.
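To make the calculation concrete, here is a minimal pure-Python sketch of the formula described above. The helper name manual_mape and the sample values are illustrative assumptions, not part of scikit-learn. For inputs without zero actual values, mean_absolute_percentage_error() should give the same fraction.

```python
def manual_mape(actual, predicted):
    """Hypothetical helper: mean of |actual - predicted| / |actual|, as a fraction."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)

# Illustrative values: relative errors of 10%, 10% and 0%
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]

score = manual_mape(y_true, y_pred)
print(f"As a fraction:   {score:.4f}")  # 0.0667
print(f"As a percentage: {score:.2%}")  # 6.67%
```

The result is a fraction, so it has to be multiplied by 100 (or formatted with a percent format specifier) to read as a percentage.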

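MAPE is mainly used for regression problems where errors are best understood relative to the size of the actual values. It is not suitable when actual values are zero or near zero, because dividing by the actual value then produces extremely large percentage errors. scikit-learn avoids a literal division by zero by substituting a tiny epsilon for a zero denominator, so it returns an arbitrarily large number rather than raising an error. MAPE also favors predictions that underestimate actual values: for non-negative targets and predictions, an under-prediction can never be penalized by more than 100%, while an over-prediction has no upper bound.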
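The two limitations above can be illustrated with a short pure-Python sketch. The mape helper and all numbers are assumptions chosen for illustration. It applies the plain formula without scikit-learn's epsilon guard, so it must not be called with an actual value of exactly zero.

```python
def mape(actual, predicted):
    """Hypothetical helper: plain MAPE as a fraction (no guard for zero actuals)."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

# Same absolute error (0.5) on every sample, but one actual value is near zero.
well_scaled = mape([100.0, 200.0], [100.5, 200.5])
near_zero = mape([0.001, 200.0], [0.501, 200.5])

# Asymmetry: with non-negative predictions, under-prediction is capped at 100%,
# while over-prediction can grow without limit.
under = mape([100.0], [0.0])    # predict 0 for an actual of 100
over = mape([100.0], [300.0])   # predict 300 for an actual of 100

print(f"Well-scaled targets: {well_scaled:.2%}")
print(f"Near-zero target:    {near_zero:.2%}")
print(f"Under-prediction: {under:.0%}, over-prediction: {over:.0%}")
```

A single near-zero actual value dominates the average even though the absolute errors are identical. The capped penalty for under-prediction is why a model tuned to minimize MAPE may drift toward low forecasts.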

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=1, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on test set
y_pred = model.predict(X_test)

# Calculate MAPE
mape = mean_absolute_percentage_error(y_test, y_pred)
print(f"Mean Absolute Percentage Error: {mape:.2%}")

Running the example gives an output like:

Mean Absolute Percentage Error: 1.33%

The steps are as follows:

  1. Generate a synthetic regression dataset using make_regression().
  2. Split the dataset into training and test sets using train_test_split().
  3. Train a LinearRegression model on the training set.
  4. Use the trained model to make predictions on the test set with predict().
  5. Calculate the Mean Absolute Percentage Error (MAPE) using mean_absolute_percentage_error() by comparing the predicted values to the true values.

First, a synthetic regression dataset is generated using make_regression(), which simulates a regression problem with 1000 samples and 1 feature.

The dataset is then split into training and test sets using train_test_split(), with 80% of the data used for training and 20% reserved for testing.

A LinearRegression model is trained on the training data using the fit() method, which learns the relationship between the features and the target variable.

Predictions are made on the test set using the trained model’s predict() method, generating predicted values for the test samples.

Finally, the Mean Absolute Percentage Error (MAPE) is calculated using the mean_absolute_percentage_error() function by comparing the true values (y_test) with the predicted values (y_pred). The function returns a fraction, so the .2% format specifier in the print statement converts it to a percentage, giving a measure of the model’s performance in terms of percentage error.


