Scikit-Learn median_absolute_error() Metric

median_absolute_error is a metric for evaluating the performance of regression models. It represents the median of all absolute errors between predicted and true values. This metric provides a robust measure of error that is less sensitive to outliers compared to the mean absolute error.

The median_absolute_error() function in scikit-learn calculates this metric by finding the median of the absolute differences between the predicted and actual values. It takes the true target values (y_true) and the predicted values (y_pred) as input and returns a non-negative float in the same units as the target. Lower values indicate better model performance, and the best possible value is 0.0.
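To make the definition concrete, the calculation can be sketched by hand using only the Python standard library. The helper name median_absolute_error_by_hand and the four sample values are illustrative assumptions, not part of scikit-learn; the sketch only mirrors the definition above for a single target.

```python
from statistics import median


def median_absolute_error_by_hand(y_true, y_pred):
    """Illustrative helper (not part of scikit-learn): median of |y_true - y_pred|."""
    abs_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return median(abs_errors)


# Small made-up example: the absolute errors are 0.5, 0.5, 0.0 and 1.0
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

medae_by_hand = median_absolute_error_by_hand(y_true, y_pred)
print(f"Median Absolute Error (by hand): {medae_by_hand}")  # prints 0.5
```

With an even number of samples, the median is the average of the two middle absolute errors, here (0.5 + 0.5) / 2 = 0.5.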

This metric is used for regression problems and is particularly useful when the dataset contains outliers, because a small number of very large errors does not change the median. However, it is less commonly used in practice than mean absolute error, and the same robustness is also a limitation: the metric ignores the size of the largest errors, so it can look good even when the model is badly wrong on a minority of samples, as happens with highly skewed error distributions.
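The effect of a single outlier can be illustrated with a small comparison against mean_absolute_error(). The values below are made up for illustration: every prediction is off by 0.5, and then one prediction is replaced with a wildly wrong one.

```python
from sklearn.metrics import mean_absolute_error, median_absolute_error

# Made-up targets and predictions; every prediction is off by exactly 0.5
y_true = [10.0, 12.0, 14.0, 16.0, 18.0]
y_pred_clean = [10.5, 11.5, 14.5, 15.5, 18.5]

# Same predictions, but the last one is now off by 50.0
y_pred_outlier = y_pred_clean[:-1] + [68.0]

mae_clean = mean_absolute_error(y_true, y_pred_clean)
medae_clean = median_absolute_error(y_true, y_pred_clean)
mae_outlier = mean_absolute_error(y_true, y_pred_outlier)
medae_outlier = median_absolute_error(y_true, y_pred_outlier)

print(f"Clean:        MAE={mae_clean:.2f}  MedAE={medae_clean:.2f}")
print(f"With outlier: MAE={mae_outlier:.2f}  MedAE={medae_outlier:.2f}")
```

The mean absolute error jumps from 0.50 to 10.40, while the median absolute error stays at 0.50. This shows both sides of the metric: it is not distorted by the outlier, but it also gives no hint that one prediction is badly wrong, so it is often worth reporting alongside another metric.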

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import median_absolute_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=1, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on test set
y_pred = model.predict(X_test)

# Calculate median absolute error
medae = median_absolute_error(y_test, y_pred)
print(f"Median Absolute Error: {medae:.2f}")

Running the example gives an output like:

Median Absolute Error: 0.07

The steps are as follows:

1. Generate a synthetic regression dataset using make_regression().
2. Split the dataset into training and test sets using train_test_split().
3. Train a LinearRegression model on the training set.
4. Use the trained model to make predictions on the test set with predict().
5. Calculate the median absolute error of the predictions using median_absolute_error() by comparing the predicted values to the true values.

First, we generate a synthetic regression dataset using the make_regression() function from scikit-learn. We ask for 1000 samples with 1 feature and a small amount of Gaussian noise (noise=0.1) added to the target, which simulates a simple, nearly linear regression problem.

Next, we split the dataset into training and test sets using the train_test_split() function. This step is crucial for evaluating the performance of our model on unseen data. We use 80% of the data for training and reserve 20% for testing.

With our data prepared, we train a linear regression model using the LinearRegression class from scikit-learn. The fit() method is called on the model object, passing in the training features (X_train) and target values (y_train) so that the model can estimate its coefficients.

After training, we use the trained model to make predictions on the test set by calling the predict() method with X_test. This generates a predicted value for each sample in the test set.

Finally, we evaluate the median absolute error of our model using the median_absolute_error() function. This function takes the true values (y_test) and the predicted values (y_pred) as input and calculates the median of the absolute differences between them. The resulting median absolute error is printed, giving us a robust measure of our model’s performance.
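A single train/test split gives one estimate of the metric. As an optional variation that is not part of the original example, the same metric can be estimated with cross-validation through the built-in "neg_median_absolute_error" scorer. The choice of 5 folds is an assumption made for illustration. Scikit-learn scorers follow a higher-is-better convention, so the scores are negated and must be flipped back.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Same synthetic dataset as in the main example
X, y = make_regression(n_samples=1000, n_features=1, noise=0.1, random_state=42)

# The scorer returns the negated median absolute error for each fold
scores = cross_val_score(
    LinearRegression(), X, y, cv=5, scoring="neg_median_absolute_error"
)

# Flip the sign to recover the usual lower-is-better values
cv_medae = -scores
print(f"Median Absolute Error per fold: {cv_medae}")
print(f"Mean across folds: {cv_medae.mean():.2f}")
```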

This example demonstrates how to use the median_absolute_error() function from scikit-learn to evaluate the performance of a regression model.
