Scikit-Learn max_error() Metric

Max error is a regression metric that captures a model's worst-case error. It is the largest absolute difference between any predicted value and its true value, so it tells you how far off the model's single worst prediction is.

The max_error() function in scikit-learn (sklearn.metrics) computes this metric as the maximum, over all samples, of |y_true - y_pred|. It takes the true target values and the predicted values as input and returns a single non-negative float. A value of 0 means every prediction is exact. Lower values indicate better worst-case behavior, while higher values mean at least one prediction is far off. The function supports only single-output targets; passing multioutput arrays raises a ValueError.
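As a quick check of this definition, the sketch below computes max error by hand with NumPy, as the largest absolute residual, and compares it with max_error(). The arrays are made-up values chosen for illustration, and the manual version assumes 1-D targets.

```python
import numpy as np
from sklearn.metrics import max_error

# Hypothetical targets and predictions, chosen only for illustration
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 9.0])

# Manual computation: the largest absolute residual (residuals are 0.5, 0.5, 0.0, 2.0)
manual = np.max(np.abs(y_true - y_pred))

# scikit-learn's implementation of the same metric
sk = max_error(y_true, y_pred)

print(manual, sk)  # both 2.0
```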

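Max error is specific to regression problems, where it shows the worst-case scenario for predictions. Because it depends on a single sample, it says nothing about how the model performs on the rest of the data and is highly sensitive to outliers, so it is best read alongside an average metric such as mean absolute error or mean squared error. It is not applicable to classification problems.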
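The sketch below illustrates why max error alone does not summarize overall performance. It uses two hypothetical sets of predictions against the same targets: one model is perfect except for a single miss, the other misses every sample by the same amount. Both share the same max error, but mean_absolute_error() separates them.

```python
import numpy as np
from sklearn.metrics import max_error, mean_absolute_error

y_true = np.zeros(4)
# Hypothetical model A: one bad prediction, the rest perfect
pred_a = np.array([1.0, 0.0, 0.0, 0.0])
# Hypothetical model B: every prediction off by the same amount
pred_b = np.array([1.0, 1.0, 1.0, 1.0])

# Same max error (1.0) for both, but MAE is 0.25 for A and 1.0 for B
for name, pred in [("A", pred_a), ("B", pred_b)]:
    print(name, max_error(y_true, pred), mean_absolute_error(y_true, pred))
```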

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import max_error

# Generate synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=1, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on test set
y_pred = model.predict(X_test)

# Calculate max error
max_err = max_error(y_test, y_pred)
print(f"Max Error: {max_err:.2f}")

Running the example gives an output like:

Max Error: 0.29

The steps are as follows:

Generate a synthetic regression dataset using make_regression().
Split the dataset into training and test sets using train_test_split().
Train a LinearRegression model on the training set.
Use the trained model to make predictions on the test set with predict().
Calculate the max error using max_error() by comparing the predicted and true values.

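First, we generate a synthetic regression dataset using the make_regression() function from scikit-learn. With n_samples=1000, n_features=1, and noise=0.1, it creates 1000 samples with a single feature and adds a small amount of Gaussian noise to the target, letting us simulate a regression problem without real-world data. Setting random_state=42 makes the dataset reproducible.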
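To see what make_regression() returns, the sketch below regenerates the dataset with coef=True, which additionally returns the coefficient of the underlying linear model, and prints the array shapes. Passing coef=True is an addition for illustration and is not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_regression

# coef=True also returns the coefficient used to generate y
X, y, true_coef = make_regression(
    n_samples=1000, n_features=1, noise=0.1, random_state=42, coef=True
)

print(X.shape)  # (1000, 1): 1000 samples, one feature
print(y.shape)  # (1000,): one target value per sample

# np.ravel() yields a 1-element array whatever shape the coefficient comes back in
print(np.ravel(true_coef))
```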

Next, we split the dataset into training and test sets using the train_test_split() function. This step is crucial for evaluating the performance of our model on unseen data. We use 80% of the data for training and reserve 20% for testing.
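A quick sanity check on the split, assuming the same parameters as the example: with 1000 samples and test_size=0.2, train_test_split() should leave 800 rows for training and 200 for testing.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=1, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# test_size=0.2 on 1000 samples: 800 for training, 200 for testing
print(len(X_train), len(X_test))  # 800 200
```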

With our data prepared, we train a linear regression model using the LinearRegression class from scikit-learn. The fit() method is called on the model object, passing in the training features (X_train) and targets (y_train), and estimates the slope and intercept of the linear relationship between them.
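Because the data is generated from a known linear model, one way to check what fit() learned is to compare the fitted coef_ and intercept_ attributes with the generating coefficient. This sketch assumes the dataset is regenerated with coef=True (not done in the original example). With noise=0.1, the learned slope should land very close to the true one, and the intercept near 0, since make_regression() uses bias=0.0 by default.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# coef=True is an assumption added here so the true slope is available
X, y, true_coef = make_regression(
    n_samples=1000, n_features=1, noise=0.1, random_state=42, coef=True
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)

# fit() stores the learned slope in coef_ and the offset in intercept_
true_slope = float(np.ravel(true_coef)[0])
print("learned slope:", model.coef_[0], "true slope:", true_slope)
print("learned intercept:", model.intercept_)
```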

After training, we use the trained model to make predictions on the test set by calling the predict() method with X_test. This generates predicted values for each sample in the test set.
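Since max error depends on a single sample, it can help to find out which test sample produced it. The sketch below recomputes the per-sample absolute residuals and uses np.argmax() to locate the worst one; the names abs_residuals and worst are illustrative, not part of any API.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import max_error

X, y = make_regression(n_samples=1000, n_features=1, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)

# predict() returns one value per test row
y_pred = model.predict(X_test)
print(y_pred.shape)  # (200,)

# Per-sample absolute residuals; the largest one is the max error
abs_residuals = np.abs(y_test - y_pred)
worst = np.argmax(abs_residuals)
print("worst test index:", worst)
print("its residual:", abs_residuals[worst])
print("max_error:", max_error(y_test, y_pred))
```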

Finally, we evaluate the maximum error of our model using the max_error() function. This function takes the true target values (y_test) and the predicted values (y_pred) as input and returns the largest absolute difference between them. The resulting max error is printed, giving us a quantitative measure of the model's worst-case error on the test set.
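A single train/test split gives one max error value, which can vary with the split. As a sketch, assuming a scikit-learn version that provides the "max_error" scoring string, cross_val_score() can report the max error on each of several folds. The scorer follows scikit-learn's "greater is better" convention, so it returns negated max errors, which the code flips back to positive values.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=1000, n_features=1, noise=0.1, random_state=42)

# The "max_error" scorer negates max_error(), so scores are <= 0
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="max_error")

# Flip the sign to get the usual non-negative max error per fold
fold_max_errors = -scores
print("max error per fold:", fold_max_errors)
print("worst fold:", fold_max_errors.max())
```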

See Also