Max error is a useful metric for evaluating the worst-case error of a regression model. It measures the maximum absolute error between the predicted and true values, providing insight into the largest single error the model makes.
The max_error() function in scikit-learn calculates this metric by finding the maximum absolute difference between the predicted and true values. It takes the true target values and the predicted values as input and returns a single float, expressed in the same units as the target. The best possible score is 0.0, which means every prediction matches its target exactly. Lower values indicate better worst-case behavior, while higher values mean at least one prediction is far from its target.
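To make the definition concrete, here is a minimal sketch that computes the metric by hand with NumPy and compares it with max_error(). The arrays y_true and y_pred are made-up values chosen only for illustration.

```python
import numpy as np
from sklearn.metrics import max_error

# Hypothetical true and predicted values, chosen only for illustration
y_true = np.array([3.0, 2.0, 7.0, 1.0])
y_pred = np.array([4.0, 2.0, 7.0, 1.0])

# Manual computation: the largest absolute residual
manual = np.max(np.abs(y_true - y_pred))

# Library computation
library = max_error(y_true, y_pred)

print(f"Manual: {manual}, max_error(): {library}")
```

Only the first prediction is off (by 1), so both computations should agree on that single residual.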
Max error is used for regression problems, where it helps to understand the worst-case scenario for predictions. Because it is determined by a single sample, it says nothing about typical error and is sensitive to outliers, so it is best reported alongside an aggregate metric such as mean absolute error. It is not applicable to classification problems.
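The following sketch illustrates why max error alone does not summarize overall performance. It compares max_error() with mean_absolute_error() on two sets of made-up predictions: one that is slightly off everywhere and one that is exact except for a single large miss. All values are hypothetical.

```python
import numpy as np
from sklearn.metrics import max_error, mean_absolute_error

# Hypothetical targets and two hypothetical sets of predictions
y_true = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y_uniform = np.array([11.0, 19.0, 31.0, 39.0, 51.0])  # every prediction off by 1
y_outlier = np.array([10.0, 20.0, 30.0, 40.0, 60.0])  # four exact, one off by 10

for name, y_pred in [("uniform", y_uniform), ("outlier", y_outlier)]:
    worst = max_error(y_true, y_pred)
    average = mean_absolute_error(y_true, y_pred)
    print(f"{name}: max error = {worst:.1f}, mean absolute error = {average:.1f}")
```

The single miss raises the max error tenfold while the mean absolute error only doubles, which is why the two metrics are usually read together.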
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import max_error
# Generate synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=1, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Predict on test set
y_pred = model.predict(X_test)
# Calculate max error
max_err = max_error(y_test, y_pred)
print(f"Max Error: {max_err:.2f}")
Running the example gives an output like:
Max Error: 0.29
The steps are as follows:
- Generate a synthetic regression dataset using make_regression().
- Split the dataset into training and test sets using train_test_split().
- Train a LinearRegression model on the training set.
- Use the trained model to make predictions on the test set with predict().
- Calculate the max error using max_error() by comparing the predicted and true values.
First, we generate a synthetic regression dataset using the make_regression() function from scikit-learn. Here it creates a dataset with 1000 samples and one feature, adding Gaussian noise with a standard deviation of 0.1 to the target. This lets us simulate a regression problem without using real-world data, and the fixed random_state makes the dataset reproducible.
Next, we split the dataset into training and test sets using the train_test_split() function. This step is crucial for evaluating the performance of our model on unseen data. We use 80% of the data for training and reserve 20% for testing.
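As a quick sanity check on the split, the sketch below repeats the data generation and split with the same parameters as the example and prints the resulting array shapes.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Same dataset and split parameters as in the example
X, y = make_regression(n_samples=1000, n_features=1, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# With 1000 samples and test_size=0.2, 800 rows go to training and 200 to testing
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
```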
With our data prepared, we train a linear regression model using the LinearRegression class from scikit-learn. The fit() method is called on the model object, passing in the training features (X_train) and target values (y_train) so that the model can estimate the coefficient and intercept of the linear relationship.
After training, we use the trained model to make predictions on the test set by calling the predict() method with X_test. This generates predicted values for each sample in the test set.
Finally, we evaluate the maximum error of our model using the max_error() function. This function takes the true target values (y_test) and the predicted values (y_pred) as input and calculates the maximum absolute error between them. The resulting max error is printed, giving us a quantitative measure of the worst-case error of our model on the test set.
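Since max_error() returns only a single number, it can be useful to see which test sample produced it. The sketch below is one possible way to do that, assuming the same dataset and model as the example: it recomputes the absolute residuals with NumPy and uses argmax to locate the worst prediction. The names abs_errors and worst_idx are illustrative and not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import max_error, mean_absolute_error
from sklearn.model_selection import train_test_split

# Recreate the dataset, split, and model from the example
X, y = make_regression(n_samples=1000, n_features=1, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Absolute residual for every test sample (illustrative helper names)
abs_errors = np.abs(y_test - y_pred)
worst_idx = int(np.argmax(abs_errors))

print(f"Worst test sample index: {worst_idx}")
print(f"True: {y_test[worst_idx]:.2f}, predicted: {y_pred[worst_idx]:.2f}")
print(f"Max error: {max_error(y_test, y_pred):.2f}")
print(f"Mean absolute error: {mean_absolute_error(y_test, y_pred):.2f}")
```

Printing the mean absolute error next to the max error shows how far the worst case sits from the typical case.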