Max error is a useful metric for evaluating the worst-case error of a regression model. It measures the maximum absolute error between the predicted and true values, providing insight into the largest single error the model makes.
The max_error() function in scikit-learn calculates this metric by finding the maximum absolute difference between the predicted and true values. It takes the true labels and predicted labels as input and returns a single float value. Lower values indicate better model performance, while higher values suggest poor worst-case predictions.
Max error applies only to regression problems, where it highlights the single worst prediction the model makes. Because it reflects just one sample, it does not give an overall sense of model performance, and it is not applicable to classification problems.
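To make the definition concrete, the metric can be reproduced by hand: it is simply the largest absolute residual. A minimal sketch with made-up values (separate from the example below):

import numpy as np
from sklearn.metrics import max_error
# Hypothetical true and predicted values, for illustration only
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
# The metric equals the largest absolute difference between the two arrays
print(np.max(np.abs(y_true - y_pred)))  # 1.0
print(max_error(y_true, y_pred))        # 1.0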
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import max_error
# Generate synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=1, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Predict on test set
y_pred = model.predict(X_test)
# Calculate max error
max_err = max_error(y_test, y_pred)
print(f"Max Error: {max_err:.2f}")
Running the example gives an output like:
Max Error: 0.29
The steps are as follows:
- Generate a synthetic regression dataset using make_regression().
- Split the dataset into training and test sets using train_test_split().
- Train a LinearRegression model on the training set.
- Use the trained model to make predictions on the test set with predict().
- Calculate the max error using max_error() by comparing the predicted and true values.
First, we generate a synthetic regression dataset using the make_regression() function from scikit-learn. This function creates a dataset with 1000 samples and one feature, allowing us to simulate a regression problem without relying on real-world data.
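If you want to confirm what make_regression() returns, a quick sanity check (separate from the main example) is to print the shapes of the generated arrays:

from sklearn.datasets import make_regression
X, y = make_regression(n_samples=1000, n_features=1, noise=0.1, random_state=42)
print(X.shape)  # (1000, 1) -- 1000 samples, one feature
print(y.shape)  # (1000,)   -- one continuous target per sample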
Next, we split the dataset into training and test sets using the train_test_split() function. This step is crucial for evaluating the performance of our model on unseen data. We use 80% of the data for training and reserve 20% for testing.
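Continuing from the example above, the split can be verified by printing the shapes of the resulting arrays; with test_size=0.2, the 1000 samples should divide into 800 training rows and 200 test rows:

print(X_train.shape, X_test.shape)  # (800, 1) (200, 1)
print(y_train.shape, y_test.shape)  # (800,) (200,)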
With our data prepared, we train a linear regression model using the LinearRegression class from scikit-learn. The fit() method is called on the model object, passing in the training features (X_train) and labels (y_train) to learn the underlying patterns in the data.
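After fitting, the learned parameters are exposed as attributes on the model, which can serve as a quick sanity check (continuing from the example above):

print(model.coef_)       # one coefficient, since the dataset has a single feature
print(model.intercept_)  # learned bias term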
After training, we use the trained model to make predictions on the test set by calling the predict() method with X_test. This generates a predicted value for each sample in the test set.
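To see how the individual errors behave before computing the metric, you could compare the first few predictions with their true targets (continuing from the example above):

for true_val, pred_val in zip(y_test[:5], y_pred[:5]):
    print(f"true={true_val:7.2f}  pred={pred_val:7.2f}  abs_error={abs(true_val - pred_val):.2f}")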
Finally, we evaluate the maximum error of our model using the max_error() function. This function takes the true labels (y_test) and the predicted labels (y_pred) as input and calculates the maximum absolute error between them. The resulting max error is printed, giving us a quantitative measure of the worst-case error of our model.