Root Mean Squared Logarithmic Error (RMSLE) is a metric used to evaluate the performance of regression models. It compares the logarithms of predicted and actual values, so it reflects relative rather than absolute error, and it penalizes underestimates more heavily than overestimates of the same absolute size.
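The asymmetry between underestimates and overestimates can be checked on a single data point. The sketch below is illustrative only: the helper name squared_log_error and the values 100, 50 and 150 are assumptions chosen for the demonstration, not part of the original example.

```python
import numpy as np

def squared_log_error(actual, predicted):
    """Squared difference of log(1 + value) for a single prediction (hypothetical helper)."""
    return (np.log1p(predicted) - np.log1p(actual)) ** 2

actual = 100.0
under = squared_log_error(actual, 50.0)   # prediction is 50 below the actual value
over = squared_log_error(actual, 150.0)   # prediction is 50 above the actual value

print(f"Underestimate by 50: {under:.4f}")
print(f"Overestimate by 50:  {over:.4f}")
```

Both predictions miss by the same absolute amount, yet the underestimate produces the larger squared log error, because the logarithm grows more slowly as values increase.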
RMSLE is calculated by taking the square root of the average of the squared differences between log(1 + predicted) and log(1 + actual), using the natural logarithm. Adding 1 before taking the logarithm keeps the metric defined when a value is 0. Good RMSLE values are close to 0, indicating predictions are close to actual values. Higher values indicate larger relative errors.
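To make the calculation concrete, the following minimal sketch computes RMSLE by hand with NumPy and compares it with the square root of scikit-learn's mean_squared_log_error(). The function name rmsle_manual and the small sample arrays are assumptions introduced for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

def rmsle_manual(y_true, y_pred):
    """RMSLE computed step by step (hypothetical helper for illustration)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # log1p(x) computes log(1 + x), which stays defined when x is 0
    log_diff = np.log1p(y_pred) - np.log1p(y_true)
    # Square the differences, average them, then take the square root
    return np.sqrt(np.mean(log_diff ** 2))

# Small illustrative sample, not taken from the original example
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

manual = rmsle_manual(y_true, y_pred)
library = np.sqrt(mean_squared_log_error(y_true, y_pred))

print(f"Manual RMSLE:       {manual:.6f}")
print(f"scikit-learn RMSLE: {library:.6f}")
```

The two results should agree up to floating-point precision.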
RMSLE is commonly used in regression problems where the target variable spans a wide range of values and large absolute differences on large targets should not dominate the score. However, RMSLE is not suitable for data with negative values (scikit-learn's mean_squared_log_error() raises an error for them), nor for problems where an absolute error of a given size should be penalized equally regardless of the magnitude of the target.
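Both points can be illustrated with a short sketch. It applies the same 10% overestimate to small and large targets and compares RMSE with RMSLE, then shows that a negative target is rejected. The arrays and variable names are assumptions made up for this demonstration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_squared_log_error

# Same relative error (10% too high) at two very different scales
small_true = np.array([10.0, 20.0, 30.0])
large_true = small_true * 1000
small_pred = small_true * 1.1
large_pred = large_true * 1.1

rmse_small = np.sqrt(mean_squared_error(small_true, small_pred))
rmse_large = np.sqrt(mean_squared_error(large_true, large_pred))
rmsle_small = np.sqrt(mean_squared_log_error(small_true, small_pred))
rmsle_large = np.sqrt(mean_squared_log_error(large_true, large_pred))

print(f"RMSE  small vs large: {rmse_small:.3f} vs {rmse_large:.3f}")
print(f"RMSLE small vs large: {rmsle_small:.3f} vs {rmsle_large:.3f}")

# Negative values are not accepted by mean_squared_log_error()
try:
    mean_squared_log_error([1.0, -2.0], [1.0, 2.0])
    negative_rejected = False
except ValueError as err:
    negative_rejected = True
    print(f"ValueError: {err}")
```

RMSE grows with the scale of the targets, while RMSLE stays in roughly the same range because it responds to the ratio between prediction and target rather than to their absolute difference.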
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_log_error
import numpy as np
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
y = np.abs(y) # Ensure target values are positive
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train a Random Forest Regressor
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
# Predict on test set
y_pred = model.predict(X_test)
# Calculate RMSLE
rmsle = np.sqrt(mean_squared_log_error(y_test, y_pred))
print(f"Root Mean Squared Logarithmic Error: {rmsle:.2f}")
Running the example gives an output like:
Root Mean Squared Logarithmic Error: 0.82
The steps are as follows:
- Generate a synthetic regression dataset using make_regression() and ensure target values are positive.
- Split the dataset into training and test sets using train_test_split().
- Train a RandomForestRegressor on the training set.
- Use the trained model to make predictions on the test set with predict().
- Calculate the RMSLE of the predictions using mean_squared_log_error() and take the square root of the result.
This example demonstrates how to compute RMSLE with scikit-learn by taking the square root of mean_squared_log_error() to evaluate the performance of a regression model, highlighting the importance of handling logarithmic differences in prediction errors.
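For reference, scikit-learn 1.4 and later expose a dedicated root_mean_squared_log_error() function in sklearn.metrics, which should return the same value as the square-root approach used above. The sketch below assumes the function may be missing on older installations and falls back to the square root of mean_squared_log_error() in that case. The sample arrays are illustrative.

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

try:
    # Available in scikit-learn 1.4 and later
    from sklearn.metrics import root_mean_squared_log_error
except ImportError:
    root_mean_squared_log_error = None

# Small illustrative sample, not taken from the original example
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# Approach used in the example: square root of the mean squared log error
via_sqrt = np.sqrt(mean_squared_log_error(y_true, y_pred))

if root_mean_squared_log_error is not None:
    direct = root_mean_squared_log_error(y_true, y_pred)
else:
    # Fallback for older scikit-learn versions
    direct = via_sqrt

print(f"sqrt(MSLE): {via_sqrt:.6f}")
print(f"RMSLE:      {direct:.6f}")
```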