The `metric_params` parameter in scikit-learn's `KNeighborsClassifier` allows passing extra arguments to a distance metric, enabling the use of domain knowledge or non-standard distance metrics.
`KNeighborsClassifier` uses distance metrics to determine the similarity between data points. While scikit-learn provides several built-in metrics such as Euclidean and Manhattan distance, a custom metric can sometimes improve performance by incorporating problem-specific information.
By default, `metric_params` is set to `None`. When using a custom distance function, `metric_params` can be a dictionary of keyword arguments passed to that function.
This example demonstrates creating a custom weighted Manhattan distance metric and using it with `KNeighborsClassifier` via the `metric_params` parameter. The performance of the custom metric is compared to the standard Euclidean and Manhattan distances.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Generate a synthetic dataset with features of different scales
X, y = make_classification(n_samples=1000, n_features=4, n_informative=2, n_redundant=0,
                           n_clusters_per_class=1, weights=[0.8, 0.2], flip_y=0.01,
                           random_state=42)
X[:, 0] *= 100  # Increase the scale of the first feature

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define a custom weighted Manhattan distance metric
def weighted_manhattan(x, y, w):
    return sum(wi * abs(a - b) for a, b, wi in zip(x, y, w))

# Train with different distance metrics
metrics = ['euclidean', 'manhattan', weighted_manhattan]
metric_params = [None, None, {'w': [1, 1, 0.01, 0.01]}]
accuracies = []
for metric, params in zip(metrics, metric_params):
    knn = KNeighborsClassifier(n_neighbors=5, metric=metric, metric_params=params)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"Metric: {metric.__name__ if callable(metric) else metric}, Accuracy: {accuracy:.3f}")
```
The output will look something like:

```
Metric: euclidean, Accuracy: 0.905
Metric: manhattan, Accuracy: 0.915
Metric: weighted_manhattan, Accuracy: 0.905
```
The key steps in this example are:

- Generate a synthetic binary classification dataset with features of different scales
- Split the data into train and test sets
- Define a custom weighted Manhattan distance function that takes a `w` argument for feature weights
- Train `KNeighborsClassifier` models with Euclidean, Manhattan, and custom weighted Manhattan metrics
- For the custom metric, pass feature weights via the `metric_params` parameter
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for using `metric_params`:

- Use `metric_params` to pass arguments that encode domain knowledge about feature importance
- When using non-weighted distance metrics, scale features to similar ranges so that no single feature dominates the distance calculation
- Compare the performance of custom metrics to standard ones to assess their benefit
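The scaling tip above can be sketched with `StandardScaler`; fitting the scaler inside a `Pipeline` is one idiomatic way to keep it from seeing the test data (same synthetic setup as the main example):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data where feature 0 is on a much larger scale
X, y = make_classification(n_samples=1000, n_features=4, n_informative=2,
                           n_redundant=0, random_state=42)
X[:, 0] *= 100
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Without scaling, feature 0 dominates the distance calculation
unscaled = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Standardizing inside a pipeline fits the scaler on the training data only
scaled = make_pipeline(StandardScaler(),
                       KNeighborsClassifier(n_neighbors=5)).fit(X_train, y_train)

print(f"Unscaled accuracy: {unscaled.score(X_test, y_test):.3f}")
print(f"Scaled accuracy:   {scaled.score(X_test, y_test):.3f}")
```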
Issues to consider:

- Ensure feature weights passed via `metric_params` are positive
- Custom distance metrics implemented in Python may be much slower than scikit-learn's optimized built-in metrics
- `metric_params` is not supported for all `metric` values; check the documentation for compatibility
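One way to mitigate the speed issue is to vectorize the custom distance with NumPy and pass precomputed distance matrices via `metric='precomputed'`. A sketch (the helper `weighted_manhattan_matrix` is our own, not a scikit-learn function):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

def weighted_manhattan_matrix(A, B, w):
    """Vectorized weighted Manhattan distances between all rows of A and B."""
    # Broadcasting: (n_A, 1, d) - (1, n_B, d) -> (n_A, n_B, d)
    return (np.abs(A[:, None, :] - B[None, :, :]) * w).sum(axis=-1)

w = np.array([1, 1, 0.01, 0.01])
knn = KNeighborsClassifier(n_neighbors=5, metric='precomputed')
# Fit on the pairwise train-to-train distance matrix
knn.fit(weighted_manhattan_matrix(X_train, X_train, w), y_train)

# At predict time, pass distances from the test points to the training points
y_pred = knn.predict(weighted_manhattan_matrix(X_test, X_train, w))
print(y_pred[:5])
```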