The `metric_params` parameter in scikit-learn's `KNeighborsClassifier` allows passing extra arguments to a distance metric, enabling the use of domain knowledge or non-standard distance metrics.
`KNeighborsClassifier` uses distance metrics to determine the similarity between data points. While scikit-learn provides several built-in metrics such as Euclidean and Manhattan distance, a custom metric can sometimes improve performance by incorporating problem-specific information.
By default, `metric_params` is set to `None`. When using a custom distance function, `metric_params` can be a dictionary of keyword arguments passed to that function.
This example demonstrates creating a custom weighted Manhattan distance metric and using it with `KNeighborsClassifier` via the `metric_params` parameter. The performance of the custom metric is compared to the standard Euclidean and Manhattan distances.
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Generate a synthetic dataset with features of different scales
X, y = make_classification(n_samples=1000, n_features=4, n_informative=2, n_redundant=0,
                           n_clusters_per_class=1, weights=[0.8, 0.2], flip_y=0.01,
                           random_state=42)
X[:, 0] *= 100  # Increase the scale of the first feature

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define a custom weighted Manhattan distance metric
def weighted_manhattan(x, y, w):
    return sum(wi * abs(a - b) for a, b, wi in zip(x, y, w))

# Train with different distance metrics
metrics = ['euclidean', 'manhattan', weighted_manhattan]
metric_params = [None, None, {'w': [1, 1, 0.01, 0.01]}]
accuracies = []
for metric, params in zip(metrics, metric_params):
    knn = KNeighborsClassifier(n_neighbors=5, metric=metric, metric_params=params)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"Metric: {metric.__name__ if callable(metric) else metric}, Accuracy: {accuracy:.3f}")
```
The output will look something like:

```
Metric: euclidean, Accuracy: 0.905
Metric: manhattan, Accuracy: 0.915
Metric: weighted_manhattan, Accuracy: 0.905
```
The key steps in this example are:

- Generate a synthetic binary classification dataset with features of different scales
- Split the data into train and test sets
- Define a custom weighted Manhattan distance function that takes a `w` argument for feature weights
- Train `KNeighborsClassifier` models with Euclidean, Manhattan, and custom weighted Manhattan metrics
- For the custom metric, pass feature weights via the `metric_params` parameter
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for using `metric_params`:

- Use `metric_params` to pass arguments that encode domain knowledge about feature importance
- When using non-weighted distance metrics, scale features to similar ranges so that no single feature dominates the distance calculation
- Compare the performance of custom metrics to standard ones to assess their benefit
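The scaling tip above can be sketched with `StandardScaler`; fitting the scaler inside a `Pipeline` is one idiomatic way to keep it from seeing the test data (same synthetic setup as the main example):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data where feature 0 is on a much larger scale
X, y = make_classification(n_samples=1000, n_features=4, n_informative=2,
                           n_redundant=0, random_state=42)
X[:, 0] *= 100
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Without scaling, feature 0 dominates the distance calculation
unscaled = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Standardizing inside a pipeline fits the scaler on the training data only
scaled = make_pipeline(StandardScaler(),
                       KNeighborsClassifier(n_neighbors=5)).fit(X_train, y_train)

print(f"Unscaled accuracy: {unscaled.score(X_test, y_test):.3f}")
print(f"Scaled accuracy:   {scaled.score(X_test, y_test):.3f}")
```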
Issues to consider:

- Ensure feature weights passed via `metric_params` are positive
- Custom distance metrics implemented in Python may be much slower than scikit-learn's optimized built-in metrics
- `metric_params` is not supported for all `metric` values; check the documentation for compatibility
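One way to mitigate the speed issue is to vectorize the custom distance with NumPy and pass precomputed distance matrices via `metric='precomputed'`. A sketch (the helper `weighted_manhattan_matrix` is our own, not a scikit-learn function):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

def weighted_manhattan_matrix(A, B, w):
    """Vectorized weighted Manhattan distances between all rows of A and B."""
    # Broadcasting: (n_A, 1, d) - (1, n_B, d) -> (n_A, n_B, d)
    return (np.abs(A[:, None, :] - B[None, :, :]) * w).sum(axis=-1)

w = np.array([1, 1, 0.01, 0.01])
knn = KNeighborsClassifier(n_neighbors=5, metric='precomputed')
# Fit on the pairwise train-to-train distance matrix
knn.fit(weighted_manhattan_matrix(X_train, X_train, w), y_train)

# At predict time, pass distances from the test points to the training points
y_pred = knn.predict(weighted_manhattan_matrix(X_test, X_train, w))
print(y_pred[:5])
```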