
Configure KNeighborsClassifier "metric_params" Parameter

The metric_params parameter in scikit-learn’s KNeighborsClassifier allows passing arguments to a custom distance metric function, enabling the use of domain knowledge or non-standard distance metrics.

KNeighborsClassifier uses distance metrics to determine the similarity between data points. While scikit-learn provides several built-in metrics like Euclidean and Manhattan distance, sometimes a custom metric can improve performance by incorporating problem-specific information.

By default, metric_params is set to None. When using a custom distance function, metric_params can be a dictionary of additional keyword arguments that are forwarded to the metric function on every distance computation.
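For example, a hypothetical scaled Euclidean metric (the function name, its scale argument, and the use of the Iris dataset here are illustrative, not part of scikit-learn) can receive its per-feature scale factors through metric_params:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical custom metric: Euclidean distance after per-feature scaling
def scaled_euclidean(a, b, scale):
    diff = (np.asarray(a) - np.asarray(b)) * scale
    return np.sqrt(np.sum(diff ** 2))

X, y = load_iris(return_X_y=True)

# Entries in metric_params are forwarded as keyword arguments to the callable
knn = KNeighborsClassifier(n_neighbors=3, metric=scaled_euclidean,
                           metric_params={'scale': np.ones(X.shape[1])})
knn.fit(X, y)
print(knn.predict(X[:2]))  # with unit scales this behaves like plain Euclidean
```

The dictionary keys must match the extra parameter names of the callable; here 'scale' matches the scale argument of scaled_euclidean.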

This example demonstrates creating a custom weighted Manhattan distance metric and using it with KNeighborsClassifier via the metric_params parameter. The performance of the custom metric is compared to standard Euclidean and Manhattan distances.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset with features of different scales
X, y = make_classification(n_samples=1000, n_features=4, n_informative=2, n_redundant=0,
                           n_clusters_per_class=1, weights=[0.8, 0.2], flip_y=0.01, random_state=42)
X[:, 0] *= 100  # Increase scale of first feature

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define custom weighted Manhattan distance metric
def weighted_manhattan(x, y, w):
    # Sum per-feature absolute differences, each scaled by its weight
    return sum(wi * abs(a - b) for a, b, wi in zip(x, y, w))

# Train with different distance metrics
metrics = ['euclidean', 'manhattan', weighted_manhattan]
metric_params = [None, None, {'w': [1, 1, 0.01, 0.01]}]
accuracies = []

for metric, params in zip(metrics, metric_params):
    knn = KNeighborsClassifier(n_neighbors=5, metric=metric, metric_params=params)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"Metric: {metric.__name__ if callable(metric) else metric}, Accuracy: {accuracy:.3f}")

The output will look something like:

Metric: euclidean, Accuracy: 0.905
Metric: manhattan, Accuracy: 0.915
Metric: weighted_manhattan, Accuracy: 0.905

The key steps in this example are:

  1. Generate a synthetic binary classification dataset with features of different scales
  2. Split the data into train and test sets
  3. Define a custom weighted Manhattan distance function that takes a w argument for feature weights
  4. Train KNeighborsClassifier models with Euclidean, Manhattan, and custom weighted Manhattan metrics
  5. For the custom metric, pass feature weights via the metric_params parameter
  6. Evaluate the accuracy of each model on the test set
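A quick sanity check (not part of the original walkthrough): with all weights set to 1, the custom weighted Manhattan distance reduces to the plain, unweighted Manhattan distance:

```python
import numpy as np

def weighted_manhattan(x, y, w):
    # Sum per-feature absolute differences, each scaled by its weight
    return sum(wi * abs(a - b) for a, b, wi in zip(x, y, w))

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 0.0, 3.5])

# Unit weights recover the unweighted Manhattan (city-block) distance
assert weighted_manhattan(x, y, [1, 1, 1]) == np.sum(np.abs(x - y))
print(weighted_manhattan(x, y, [1, 1, 1]))  # 5.5
```

Checks like this are useful before plugging a custom metric into KNeighborsClassifier, since a metric bug would silently produce wrong neighbor rankings rather than an error.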

Some tips and heuristics for using metric_params:

  - Keys in the metric_params dictionary must exactly match the keyword arguments of the custom metric function; a mismatched key raises a TypeError when distances are computed.
  - metric_params only has an effect when the chosen metric accepts extra parameters; for parameter-free built-in metrics such as 'euclidean', leave it as None.
  - Choose weights that reflect feature importance or compensate for differing feature scales, and tune them with cross-validation rather than guessing.

Issues to consider:

  - Custom callable metrics are substantially slower than built-in metrics, since scikit-learn cannot use its optimized implementations for them.
  - Standardizing features (for example with StandardScaler) is often a simpler alternative to per-feature weights when features are on different scales.

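A common alternative to per-feature metric weights is to standardize the features themselves. A minimal sketch, reusing the same synthetic dataset as above with a StandardScaler pipeline and plain Euclidean distance:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Same synthetic data as above: first feature on a much larger scale
X, y = make_classification(n_samples=1000, n_features=4, n_informative=2, n_redundant=0,
                           n_clusters_per_class=1, weights=[0.8, 0.2], flip_y=0.01, random_state=42)
X[:, 0] *= 100

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features, then apply KNN with the default Euclidean metric
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
pipe.fit(X_train, y_train)
acc = accuracy_score(y_test, pipe.predict(X_test))
print(f"Scaled Euclidean accuracy: {acc:.3f}")
```

This keeps the fast built-in metric implementation while removing the scale imbalance that motivated the custom weights in the first place.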

See Also