
Configure KNeighborsClassifier "metric" Parameter

The metric parameter in scikit-learn’s KNeighborsClassifier determines the distance metric used for finding the nearest neighbors.

K-Nearest Neighbors (KNN) is a simple and effective algorithm for classification tasks. It works by finding the K closest training examples to a new data point and assigning the majority class among those neighbors.
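
To make the idea concrete, here is a minimal from-scratch sketch of the procedure just described. It is not how scikit-learn implements KNN internally; the helper name knn_predict and the toy arrays are made up for illustration, and Euclidean distance is assumed.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training example
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Most common label among those neighbors
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two small clusters labeled 0 and 1
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([0.3, 0.3])))  # point near the label-0 cluster
print(knn_predict(X_toy, y_toy, np.array([4.8, 5.0])))  # point near the label-1 cluster
```

Swapping the distance line for a different formula is, in spirit, what changing the metric parameter does.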

The metric parameter specifies how the distance between two data points is calculated. Because KNN predictions depend entirely on which training points count as nearest, this choice can significantly affect the model's accuracy.

The default value for metric is ‘minkowski’ with p=2, which is equivalent to the standard Euclidean distance.
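
One way to check this equivalence is to compare the neighbor distances returned by a default model and by one with metric='euclidean'. The sketch below does that on a small synthetic dataset; the variable names and dataset sizes are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Small synthetic dataset, used only for this check
X_demo, y_demo = make_classification(n_samples=200, n_features=5,
                                     n_informative=3, n_redundant=1,
                                     random_state=0)

# One model with the defaults, one with the metric named explicitly
default_knn = KNeighborsClassifier(n_neighbors=5).fit(X_demo, y_demo)
euclid_knn = KNeighborsClassifier(n_neighbors=5, metric='euclidean').fit(X_demo, y_demo)

# Distances to the 5 nearest neighbors of the first 10 points
dist_default, _ = default_knn.kneighbors(X_demo[:10])
dist_euclid, _ = euclid_knn.kneighbors(X_demo[:10])

params = default_knn.get_params()
same_distances = np.allclose(dist_default, dist_euclid)
print(f"default metric={params['metric']}, p={params['p']}")
print(f"distances match: {same_distances}")
```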

Other commonly used metrics include ‘manhattan’ (L1 distance) and ‘cosine’ (cosine distance, defined as 1 minus the cosine similarity).
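
To see how these metrics differ, it can help to compute them by hand for a single pair of points. The sketch below uses pairwise_distances from sklearn.metrics on two made-up 2D points; note that ‘cosine’ here is a distance (1 minus the cosine similarity), so nearly parallel vectors get a value close to 0.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

# Two hypothetical points, shaped (1, n_features) as pairwise_distances expects
a = np.array([[1.0, 2.0]])
b = np.array([[4.0, 6.0]])

# Straight-line distance: sqrt(3^2 + 4^2) = 5
euclidean = pairwise_distances(a, b, metric='euclidean')[0, 0]
# Sum of absolute coordinate differences: 3 + 4 = 7
manhattan = pairwise_distances(a, b, metric='manhattan')[0, 0]
# 1 - cos(angle between a and b); small because the vectors point in similar directions
cosine = pairwise_distances(a, b, metric='cosine')[0, 0]

print(f"euclidean={euclidean:.3f}, manhattan={manhattan:.3f}, cosine={cosine:.4f}")
```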

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=3, n_features=5,
                           n_informative=3, n_redundant=1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different metric values
# ('minkowski' uses the default p=2, so it should match 'euclidean')
metric_values = ['euclidean', 'manhattan', 'minkowski']
accuracies = []

for metric in metric_values:
    knn = KNeighborsClassifier(n_neighbors=5, metric=metric)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"metric={metric}, Accuracy: {accuracy:.3f}")

The output will look something like:

metric=euclidean, Accuracy: 0.855
metric=manhattan, Accuracy: 0.870
metric=minkowski, Accuracy: 0.855

The key steps in this example are:

  1. Generate a synthetic multiclass classification dataset
  2. Split the data into train and test sets
  3. Train KNeighborsClassifier models with different metric values
  4. Evaluate the accuracy of each model on the test set
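
A single train/test split can give noisy accuracy differences between metrics. As a possible variation on the steps above (not part of the original example), the sketch below scores each metric with 5-fold cross-validation using cross_val_score. The choice of metrics, the fold count, and the cv_means dictionary are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Same synthetic dataset settings as in the example above
X, y = make_classification(n_samples=1000, n_classes=3, n_features=5,
                           n_informative=3, n_redundant=1, random_state=42)

# Mean cross-validated accuracy for each candidate metric
cv_means = {}
for metric in ['euclidean', 'manhattan', 'cosine']:
    knn = KNeighborsClassifier(n_neighbors=5, metric=metric)
    scores = cross_val_score(knn, X, y, cv=5, scoring='accuracy')
    cv_means[metric] = scores.mean()
    print(f"metric={metric}, mean CV accuracy: {scores.mean():.3f}")
```

Averaging over folds makes it less likely that one metric looks better only because of a lucky split.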

Some tips and heuristics for setting metric:

Issues to consider:


