Configure KNeighborsClassifier "weights" Parameter

The weights parameter in scikit-learn’s KNeighborsClassifier determines how the contribution of each neighbor is weighted when making predictions.

K-Nearest Neighbors (KNN) is a non-parametric algorithm that classifies a new data point by a vote among the k training points closest to it. The weights parameter controls how much each of those k votes counts.

By default, weights is set to 'uniform', which gives equal weight to all neighbors regardless of their distance. Setting weights to 'distance' assigns weights proportional to the inverse of the distance, giving more influence to closer neighbors.
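
To see how the two built-in options can disagree, here is a minimal sketch on a made-up one-dimensional dataset. The three training points and the query value are assumptions chosen purely for illustration; they are not part of the example further down.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data (hypothetical): one class-0 point sits very close to the query,
# while two class-1 points sit farther away on either side.
X_toy = np.array([[0.0], [1.0], [2.0]])
y_toy = np.array([1, 0, 1])
query = np.array([[0.9]])

preds = {}
for w in ["uniform", "distance"]:
    # k=3 means all three training points vote on the query
    knn = KNeighborsClassifier(n_neighbors=3, weights=w)
    knn.fit(X_toy, y_toy)
    preds[w] = int(knn.predict(query)[0])
    print(f"weights={w}: predicted class {preds[w]}")
```

With 'uniform', the two class-1 points outvote the single class-0 point. With 'distance', the class-0 point at distance 0.1 receives weight 1/0.1 = 10, which outweighs the combined 1/0.9 + 1/1.1 (about 2.02) of the class-1 points, so the prediction flips to class 0.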

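A custom weighting function can also be provided to fine-tune the behavior based on domain knowledge or specific requirements of the problem. It must be a callable that accepts an array of distances and returns an array of the same shape containing the weights.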
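
As one possible illustration of a custom weighting function, the sketch below builds a Gaussian-kernel weighting. The helper name make_gaussian_weights, its bandwidth parameter, and the toy data are all hypothetical and introduced only for this example; scikit-learn itself only requires a callable that maps the distance array to a weight array of the same shape.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def make_gaussian_weights(bandwidth=1.0):
    """Return a weighting function (hypothetical helper, not a scikit-learn API)."""
    def gaussian_weights(distances):
        # Element-wise, so the output has the same shape as the input:
        # one row per query point, one column per neighbor.
        return np.exp(-(distances ** 2) / (2 * bandwidth ** 2))
    return gaussian_weights

# Toy 1-D data (assumed for illustration)
X_small = np.array([[0.0], [1.0], [2.0], [3.0]])
y_small = np.array([0, 0, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3,
                           weights=make_gaussian_weights(bandwidth=0.5))
knn.fit(X_small, y_small)

pred = knn.predict(np.array([[0.2], [2.8]]))
print(pred)
```

A smaller bandwidth makes the weights fall off faster with distance, so the nearest neighbor dominates; a very large bandwidth approaches the behavior of 'uniform'.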

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=3, n_features=5,
                           n_informative=3, n_redundant=1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define a custom weighting function
def custom_weights(distances):
    return 1 / (1 + distances**2)

# Train with different weights values
weights_values = ['uniform', 'distance', custom_weights]
accuracies = []

for w in weights_values:
    knn = KNeighborsClassifier(n_neighbors=5, weights=w)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    # Use the function's name for callables so the label is readable
    label = w if isinstance(w, str) else w.__name__
    print(f"weights={label}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

weights=uniform, Accuracy: 0.855
weights=distance, Accuracy: 0.870
weights=custom_weights, Accuracy: 0.855

The key steps in this example are:

Generate a synthetic multi-class classification dataset
Split the data into train and test sets
Define a custom weighting function
Train KNeighborsClassifier models with different weights values
Evaluate the accuracy of each model on the test set

Some tips and heuristics for setting weights:

Use 'uniform' for equal weighting regardless of distance
Use 'distance' to give more influence to closer neighbors
Create custom weighting functions for finer control over neighbor influence
Experiment with different values and functions to find the best fit for your data
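
One way to run that kind of experiment systematically is a cross-validated grid search. The sketch below is an assumption about how you might set it up, not a prescribed recipe: the candidate values for n_neighbors, the 5-fold setting, and the inverse_square function are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

def inverse_square(distances):
    # Hypothetical custom weighting: decays with the squared distance
    return 1 / (1 + distances ** 2)

# Same kind of synthetic data as in the main example
X, y = make_classification(n_samples=1000, n_classes=3, n_features=5,
                           n_informative=3, n_redundant=1, random_state=42)

# Search over weighting schemes jointly with k, since the two interact
param_grid = {
    "weights": ["uniform", "distance", inverse_square],
    "n_neighbors": [3, 5, 7, 9],
}

search = GridSearchCV(KNeighborsClassifier(), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")
```

Cross-validation averages over several splits, so it gives a steadier comparison than a single train/test split, where small accuracy differences may just be noise.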

Issues to consider:

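The choice of weights can affect prediction cost: 'distance' and custom functions compute a weight for every neighbor at prediction time, and a custom Python callable adds its own overhead
Distance-based weights can be sensitive to noisy or irrelevant features, since those features distort the distances from which the weights are computed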
The optimal weights setting depends on the characteristics of the data and problem
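
To check how sensitive each setting is on your own data, one option is to append pure-noise features and compare cross-validated scores. The following is a rough sketch under stated assumptions: the number of noise columns (20), the sample size, and the fold count are arbitrary, and the outcome will vary from dataset to dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_classes=3, n_features=5,
                           n_informative=3, n_redundant=1, random_state=42)

# Append 20 columns of Gaussian noise that carry no class information
rng = np.random.default_rng(42)
X_noisy = np.hstack([X, rng.normal(size=(X.shape[0], 20))])

results = {}
for name, data in [("original", X), ("with noise", X_noisy)]:
    for w in ["uniform", "distance"]:
        knn = KNeighborsClassifier(n_neighbors=5, weights=w)
        # Mean accuracy over 5 folds for this dataset/weights combination
        results[(name, w)] = cross_val_score(knn, data, y, cv=5).mean()
        print(f"{name:>10}, weights={w}: {results[(name, w)]:.3f}")
```

If accuracy drops sharply once the noise columns are added, feature selection or scaling is likely to matter more than the choice of weights.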

See Also