The `weights` parameter in scikit-learn's `KNeighborsClassifier` determines how the contribution of each neighbor is weighted when making predictions.
K-Nearest Neighbors (KNN) is a non-parametric algorithm that classifies a new data point by finding the `k` training points closest to it and letting them vote on its label. The `weights` parameter controls how much each of those `k` neighbors' votes counts toward the prediction.
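To make the voting concrete, here is a minimal NumPy sketch of a weighted vote among `k` neighbors. The helper `weighted_vote` and the toy labels and distances are hypothetical, chosen only for illustration; scikit-learn does not expose this function and its internals differ. The sketch assumes no neighbor lies at distance zero.

```python
import numpy as np

def weighted_vote(neighbor_labels, neighbor_distances, weighting="uniform"):
    """Hypothetical helper: return the class with the largest total weight.

    Assumes no distance is exactly zero when weighting="distance".
    """
    labels = np.asarray(neighbor_labels)
    dists = np.asarray(neighbor_distances, dtype=float)
    if weighting == "uniform":
        weights = np.ones_like(dists)  # every neighbor gets one vote
    elif weighting == "distance":
        weights = 1.0 / dists  # closer neighbors get larger votes
    else:
        raise ValueError(f"unknown weighting: {weighting!r}")
    classes = np.unique(labels)
    # Sum the weights of the neighbors that belong to each class
    scores = np.array([weights[labels == c].sum() for c in classes])
    return int(classes[np.argmax(scores)])

# Three neighbors: one very close neighbor of class 0, two farther ones of class 1
labels = [0, 1, 1]
distances = [0.1, 1.0, 1.2]
print(weighted_vote(labels, distances, "uniform"))   # 1
print(weighted_vote(labels, distances, "distance"))  # 0
```

With uniform weights the two farther class-1 neighbors outvote the single close class-0 neighbor. With inverse-distance weights the close neighbor's weight of 10 outweighs their combined weight of roughly 1.83, so the prediction flips.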
By default, `weights` is set to `'uniform'`, which gives every one of the `k` neighbors an equal vote regardless of its distance. Setting `weights` to `'distance'` weights each neighbor by the inverse of its distance, so closer neighbors have more influence on the prediction.

You can also pass a callable as `weights`. It receives an array of neighbor distances and must return an array of the same shape containing the weights, which lets you fine-tune the behavior based on domain knowledge or the specific requirements of the problem.
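The callable contract can be checked directly. The sketch below assumes a small synthetic dataset and a hypothetical `inverse_distance` function that computes `1 / distance`, and compares it against the built-in `'distance'` option. It uses continuous synthetic data with no exact duplicates, since a zero distance would make `1 / distance` infinite. It also records the shape of the array scikit-learn passes to the callable.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           n_redundant=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

seen_shapes = []  # shapes of the distance arrays passed to the callable

def inverse_distance(distances):
    """Hypothetical custom weighting using the same formula as weights='distance'."""
    seen_shapes.append(distances.shape)
    # Must return an array with the same shape as `distances`
    return 1.0 / distances

# Fix the neighbor search algorithm so both models compute distances the same way
builtin = KNeighborsClassifier(n_neighbors=5, weights="distance", algorithm="kd_tree")
custom = KNeighborsClassifier(n_neighbors=5, weights=inverse_distance, algorithm="kd_tree")
builtin.fit(X_train, y_train)
custom.fit(X_train, y_train)

pred_builtin = builtin.predict(X_test)
pred_custom = custom.predict(X_test)
print("Predictions match:", np.array_equal(pred_builtin, pred_custom))
print("Shape passed to the callable:", seen_shapes[-1])  # (n_queries, n_neighbors)
```

The same pattern works for any decreasing function of distance, such as the `custom_weights` function used in the example, which stays finite even when a distance is zero.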
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=3, n_features=5,
                           n_informative=3, n_redundant=1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define a custom weighting function
# It receives an array of neighbor distances and returns weights of the same shape
def custom_weights(distances):
    return 1 / (1 + distances**2)

# Train with different weights values
weights_values = ['uniform', 'distance', custom_weights]
accuracies = []

for w in weights_values:
    knn = KNeighborsClassifier(n_neighbors=5, weights=w)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"weights={w}, Accuracy: {accuracy:.3f}")
```
Running the example gives an output like:

```
weights=uniform, Accuracy: 0.855
weights=distance, Accuracy: 0.870
weights=<function custom_weights at 0x108d74c20>, Accuracy: 0.855
```
The key steps in this example are:
- Generate a synthetic multi-class classification dataset
- Split the data into train and test sets
- Define a custom weighting function
- Train `KNeighborsClassifier` models with different `weights` values
- Evaluate the accuracy of each model on the test set
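Some tips and heuristics for setting `weights`:

- Use `'uniform'` for equal weighting regardless of distance
- Use `'distance'` to give more influence to closer neighbors
- Create custom weighting functions for finer control over neighbor influence
- Experiment with different values and functions to find the best fit for your data, ideally comparing them with cross-validation rather than a single train/test split and tuning `n_neighbors` at the same time, since the effect of weighting depends on how many neighbors vote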
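One way to follow the last tip is to let cross-validation choose `weights` together with `n_neighbors`. This is a minimal sketch using `GridSearchCV` on the same kind of synthetic data as the example; the candidate `n_neighbors` values are arbitrary choices for illustration, and `custom_weights` mirrors the function defined in the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_classes=3, n_features=5,
                           n_informative=3, n_redundant=1, random_state=42)

def custom_weights(distances):
    # Smooth inverse-square weighting; finite even when a distance is zero
    return 1 / (1 + distances**2)

param_grid = {
    "n_neighbors": [3, 5, 11, 21],  # arbitrary candidates for illustration
    "weights": ["uniform", "distance", custom_weights],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best mean CV accuracy: {search.best_score_:.3f}")
```

Because the winner is chosen by mean accuracy over five folds, it is less sensitive to one lucky split than a single train/test comparison.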
Issues to consider:
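- The choice of `weights` can have a small effect on prediction cost: non-uniform weighting needs the neighbor distances plus an extra weighting step, and a custom callable runs as Python code at prediction time
- Distance-based weights are only as meaningful as the distances themselves, so they can be sensitive to noisy or irrelevant features and to features on very different scales
- The optimal `weights` setting depends on the characteristics of the data and problem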
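Because `'distance'` weighting trusts the raw distances, feature scale matters. The sketch below is an illustrative assumption rather than part of the original example: it appends one irrelevant noise column with a standard deviation of 100, far larger than the informative features, then compares distance-weighted KNN with and without `StandardScaler` in a pipeline. Scaling stops the large-scale column from dominating the distances; it does not remove the irrelevant feature, which would require feature selection.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_classes=3, n_features=5,
                           n_informative=3, n_redundant=1, random_state=42)

# Illustrative assumption: append one irrelevant noise feature on a much larger scale
rng = np.random.default_rng(0)
noise = rng.normal(scale=100.0, size=(X.shape[0], 1))
X_noisy = np.hstack([X, noise])

raw_knn = KNeighborsClassifier(n_neighbors=5, weights="distance")
scaled_knn = make_pipeline(StandardScaler(),
                           KNeighborsClassifier(n_neighbors=5, weights="distance"))

# Mean 5-fold cross-validated accuracy; the scaler is refit inside each fold
raw_score = cross_val_score(raw_knn, X_noisy, y, cv=5).mean()
scaled_score = cross_val_score(scaled_knn, X_noisy, y, cv=5).mean()
print(f"Unscaled: {raw_score:.3f}  Scaled: {scaled_score:.3f}")
```

Fitting the scaler inside the pipeline keeps test-fold statistics out of the scaling step during cross-validation.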