SKLearner Home | About | Contact | Examples

Configure KNeighborsClassifier "n_neighbors" Parameter

The n_neighbors parameter in scikit-learn’s KNeighborsClassifier controls the number of nearest neighbors used to make predictions.

K-Nearest Neighbors (KNN) is a non-parametric algorithm that classifies new data points based on their similarity to the training examples. It predicts the class of a new point by finding the k closest training examples and taking a majority vote.

The n_neighbors parameter sets the value of k, i.e., how many neighbors are considered when making predictions. It has a significant impact on the model’s behavior and performance.

The default value for n_neighbors is 5.

In practice, values between 3 and 15 are commonly used, depending on the characteristics of the dataset.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_informative=2, n_clusters_per_class=1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different n_neighbors values
n_neighbors_values = [1, 3, 5, 10, 20]
accuracies = []

for k in n_neighbors_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"n_neighbors={k}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

n_neighbors=1, Accuracy: 0.775
n_neighbors=3, Accuracy: 0.925
n_neighbors=5, Accuracy: 0.875
n_neighbors=10, Accuracy: 0.850
n_neighbors=20, Accuracy: 0.875

The key steps in this example are:

  1. Generate a synthetic binary classification dataset with two informative features
  2. Split the data into train and test sets
  3. Train KNeighborsClassifier models with different n_neighbors values
  4. Evaluate the accuracy of each model on the test set

Some tips and heuristics for setting n_neighbors:

Issues to consider:



See Also