
Configure KNeighborsClassifier "leaf_size" Parameter

Tuning leaf_size in scikit-learn’s KNeighborsClassifier

The leaf_size parameter in scikit-learn’s KNeighborsClassifier controls the leaf size of the ball tree or KD tree used for efficient neighbor search.

K-Nearest Neighbors (KNN) is a non-parametric algorithm that classifies new data points based on the majority class among the K nearest training examples. The leaf_size parameter determines the maximum number of data points in each leaf node of the tree structure used to speed up neighbor search.

Smaller values of leaf_size produce deeper trees with more leaves: queries can prune the search space more aggressively, but the tree takes longer to build and uses more memory. Larger values produce shallower trees that are cheaper to build and store, but each leaf must be scanned by brute force, which can slow queries. Crucially, leaf_size changes only how neighbors are found, never which neighbors are found, so it does not affect predictions or accuracy.
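Because the tree only accelerates the search, a tree-based model and a brute-force model return the same neighbors and therefore the same predictions. A minimal sketch on synthetic data (parameter values chosen arbitrarily for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Small synthetic dataset with continuous features (distance ties are negligible)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Same data, same n_neighbors; only the neighbor-search strategy differs
tree_knn = KNeighborsClassifier(n_neighbors=5, algorithm='kd_tree', leaf_size=15).fit(X, y)
brute_knn = KNeighborsClassifier(n_neighbors=5, algorithm='brute').fit(X, y)

# leaf_size changes how neighbors are found, not which neighbors are found
print(np.array_equal(tree_knn.predict(X), brute_knn.predict(X)))  # True
```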

The default value for leaf_size is 30. In practice, values between 10 and 100 are commonly used depending on the size and dimensionality of the dataset.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different leaf_size values
leaf_size_values = [10, 30, 50, 100]
accuracies = []

for ls in leaf_size_values:
    knn = KNeighborsClassifier(n_neighbors=5, leaf_size=ls)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"leaf_size={ls}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

leaf_size=10, Accuracy: 0.905
leaf_size=30, Accuracy: 0.905
leaf_size=50, Accuracy: 0.905
leaf_size=100, Accuracy: 0.905
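The identical accuracies above reflect that leaf_size does not change predictions; its practical effect is on build and query time. A hedged sketch that times predict() across leaf sizes (absolute timings vary by machine; the leaf sizes and dataset shape are illustrative assumptions):

```python
import time
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Larger synthetic dataset so query time is measurable
X, y = make_classification(n_samples=5000, n_features=10, n_informative=8,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

timings = {}
for ls in [5, 30, 200]:
    knn = KNeighborsClassifier(n_neighbors=5, algorithm='ball_tree', leaf_size=ls)
    knn.fit(X_train, y_train)
    start = time.perf_counter()
    knn.predict(X_test)
    timings[ls] = time.perf_counter() - start
    print(f"leaf_size={ls}: predict took {timings[ls]:.4f}s")
```

On most datasets the differences are modest; it is worth benchmarking only when neighbor search is a measured bottleneck.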

The key steps in this example are:

  1. Generate a synthetic binary classification dataset with informative and redundant features
  2. Split the data into train and test sets
  3. Train KNeighborsClassifier models with different leaf_size values
  4. Evaluate the accuracy of each model on the test set

Some tips and heuristics for setting leaf_size:

  - Start with the default of 30; change it only if profiling shows that neighbor search is a bottleneck.
  - Smaller values (e.g. 10-20) can speed up queries on low-dimensional data, at the cost of slower tree construction and more memory.
  - Larger values (e.g. 50-100) reduce build time and memory use, pushing the search closer to brute force within each leaf.
  - leaf_size is ignored when algorithm='brute' is used.

Issues to consider:

  - leaf_size never changes which neighbors are found, so it does not affect predictions or accuracy; tune it for speed and memory, not model quality.
  - The best value depends on dataset size, dimensionality, and n_neighbors, so benchmark on your own data rather than relying on fixed rules.
  - In high-dimensional data, tree-based search degrades toward brute force regardless of leaf_size.
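These points can also be seen with the underlying tree structure directly. A sketch using sklearn.neighbors.KDTree (the leaf sizes and data shape are assumptions for illustration): trees built with different leaf sizes return identical neighbors for the same queries.

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))

# Build trees with different leaf sizes and query the 5 nearest
# neighbors of the same points; only traversal cost differs.
results = []
for ls in [10, 40, 100]:
    tree = KDTree(X, leaf_size=ls)
    dist, ind = tree.query(X[:5], k=5)
    results.append(ind)

print(np.array_equal(results[0], results[1]) and
      np.array_equal(results[0], results[2]))  # True
```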

See Also