
Configure KNeighborsRegressor "algorithm" Parameter

The algorithm parameter in scikit-learn’s KNeighborsRegressor specifies the method used to compute the nearest neighbors.

KNeighborsRegressor is used for regression problems where predictions are based on the k-nearest neighbors of each point. The algorithm parameter determines the approach used to find these neighbors.

The algorithm parameter can take the following values:

  - 'auto': attempt to choose the most appropriate algorithm based on the values passed to fit
  - 'ball_tree': use a BallTree data structure
  - 'kd_tree': use a KDTree data structure
  - 'brute': use a brute-force search

The default value for algorithm is 'auto'.

In practice, ball_tree and kd_tree are typically faster for large, low-dimensional datasets, while brute is often adequate for small datasets or high-dimensional data, where tree structures lose their advantage.
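One way to see this trade-off is to time the query phase of each method on the same data. This is a rough sketch; the actual timings depend on your hardware, dataset size, and dimensionality:

```python
import time

from sklearn.datasets import make_regression
from sklearn.neighbors import KNeighborsRegressor

# Larger synthetic dataset so timing differences are visible
X, y = make_regression(n_samples=20000, n_features=5, noise=0.1, random_state=42)

for alg in ['ball_tree', 'kd_tree', 'brute']:
    knr = KNeighborsRegressor(algorithm=alg)
    knr.fit(X, y)

    # Time only the neighbor search / prediction phase
    start = time.perf_counter()
    knr.predict(X[:1000])
    query_time = time.perf_counter() - start
    print(f"{alg}: query time {query_time:.4f}s")
```

On low-dimensional data like this, the tree-based methods usually answer queries faster than brute force, at the cost of building the tree during fit.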

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different algorithm values
algorithm_values = ['auto', 'ball_tree', 'kd_tree', 'brute']
results = []

for alg in algorithm_values:
    knr = KNeighborsRegressor(algorithm=alg)
    knr.fit(X_train, y_train)
    y_pred = knr.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    results.append((alg, mse))
    print(f"algorithm={alg}, MSE: {mse:.3f}")

Running the example gives output like:

algorithm=auto, MSE: 3728.344
algorithm=ball_tree, MSE: 3728.344
algorithm=kd_tree, MSE: 3728.344
algorithm=brute, MSE: 3728.344

The key steps in this example are:

  1. Generate a synthetic regression dataset with informative features.
  2. Split the data into training and test sets.
  3. Train KNeighborsRegressor models with different algorithm values.
  4. Evaluate the mean squared error of each model on the test set.
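Because all four options perform an exact neighbor search, they find the same neighbors and therefore produce the same predictions; the identical MSE values above are expected, not a coincidence. A quick check on the same kind of synthetic data:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Collect predictions from each algorithm setting
preds = {}
for alg in ['auto', 'ball_tree', 'kd_tree', 'brute']:
    knr = KNeighborsRegressor(algorithm=alg).fit(X, y)
    preds[alg] = knr.predict(X[:100])

# All methods should agree (up to floating-point rounding)
for alg in ['ball_tree', 'kd_tree', 'brute']:
    assert np.allclose(preds['auto'], preds[alg])
print("All algorithms produce identical predictions")
```

The choice of algorithm therefore affects only speed and memory, not model accuracy.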

Some tips and heuristics for setting the algorithm parameter:

  - Leave algorithm='auto' unless profiling shows neighbor search is a bottleneck; scikit-learn then picks a method based on the training data.
  - Prefer kd_tree for dense, low-dimensional data (roughly fewer than 20 features).
  - Prefer ball_tree for higher-dimensional data or distance metrics that kd_tree does not support.
  - Use brute for small datasets, very high-dimensional data, or metrics not supported by either tree structure.

Issues to consider:

  - Tree-based methods add a one-time construction cost at fit time and extra memory overhead; the leaf_size parameter trades construction time against query time.
  - Fitting on sparse input overrides this parameter and uses brute force.
  - All four options compute exact nearest neighbors, so prediction quality is identical; only speed and memory differ.
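To see which method 'auto' actually selected, you can inspect the fitted estimator's _fit_method attribute. Note that this is a private attribute and may change between scikit-learn versions, so treat this as a debugging aid rather than a stable API:

```python
from scipy.sparse import csr_matrix
from sklearn.datasets import make_regression
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=1000, n_features=5, random_state=42)

# Dense, low-dimensional data: 'auto' typically selects a tree-based method
knr_dense = KNeighborsRegressor(algorithm='auto').fit(X, y)
print(knr_dense._fit_method)

# Sparse input overrides the setting and falls back to brute force
knr_sparse = KNeighborsRegressor(algorithm='auto').fit(csr_matrix(X), y)
print(knr_sparse._fit_method)  # 'brute'
```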



See Also