
Scikit-Learn KNeighborsRegressor Model

K-Nearest Neighbors (KNN) is a simple, instance-based learning algorithm often used for regression tasks. It predicts the value of a new data point as the average of the target values of its k nearest neighbors in the training set.
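
Conceptually, the prediction is just the mean of the target values of the k closest training points. The short sketch below (illustrative only, using a tiny made-up dataset rather than the worked example that follows) makes this concrete:

import numpy as np

# tiny made-up training set with a single feature
X_train = np.array([[1.0], [2.0], [3.0], [10.0]])
y_train = np.array([1.5, 2.5, 3.5, 10.5])

# query point and number of neighbors
x_new = np.array([2.2])
k = 3

# distance from the query to every training point
dists = np.linalg.norm(X_train - x_new, axis=1)

# indices of the k nearest neighbors
nearest = np.argsort(dists)[:k]

# the KNN regression prediction is the average of their targets
print(y_train[nearest].mean())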

The key hyperparameters of KNeighborsRegressor include n_neighbors (number of neighbors to use), weights (weight function used in prediction), and algorithm (algorithm used to compute the nearest neighbors).
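
For example, a distance-weighted configuration (an illustrative setting, not the one used in the worked example below) could look like this:

from sklearn.neighbors import KNeighborsRegressor

# illustrative configuration: 5 neighbors, distance-weighted averaging,
# and a KD-tree for the neighbor search
model = KNeighborsRegressor(n_neighbors=5, weights='distance', algorithm='kd_tree')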

This algorithm is appropriate for regression problems, where the goal is to predict a continuous value.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# generate regression dataset
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=1)

# split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# create model
model = KNeighborsRegressor(n_neighbors=3)

# fit model
model.fit(X_train, y_train)

# evaluate model
yhat = model.predict(X_test)
mse = mean_squared_error(y_test, yhat)
print('Mean Squared Error: %.3f' % mse)

# make a prediction
row = [[0.5, -0.2, 0.3, 0.1, 0.2]]
yhat = model.predict(row)
print('Predicted: %.3f' % yhat[0])

Running the example gives an output like:

Mean Squared Error: 1468.023
Predicted: 13.825

The steps are as follows:

  1. A synthetic regression dataset is generated using the make_regression() function. This creates a dataset with a specified number of samples (n_samples), features (n_features), amount of Gaussian noise (noise), and a fixed random seed (random_state) for reproducibility. The dataset is split into training and test sets using train_test_split().

  2. A KNeighborsRegressor model is instantiated with n_neighbors set to 3. The model is fit on the training data using the fit() method.

  3. The model’s performance is evaluated by predicting on the test set and calculating the mean squared error (MSE) between the predictions (yhat) and the actual values (y_test); a cross-validation alternative is sketched after this list.

  4. A single prediction is made by passing a new data sample to the predict() method.
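
For a more robust estimate than a single train/test split, the same model can also be scored with k-fold cross-validation. The sketch below reuses the X and y generated above and assumes 5 folds:

from sklearn.model_selection import cross_val_score

# 5-fold cross-validation; scikit-learn reports MSE as a negated score
scores = cross_val_score(model, X, y, scoring='neg_mean_squared_error', cv=5)
print('Cross-validated MSE: %.3f' % -scores.mean())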

This example demonstrates how to set up and use a KNeighborsRegressor model for regression tasks. The simplicity of KNN makes it a good choice for quick, straightforward regression modeling.
