The weights parameter in scikit-learn’s KNeighborsRegressor controls how the influence of the nearest neighbors is distributed when making predictions.
KNeighborsRegressor is a regression algorithm that predicts a target value by averaging the target values of the k nearest neighbors in the feature space. The weights parameter determines how much each neighbor contributes to that average.
The weights parameter can be set to “uniform”, which assigns equal weight to all neighbors, or “distance”, which weights each neighbor by the inverse of its distance from the query point. A custom function that accepts an array of distances and returns an array of weights of the same shape can also be used for domain-specific weight assignments.
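To make the difference between the two built-in options concrete, here is a minimal sketch that reproduces the arithmetic by hand in plain Python. The toy data, the query point, and k=3 are made-up values chosen for easy numbers, and the sketch ignores the special case of a neighbor at distance zero.

```python
# Toy 1-D training set (hypothetical values chosen for easy arithmetic).
X_train = [1.0, -2.0, 4.0, 10.0]
y_train = [3.0, 6.0, 12.0, 100.0]
query = 0.0
k = 3

# Pair each training point's distance to the query with its target,
# then keep the k closest points.
neighbors = sorted((abs(x - query), y) for x, y in zip(X_train, y_train))[:k]
distances = [d for d, _ in neighbors]
targets = [t for _, t in neighbors]

# weights="uniform": plain average of the neighbor targets.
uniform_pred = sum(targets) / k

# weights="distance": each neighbor is weighted by 1 / distance,
# and the weighted sum is normalized by the total weight.
inv = [1.0 / d for d in distances]
distance_pred = sum(w * t for w, t in zip(inv, targets)) / sum(inv)

print(f"uniform:  {uniform_pred:.3f}")   # 7.000
print(f"distance: {distance_pred:.3f}")  # 5.143
```

With inverse-distance weighting, the prediction moves toward the target of the closest neighbor (3.0), while the uniform average treats all three neighbors the same.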
The default value for weights is “uniform”.
In practice, “uniform” and “distance” are commonly used. “Distance” weighting can produce more accurate models when closer neighbors are more informative about the target, but the improvement is not guaranteed and should be checked on held-out data.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error
# Generate synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different weights values
weights_values = ['uniform', 'distance']
errors = []
for weight in weights_values:
    # Fit a model with the current weighting scheme
    knn = KNeighborsRegressor(weights=weight)
    knn.fit(X_train, y_train)
    # Evaluate on the held-out test set
    y_pred = knn.predict(X_test)
    error = mean_squared_error(y_test, y_pred)
    errors.append(error)
    print(f"weights={weight}, Mean Squared Error: {error:.3f}")
Running the example gives an output like:
weights=uniform, Mean Squared Error: 3728.344
weights=distance, Mean Squared Error: 3699.934
The key steps in this example are:
- Generate a synthetic regression dataset with 10 features and a small amount of noise added to the target.
- Split the data into train and test sets.
- Train KNeighborsRegressor models with different weights values.
- Evaluate the mean squared error of each model on the test set.
Some tips and heuristics for setting weights:
- Use “uniform” for simpler, faster models where all neighbors are equally relevant.
- Use “distance” when closer neighbors are expected to be more relevant; this often, but not always, improves accuracy.
- Custom functions can be used for domain-specific weight assignments.
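As an illustration of the custom-function option, the sketch below passes a callable to the weights parameter. The function name gaussian_weights, its bandwidth argument, and the synthetic sine data are all hypothetical choices for this example; the only requirement assumed from scikit-learn is that the callable takes an array of distances and returns an array of weights with the same shape.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor


def gaussian_weights(distances, bandwidth=1.0):
    """Hypothetical kernel: weights decay smoothly as distance grows."""
    return np.exp(-(distances ** 2) / (2 * bandwidth ** 2))


# Synthetic 1-D data: a noisy sine curve (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# Pass the callable instead of "uniform" or "distance".
knn = KNeighborsRegressor(n_neighbors=10, weights=gaussian_weights)
knn.fit(X, y)

# Predict at a single query point; the result should be near sin(0.5).
pred = knn.predict(np.array([[0.5]]))
print(pred)
```

Unlike inverse-distance weighting, a kernel like this stays finite at distance zero, which is one reason someone might prefer it. Whether it helps on a given dataset still needs to be validated.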
Issues to consider:
- The optimal weights setting depends on the nature of the data and the problem.
- “Distance” weighting can be slightly more expensive at prediction time, since a weight must be computed for each neighbor.
- Custom weight functions require careful design and validation.
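Because the best setting depends on the data, one reasonable approach is to let cross-validation choose it. The following is a minimal sketch using GridSearchCV; the dataset size, the candidate n_neighbors values, and the 5-fold split are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic data, similar in spirit to the example above.
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)

# Search over the weighting scheme together with the number of neighbors,
# since the two settings interact.
param_grid = {
    "weights": ["uniform", "distance"],
    "n_neighbors": [3, 5, 10],
}

search = GridSearchCV(
    KNeighborsRegressor(),
    param_grid,
    scoring="neg_mean_squared_error",  # higher (less negative) is better
    cv=5,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated MSE:", -search.best_score_)
```

Searching over n_neighbors at the same time matters because the effect of distance weighting tends to change as more, and therefore farther, neighbors are included.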