Scikit-Learn "KNeighborsClassifier" versus "KNeighborsRegressor"

This example compares KNeighborsClassifier and KNeighborsRegressor, the scikit-learn k-nearest neighbors estimators for classification and regression tasks, respectively.

In scikit-learn, the KNeighborsClassifier class is used for classification tasks. Its key hyperparameters include n_neighbors (number of neighbors), weights (weight function), and algorithm (algorithm to compute the nearest neighbors).

The KNeighborsRegressor class, on the other hand, is used for regression tasks. It accepts the same key hyperparameters as KNeighborsClassifier: n_neighbors, weights, and algorithm.
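As a minimal sketch of the shared hyperparameters, the snippet below passes one settings dictionary to both estimators and reads the values back with get_params. The specific values (3 neighbors, distance weighting, a KD-tree search) are illustrative choices, not recommendations from the original example.

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Illustrative settings; both estimators accept the same three names.
shared_params = {"n_neighbors": 3, "weights": "distance", "algorithm": "kd_tree"}

clf = KNeighborsClassifier(**shared_params)
reg = KNeighborsRegressor(**shared_params)

# Read the settings back from each estimator.
clf_params = clf.get_params()
reg_params = reg.get_params()
for name in shared_params:
    print(f"{name}: classifier={clf_params[name]!r}, regressor={reg_params[name]!r}")

# Defaults when nothing is passed.
default_params = KNeighborsClassifier().get_params()
print({name: default_params[name] for name in shared_params})
```

With no arguments, both classes default to n_neighbors=5, weights="uniform", and algorithm="auto".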

The main difference between the two estimators is the type of prediction they make: KNeighborsClassifier predicts a class label for each sample by a vote among its nearest neighbors, while KNeighborsRegressor predicts a continuous value by averaging the target values of its nearest neighbors. Consequently, the evaluation metrics differ: accuracy is a common choice for classification and mean squared error for regression.
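To make the difference in prediction concrete, here is a small sketch on a made-up one-dimensional dataset (the values are invented for illustration). With the default uniform weights, the classifier's prediction should match a majority vote over the nearest neighbors' labels, and the regressor's prediction should match the mean of the nearest neighbors' targets.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Toy data, invented for illustration: one feature, six samples.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_labels = np.array([0, 0, 1, 1, 1, 1])                  # categorical target
y_values = np.array([1.0, 2.0, 6.0, 20.0, 22.0, 24.0])   # continuous target

query = np.array([[1.2]])

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_labels)
reg = KNeighborsRegressor(n_neighbors=3).fit(X, y_values)

# Indices of the 3 training points closest to the query.
_, idx = clf.kneighbors(query)
neighbor_idx = idx[0]

# Reproduce both predictions by hand from the same neighbors.
manual_vote = np.bincount(y_labels[neighbor_idx]).argmax()
manual_mean = y_values[neighbor_idx].mean()

print("Neighbor labels:", y_labels[neighbor_idx], "-> vote:", manual_vote)
print("Classifier prediction:", clf.predict(query)[0])
print("Neighbor targets:", y_values[neighbor_idx], "-> mean:", manual_mean)
print("Regressor prediction:", reg.predict(query)[0])
```

Both estimators find the same three neighbors here; only the way the neighbors' targets are combined differs.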

KNeighborsClassifier is ideal when the target variable is categorical, whereas KNeighborsRegressor is suited for continuous target variables. Although they use similar methodologies, their applications and evaluations are distinct.

from sklearn.datasets import make_classification, make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.metrics import accuracy_score, mean_squared_error

# Generate synthetic classification dataset
X_class, y_class = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

# Generate synthetic regression dataset
X_reg, y_reg = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split datasets into train and test sets
X_class_train, X_class_test, y_class_train, y_class_test = train_test_split(X_class, y_class, test_size=0.2, random_state=42)
X_reg_train, X_reg_test, y_reg_train, y_reg_test = train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)

# Fit and evaluate KNeighborsClassifier
knn_classifier = KNeighborsClassifier(n_neighbors=5)
knn_classifier.fit(X_class_train, y_class_train)
y_pred_class = knn_classifier.predict(X_class_test)
print(f"KNeighborsClassifier accuracy: {accuracy_score(y_class_test, y_pred_class):.3f}")

# Fit and evaluate KNeighborsRegressor
knn_regressor = KNeighborsRegressor(n_neighbors=5)
knn_regressor.fit(X_reg_train, y_reg_train)
y_pred_reg = knn_regressor.predict(X_reg_test)
print(f"KNeighborsRegressor mean squared error: {mean_squared_error(y_reg_test, y_pred_reg):.3f}")

Running the example gives an output like:

KNeighborsClassifier accuracy: 0.810
KNeighborsRegressor mean squared error: 14831.303

Generate a synthetic binary classification dataset using make_classification.
Generate a synthetic regression dataset using make_regression.
Split both datasets into training and test sets using train_test_split.
Instantiate KNeighborsClassifier with n_neighbors=5 (the default value), fit it on the training data, and compute its accuracy on the test set.
Instantiate KNeighborsRegressor with n_neighbors=5 (the default value), fit it on the training data, and compute its mean squared error on the test set.
Report both metrics. Accuracy and mean squared error are measured on different scales, so each score describes its own model and the two numbers are not directly comparable.
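Because mean squared error is expressed in squared units of the target, a raw value is hard to judge on its own. One possible way to put it in context, sketched below, is to compare it with the variance of the test targets and to report the R^2 score. Note that r2_score and the variance comparison are additions here and are not part of the original example; the dataset and model settings mirror the ones used above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Same synthetic regression setup as the main example.
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

reg = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)
y_pred = reg.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
target_variance = np.var(y_test)   # MSE of always predicting the test mean
r2 = r2_score(y_test, y_pred)      # equals 1 - mse / target_variance

print(f"Mean squared error: {mse:.3f}")
print(f"Variance of test targets: {target_variance:.3f}")
print(f"R^2: {r2:.3f}")
```

An MSE well below the target variance (R^2 close to 1) suggests the model explains most of the variation in the targets, while an MSE close to the variance suggests it does little better than predicting the mean.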
