Comparing `KNeighborsClassifier` and `KNeighborsRegressor` shows how scikit-learn applies the k-nearest neighbors method to classification and regression tasks, respectively.
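In scikit-learn, the `KNeighborsClassifier` class is used for classification tasks. Its key hyperparameters are `n_neighbors` (the number of neighbors consulted, default 5), `weights` (the weight function applied to the neighbors, e.g. `"uniform"` or `"distance"`), and `algorithm` (the method used to find the nearest neighbors: `"auto"`, `"ball_tree"`, `"kd_tree"`, or `"brute"`).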
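To make these three hyperparameters concrete, here is a minimal sketch on a hypothetical four-point toy dataset (the data and variable names are illustrative assumptions, not part of the original example). It reads the defaults with `get_params()`, shows `weights` changing a prediction, and checks that `algorithm` affects only how neighbors are searched, not which neighbors are found.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 1-D toy training set, for illustration only
X_train = np.array([[0.0], [1.0], [1.1], [5.0]])
y_train = np.array([0, 1, 1, 0])
query = np.array([[0.1]])  # 3 nearest: 0.0 (class 0), 1.0 and 1.1 (class 1)

# Inspect the default hyperparameter values
defaults = KNeighborsClassifier().get_params()
print(defaults["n_neighbors"], defaults["weights"], defaults["algorithm"])

# weights="uniform": each of the 3 neighbors casts one vote, so class 1 wins 2 to 1
uniform_clf = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X_train, y_train)
# weights="distance": votes are weighted by 1/distance, so the very close class-0 point dominates
distance_clf = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X_train, y_train)

uniform_pred = uniform_clf.predict(query)[0]
distance_pred = distance_clf.predict(query)[0]
print(uniform_pred, distance_pred)

# algorithm changes only the search strategy; exact methods return the same neighbors
brute_clf = KNeighborsClassifier(n_neighbors=3, algorithm="brute").fit(X_train, y_train)
tree_clf = KNeighborsClassifier(n_neighbors=3, algorithm="kd_tree").fit(X_train, y_train)
same_neighbors = np.array_equal(
    brute_clf.kneighbors(query)[1], tree_clf.kneighbors(query)[1]
)
print(same_neighbors)
```

On this toy data the uniform vote picks class 1 while the distance-weighted vote picks class 0, which is why `weights` is worth tuning alongside `n_neighbors`.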
The `KNeighborsRegressor` class, on the other hand, is used for regression tasks. It shares the same key hyperparameters as `KNeighborsClassifier`: `n_neighbors`, `weights`, and `algorithm`.
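For regression, the same `weights` setting controls how neighbor targets are averaged. The sketch below, assuming a small hypothetical toy dataset, uses `kneighbors()` to retrieve the neighbors and recomputes both the plain mean (`weights="uniform"`) and the inverse-distance weighted mean (`weights="distance"`) by hand for comparison with `predict()`.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical 1-D toy data, for illustration only
X_train = np.array([[0.0], [1.0], [2.0], [10.0]])
y_train = np.array([0.0, 1.0, 2.0, 10.0])
query = np.array([[0.25]])

uniform_reg = KNeighborsRegressor(n_neighbors=3, weights="uniform").fit(X_train, y_train)
distance_reg = KNeighborsRegressor(n_neighbors=3, weights="distance").fit(X_train, y_train)

# kneighbors returns the distances and indices of the 3 nearest training points
dist, idx = uniform_reg.kneighbors(query)
neighbor_targets = y_train[idx[0]]

# Uniform weights: plain mean of the neighbor targets
manual_uniform = neighbor_targets.mean()
# Distance weights: mean weighted by 1/distance
inv_dist = 1.0 / dist[0]
manual_distance = np.sum(inv_dist * neighbor_targets) / np.sum(inv_dist)

uniform_pred = uniform_reg.predict(query)[0]
distance_pred = distance_reg.predict(query)[0]
print(uniform_pred, manual_uniform)
print(distance_pred, manual_distance)
```

Distance weighting pulls the prediction toward the closest neighbor (target 0.0), so it comes out below the uniform mean.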
The main difference between these two estimators is their output: `KNeighborsClassifier` predicts a class label for each sample by voting among the neighbors' labels, while `KNeighborsRegressor` predicts a continuous value by averaging the neighbors' targets. Consequently, they are evaluated with different metrics: accuracy for classification and mean squared error for regression.
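Both metrics are simple to compute by hand. This sketch, using hypothetical labels and predictions chosen only for illustration, checks `accuracy_score` (fraction of exactly matching labels) and `mean_squared_error` (mean of squared residuals) against direct NumPy formulas.

```python
import numpy as np
from sklearn.metrics import accuracy_score, mean_squared_error

# Hypothetical true values and predictions, for illustration only
labels_true = np.array([0, 1, 1, 0])
labels_pred = np.array([0, 1, 0, 0])
values_true = np.array([1.0, 2.0, 3.0])
values_pred = np.array([1.5, 2.0, 2.0])

# Accuracy: fraction of labels predicted exactly (3 of 4 here)
sk_acc = accuracy_score(labels_true, labels_pred)
manual_acc = np.mean(labels_true == labels_pred)

# MSE: mean of squared residuals, (0.25 + 0.0 + 1.0) / 3 here
sk_mse = mean_squared_error(values_true, values_pred)
manual_mse = np.mean((values_true - values_pred) ** 2)

print(sk_acc, manual_acc)
print(sk_mse, manual_mse)
```

Note that MSE is expressed in squared target units, so its magnitude depends on the scale of the target variable.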
`KNeighborsClassifier` is appropriate when the target variable is categorical, whereas `KNeighborsRegressor` is suited to continuous target variables. Although both rely on the same nearest-neighbor search, their outputs and evaluation metrics are distinct.
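A quick way to see the categorical vs. continuous distinction is to fit both estimators on the same hypothetical toy data (illustrative only). The classifier can only return a label it saw during training, while the regressor returns the mean of the neighbor targets, which need not match any training value.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Hypothetical toy data: three training points with targets 0, 1, 1
X_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.array([0, 1, 1])
query = np.array([[1.0]])

# With n_neighbors=3, every training point is a neighbor of the query
clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
reg = KNeighborsRegressor(n_neighbors=3).fit(X_train, y_train)

label = clf.predict(query)[0]  # majority vote: one of the labels seen in training
value = reg.predict(query)[0]  # uniform mean of neighbor targets: (0 + 1 + 1) / 3
print(label, value)
```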
```python
from sklearn.datasets import make_classification, make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.metrics import accuracy_score, mean_squared_error

# Generate synthetic classification dataset
X_class, y_class = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

# Generate synthetic regression dataset
X_reg, y_reg = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split datasets into train and test sets
X_class_train, X_class_test, y_class_train, y_class_test = train_test_split(X_class, y_class, test_size=0.2, random_state=42)
X_reg_train, X_reg_test, y_reg_train, y_reg_test = train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)

# Fit and evaluate KNeighborsClassifier
knn_classifier = KNeighborsClassifier(n_neighbors=5)
knn_classifier.fit(X_class_train, y_class_train)
y_pred_class = knn_classifier.predict(X_class_test)
print(f"KNeighborsClassifier accuracy: {accuracy_score(y_class_test, y_pred_class):.3f}")

# Fit and evaluate KNeighborsRegressor
knn_regressor = KNeighborsRegressor(n_neighbors=5)
knn_regressor.fit(X_reg_train, y_reg_train)
y_pred_reg = knn_regressor.predict(X_reg_test)
print(f"KNeighborsRegressor mean squared error: {mean_squared_error(y_reg_test, y_pred_reg):.3f}")
```
Running the example gives an output like:

```
KNeighborsClassifier accuracy: 0.810
KNeighborsRegressor mean squared error: 14831.303
```
The example works as follows:

- Generate a synthetic binary classification dataset using `make_classification`.
- Generate a synthetic regression dataset using `make_regression`.
- Split both datasets into training and test sets using `train_test_split`.
- Instantiate `KNeighborsClassifier` with `n_neighbors=5` (the default) and otherwise default hyperparameters, fit it on the training data, and evaluate its performance on the test set.
- Instantiate `KNeighborsRegressor` with `n_neighbors=5` (the default) and otherwise default hyperparameters, fit it on the training data, and evaluate its performance on the test set.
- Compare the performance metrics (accuracy for `KNeighborsClassifier` and mean squared error for `KNeighborsRegressor`).