
Configure KNeighborsRegressor "n_jobs" Parameter

The n_jobs parameter in KNeighborsRegressor specifies the number of parallel jobs to run for the neighbor search.

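KNeighborsRegressor is a non-parametric regression method that predicts the target for a sample from the targets of its k nearest neighbors in the feature space. With the default settings, the prediction is the average target of the 5 nearest training points.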
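
To make the prediction rule concrete, the sketch below recomputes predictions by hand from the neighbor indices returned by kneighbors and compares them with predict. It assumes the default settings (n_neighbors=5, weights="uniform"); the dataset sizes are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.neighbors import KNeighborsRegressor

# Small synthetic problem (sizes chosen only for illustration)
X, y = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=0)
X_train, y_train = X[:150], y[:150]
X_query = X[150:155]

# Default settings: n_neighbors=5, weights="uniform"
knr = KNeighborsRegressor().fit(X_train, y_train)

# Indices of the 5 nearest training points for each query row, shape (5, 5)
_, neighbor_idx = knr.kneighbors(X_query)

# With uniform weights, the prediction is the plain mean of the neighbors' targets
manual_pred = y_train[neighbor_idx].mean(axis=1)
model_pred = knr.predict(X_query)

print(np.allclose(manual_pred, model_pred))  # True
```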

Setting n_jobs lets the neighbor search run on multiple CPU cores, which can speed up prediction on large datasets or large batches of query points.

The default value of n_jobs is None, which means a single job unless a joblib.parallel_backend context specifies otherwise. Setting n_jobs=-1 uses all available CPU cores.
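
scikit-learn delegates n_jobs handling to joblib, so joblib's effective_n_jobs function shows how each setting resolves to a worker count. For negative values, joblib uses n_cpus + 1 + n_jobs workers, so -1 means every CPU and -2 means every CPU but one. A small sketch, assuming joblib is importable (it is a scikit-learn dependency); the printed counts depend on your machine:

```python
from joblib import cpu_count, effective_n_jobs

# Number of workers joblib resolves for each n_jobs value, outside any
# joblib.parallel_backend context: None and 1 give one worker, -1 gives
# every CPU, -2 gives every CPU but one, and positive values are used as-is.
for n_jobs in [None, 1, 2, -1, -2]:
    print(f"n_jobs={n_jobs}: {effective_n_jobs(n_jobs)} worker(s)")

print(f"CPUs reported by joblib: {cpu_count()}")
```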

import time
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different n_jobs values
n_jobs_values = [1, 2, 4, -1]
errors = []
times = []

for n in n_jobs_values:
    knr = KNeighborsRegressor(n_jobs=n)
    start_time = time.perf_counter()  # high-resolution clock for short durations
    knr.fit(X_train, y_train)
    y_pred = knr.predict(X_test)
    end_time = time.perf_counter()
    error = mean_squared_error(y_test, y_pred)
    elapsed_time = end_time - start_time
    errors.append(error)
    times.append(elapsed_time)
    print(f"n_jobs={n}, Mean Squared Error: {error:.3f}, Time: {elapsed_time:.3f} seconds")

Running the example gives an output like:

n_jobs=1, Mean Squared Error: 3728.344, Time: 0.003 seconds
n_jobs=2, Mean Squared Error: 3728.344, Time: 0.026 seconds
n_jobs=4, Mean Squared Error: 3728.344, Time: 0.014 seconds
n_jobs=-1, Mean Squared Error: 3728.344, Time: 0.016 seconds
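
The matching MSE values above reflect that n_jobs only changes how the neighbor search is executed, not what it returns. A quick way to confirm this on the same data is to compare the predictions directly; this check is an addition to the original example, not part of it.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Same data and split as the main example
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Reference predictions from a single job
reference = KNeighborsRegressor(n_jobs=1).fit(X_train, y_train).predict(X_test)

# Compare every other setting against the single-job predictions
matches = {}
for n in [2, 4, -1]:
    pred = KNeighborsRegressor(n_jobs=n).fit(X_train, y_train).predict(X_test)
    matches[n] = np.allclose(pred, reference)
    print(f"n_jobs={n}: same predictions as n_jobs=1 -> {matches[n]}")
```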

The key steps in this example are:

  1. Generate a synthetic regression dataset.
  2. Split the data into train and test sets.
  3. Train KNeighborsRegressor models with different n_jobs values.
  4. Measure the mean squared error on the test set and the combined fit-and-predict time for each n_jobs value.
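
Timings of a few milliseconds, like those in the sample output, are noisy, so a more informative benchmark times fit and predict separately and keeps the fastest of several repeats. The sketch below does this on a larger synthetic dataset; the dataset size, the repeat count, and the helper best_time are illustrative choices that are not part of the original example, and the numbers you see will depend on your hardware.

```python
import time
from functools import partial

from sklearn.datasets import make_regression
from sklearn.neighbors import KNeighborsRegressor

# Larger problem so the neighbor search does measurable work
X, y = make_regression(n_samples=10000, n_features=10, noise=0.1, random_state=42)
X_train, y_train, X_query = X[:5000], y[:5000], X[5000:]


def best_time(func, repeats=3):
    """Hypothetical helper: return the fastest of several timings of func()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


results = {}
for n in [1, -1]:
    knr = KNeighborsRegressor(n_jobs=n)
    # Time fitting and prediction separately
    fit_time = best_time(partial(knr.fit, X_train, y_train))
    predict_time = best_time(partial(knr.predict, X_query))
    results[n] = (fit_time, predict_time)
    print(f"n_jobs={n}: fit {fit_time:.4f} s, predict {predict_time:.4f} s")
```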

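Some tips and heuristics for setting n_jobs:

  1. Keep the default (None, a single job) for small datasets like the one above. Starting parallel workers has a fixed overhead, which likely explains why n_jobs=1 was fastest in the sample output.
  2. Try n_jobs=-1 when predicting many query points against a large training set, where the neighbor search dominates the run time.
  3. Expect identical predictions for every n_jobs value; the setting only changes how the work is scheduled, as the matching MSE values show.
  4. Benchmark on your own data and hardware before settling on a value, since the speedup depends on dataset size, number of features, and available cores.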
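
One tip: rather than hard-coding n_jobs on every estimator, you can leave it at None and control the worker count from the calling code with joblib's parallel_backend context manager, which the scikit-learn documentation names as the case where None does not mean a single job. A minimal sketch, with the threading backend and two workers as arbitrary choices:

```python
import numpy as np
from joblib import parallel_backend
from sklearn.datasets import make_regression
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=2000, n_features=10, noise=0.1, random_state=42)

# n_jobs is left at its default (None) on the estimator
knr = KNeighborsRegressor().fit(X[:1500], y[:1500])

# Outside any context, None means a single job
pred_default = knr.predict(X[1500:])

# Inside the context, the context's n_jobs setting applies to n_jobs=None
with parallel_backend("threading", n_jobs=2):
    pred_context = knr.predict(X[1500:])

print(np.allclose(pred_default, pred_context))  # True
```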

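Issues to consider:

  1. According to the scikit-learn documentation, n_jobs does not affect fit; it parallelizes the neighbor search done by predict (and kneighbors), so only prediction time should change.
  2. Timings of a few milliseconds, as in the output above, are dominated by noise and overhead; repeat measurements before drawing conclusions.
  3. Setting n_jobs=-1 claims every core, which can slow down other work running on the same machine.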
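
One issue to consider is nested parallelism: when KNeighborsRegressor runs inside a parallel tool such as GridSearchCV, parallelizing both the outer search and the inner neighbor search can oversubscribe the CPU. A common pattern is to parallelize only the outer loop, as in this sketch; the parameter grid, cv=3, and the outer n_jobs=2 are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=42)

# Inner estimator runs single-threaded; the search itself is parallel
search = GridSearchCV(
    KNeighborsRegressor(n_jobs=1),
    param_grid={"n_neighbors": [3, 5, 7]},
    cv=3,
    n_jobs=2,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print(search.best_params_)
```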


