
Configure SGDClassifier "n_jobs" Parameter

The n_jobs parameter in scikit-learn’s SGDClassifier controls the number of CPU cores used for parallelization during training.

Stochastic Gradient Descent (SGD) is an optimization algorithm used for training various linear models. SGDClassifier implements a plain stochastic gradient descent learning routine that supports different loss functions and penalties for classification.
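
For context, the loss and penalty arguments choose which linear model SGD trains. A brief illustration (the variable names are only for exposition, and loss="log_loss" assumes scikit-learn 1.1 or newer, where it replaced the older "log" name):

from sklearn.linear_model import SGDClassifier

# Hinge loss with an L2 penalty trains a linear SVM by SGD
svm_like = SGDClassifier(loss="hinge", penalty="l2", random_state=42)

# Log loss with an elastic-net penalty trains regularized logistic regression
logreg_like = SGDClassifier(loss="log_loss", penalty="elasticnet",
                            l1_ratio=0.15, random_state=42)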

The n_jobs parameter determines how many CPU cores are used to parallelize training. For multi-class problems, SGDClassifier fits one binary classifier per class using a one-versus-all (OVA) scheme, and n_jobs spreads those per-class fits across cores. A value of -1 uses all available cores, while a positive integer specifies the exact number of cores to use.

The default value for n_jobs is None, which means a single core is used (unless the call runs inside a joblib.parallel_backend context). Common values include -1 (use all cores) or positive integers like 2, 4, or 8, depending on the available hardware.
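
As a minimal sketch of the common settings (the variable names here are only illustrative), you can query the machine's core count with os.cpu_count() and pass an explicit value, or simply use -1:

import os
from sklearn.linear_model import SGDClassifier

# Default: None -> one core (unless running inside a joblib.parallel_backend context)
sgd_default = SGDClassifier(random_state=42)

# -1 -> use every available core for the per-class (one-versus-all) fits
sgd_all_cores = SGDClassifier(n_jobs=-1, random_state=42)

# An explicit count, e.g. half the machine's cores (a heuristic, not a rule)
half = max(1, (os.cpu_count() or 2) // 2)
sgd_half = SGDClassifier(n_jobs=half, random_state=42)

The complete example below compares training time and accuracy across several n_jobs values on a synthetic multi-class dataset: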

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
import time

# Generate synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, n_classes=3, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different n_jobs values
n_jobs_values = [None, 1, 2, 4, -1]
results = []

for n_jobs in n_jobs_values:
    # Time the model construction and fit for this n_jobs setting
    start_time = time.time()
    sgd = SGDClassifier(n_jobs=n_jobs, random_state=42)
    sgd.fit(X_train, y_train)
    train_time = time.time() - start_time

    # Evaluate on the held-out test set
    y_pred = sgd.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)

    results.append((n_jobs, train_time, accuracy))
    print(f"n_jobs={n_jobs}, Training Time: {train_time:.3f}s, Accuracy: {accuracy:.3f}")

Running the example produces output like the following (exact times will vary with your hardware):

n_jobs=None, Training Time: 0.149s, Accuracy: 0.677
n_jobs=1, Training Time: 0.151s, Accuracy: 0.677
n_jobs=2, Training Time: 0.137s, Accuracy: 0.677
n_jobs=4, Training Time: 0.092s, Accuracy: 0.677
n_jobs=-1, Training Time: 0.084s, Accuracy: 0.677
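
Accuracy is identical for every setting because n_jobs changes only how the per-class fits are distributed across cores, not the learning routine itself. A quick check of that claim (a sketch reusing X_train and y_train from the example; with a fixed random_state the fits should be deterministic):

import numpy as np
from sklearn.linear_model import SGDClassifier

# Fit once serially and once with all cores; the learned coefficients should match
sgd_serial = SGDClassifier(n_jobs=1, random_state=42).fit(X_train, y_train)
sgd_parallel = SGDClassifier(n_jobs=-1, random_state=42).fit(X_train, y_train)
print(np.allclose(sgd_serial.coef_, sgd_parallel.coef_))  # expected: True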

The key steps in this example are:

  1. Generate a synthetic multi-class classification dataset
  2. Split the data into train and test sets
  3. Train SGDClassifier models with different n_jobs values
  4. Measure training time and accuracy for each model
  5. Compare performance across different n_jobs configurations
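
The loop also stores each measurement in the results list. As a small follow-up (just a sketch reusing that list), you can print the runs fastest-first:

# Sort the (n_jobs, training time, accuracy) tuples by training time
for n_jobs, train_time, accuracy in sorted(results, key=lambda r: r[1]):
    label = "default" if n_jobs is None else n_jobs
    print(f"n_jobs={label}: {train_time:.3f}s (accuracy {accuracy:.3f})")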

Some tips and heuristics for setting n_jobs:

  * n_jobs only speeds up SGDClassifier on multi-class problems, where it parallelizes the one-versus-all fits; for binary classification there is a single fit and the setting has no effect.
  * Use -1 to take all available cores on a machine dedicated to the job; prefer a fixed positive integer when sharing the machine with other workloads.
  * There is one binary fit per class, so requesting more workers than classes brings no extra benefit.
  * Parallelism has startup overhead; on small datasets or with few classes, the serial default can be just as fast.

Issues to consider:

  * Each worker may receive its own copy of the training data, so memory use grows with n_jobs.
  * n_jobs changes only how the per-class fits are scheduled, not the SGD updates themselves, so accuracy is unaffected.
  * Nesting parallelism, for example wrapping SGDClassifier(n_jobs=-1) inside a GridSearchCV that also sets n_jobs=-1, can oversubscribe the CPU and slow everything down; see the sketch below.
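
A minimal sketch of keeping the parallelism at a single level (the parameter grid here is only an illustration):

from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

# Let the grid search own the cores and leave the inner estimator serial,
# which avoids oversubscribing the CPU with nested parallelism.
param_grid = {"alpha": [1e-4, 1e-3, 1e-2]}
search = GridSearchCV(
    SGDClassifier(random_state=42),  # n_jobs left at its default (None)
    param_grid,
    n_jobs=-1,
    cv=3,
)
search.fit(X_train, y_train)
print(search.best_params_)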


