The cache_size parameter in scikit-learn's SVC (Support Vector Classification) class controls the size of the kernel cache, which stores pre-computed kernel matrix values.
Support Vector Machines (SVMs) are powerful supervised learning algorithms used for classification and regression tasks. The SVC class in scikit-learn is an implementation of SVM for classification problems.
The cache_size parameter specifies the size of the kernel cache in megabytes (MB). It determines the amount of memory allocated for caching kernel matrix values during training. A larger cache can speed up training by reducing the number of repeated kernel computations.
The default value for cache_size is 200 MB.
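This default can be confirmed directly on an unfitted estimator, since cache_size is stored as a plain attribute:

```python
from sklearn.svm import SVC

# The constructor argument is kept as-is on the estimator object
svc = SVC()
print(svc.cache_size)  # 200
```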
In practice, the optimal value for cache_size depends on the available memory of the system and the size of the training dataset. Common values range from 200 MB to several gigabytes.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
import time
# Generate synthetic dataset
X, y = make_classification(n_samples=10000, n_features=20, n_informative=10,
                           n_redundant=5, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different cache_size values
cache_size_values = [200, 500, 1000, 2000]
accuracies = []
train_times = []
for cache_size in cache_size_values:
    start_time = time.time()
    svc = SVC(kernel='rbf', cache_size=cache_size, random_state=42)
    svc.fit(X_train, y_train)
    end_time = time.time()
    train_time = end_time - start_time
    y_pred = svc.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    train_times.append(train_time)
    print(f"cache_size={cache_size} MB, Training time: {train_time:.2f}s, Accuracy: {accuracy:.3f}")
Running the example gives an output like:
cache_size=200 MB, Training time: 0.67s, Accuracy: 0.944
cache_size=500 MB, Training time: 0.69s, Accuracy: 0.944
cache_size=1000 MB, Training time: 0.68s, Accuracy: 0.944
cache_size=2000 MB, Training time: 0.66s, Accuracy: 0.944
The key steps in this example are:
- Generate a synthetic binary classification dataset with informative and redundant features
- Split the data into train and test sets
- Train SVC models with different cache_size values
- Evaluate the training time and accuracy of each model on the test set
Some tips and heuristics for setting cache_size:
- Increase cache_size if you have sufficient memory to speed up training
- Larger cache_size values are beneficial for larger datasets
- Monitor memory usage to ensure the cache size doesn't exceed available memory
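One way to apply these heuristics is to size the cache from the memory actually available at runtime rather than hard-coding a value. A minimal sketch, assuming the third-party psutil package is installed; the 25% fraction and the 200–2000 MB clamp are arbitrary choices for illustration, not recommendations:

```python
import psutil
from sklearn.svm import SVC

# Devote at most a quarter of currently available RAM to the kernel cache,
# clamped to a sensible range (illustrative bounds only)
available_mb = psutil.virtual_memory().available / (1024 ** 2)
cache_mb = min(2000, max(200, available_mb * 0.25))

svc = SVC(kernel='rbf', cache_size=cache_mb)
print(f"cache_size set to {cache_mb:.0f} MB")
```

Because the cache is a soft upper bound on memory used for kernel values, undersizing it slows training rather than causing failures, so erring low is the safer default.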
Issues to consider:
- Setting cache_size too high can lead to out-of-memory errors
- The optimal cache size depends on the dataset size and available system memory
- Increasing cache_size may not always lead to significant improvements in training time