The gamma parameter in scikit-learn’s SVC class controls the influence of individual training examples when fitting the decision boundary.
Support Vector Machines (SVMs) are powerful classifiers that find an optimal hyperplane to separate classes in high-dimensional space. The gamma parameter, which applies to the RBF, polynomial, and sigmoid kernels, determines the “reach” of each training example.
Smaller gamma values consider more distant examples, resulting in a smoother decision boundary. Larger values consider only close examples, leading to a more complex boundary.
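To make this concrete, here is a small sketch (not part of the original example) that evaluates the RBF kernel value exp(-gamma * ||x - z||^2) between two fixed points for several gamma values, using sklearn.metrics.pairwise.rbf_kernel:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Two points a fixed Euclidean distance apart
x = np.array([[0.0, 0.0]])
z = np.array([[1.0, 1.0]])

# The kernel value exp(-gamma * ||x - z||^2) measures how much influence
# the training point z exerts at the location of x
for g in [0.01, 0.1, 1, 10, 100]:
    k = rbf_kernel(x, z, gamma=g)[0, 0]
    print(f"gamma={g:>6}: kernel value = {k:.6f}")

For small gamma the kernel value stays close to 1, so even distant examples influence the fit; for large gamma it collapses toward 0, so only very close examples matter.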
The default value for gamma is 'scale', which sets gamma to 1 / (n_features * X.var()). This scales the parameter to the number of features and the variance of the training data.
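As a quick sanity check (a sketch of my own, not from the original tutorial), you can compute that quantity directly and confirm it behaves the same as gamma='scale':

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# gamma='scale' resolves to 1 / (n_features * X.var())
explicit_gamma = 1.0 / (X.shape[1] * X.var())

clf_scale = SVC(gamma='scale').fit(X, y)
clf_explicit = SVC(gamma=explicit_gamma).fit(X, y)

# Both models use the same effective gamma, so their predictions should match
print(np.array_equal(clf_scale.predict(X), clf_explicit.predict(X)))

Because both classifiers end up with the same effective gamma, the comparison should print True.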
In practice, gamma values between 0.1 and 100 are commonly used, depending on the dataset’s characteristics.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different gamma values
gamma_values = [0.1, 1, 10, 100]
accuracies = []
for g in gamma_values:
    svc = SVC(gamma=g, random_state=42)
    svc.fit(X_train, y_train)
    y_pred = svc.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"gamma={g}, Accuracy: {accuracy:.3f}")
Running the example gives an output like:
gamma=0.1, Accuracy: 0.910
gamma=1, Accuracy: 0.750
gamma=10, Accuracy: 0.480
gamma=100, Accuracy: 0.480
The key steps in this example are:
- Generate a synthetic binary classification dataset
- Split the data into train and test sets
- Train SVC models with different gamma values
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting gamma:
- Smaller values consider more distant training examples; larger values consider only nearby ones
- Larger values can lead to overfitting; smaller values can lead to underfitting
- Use cross-validation to find the optimal value
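As an illustration of the last tip, a minimal grid search over gamma with 5-fold cross-validation might look like the sketch below (the candidate values are an arbitrary choice for demonstration, not a recommendation):

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)

# Search a logarithmic range of gamma values with 5-fold cross-validation
param_grid = {'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}
grid = GridSearchCV(SVC(random_state=42), param_grid, cv=5, scoring='accuracy')
grid.fit(X, y)

print(f"Best gamma: {grid.best_params_['gamma']}")
print(f"Best cross-validated accuracy: {grid.best_score_:.3f}")

Because refit=True by default, grid.best_estimator_ is already retrained on all of X with the best gamma and can be used directly.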
Issues to consider:
- gamma interacts with other parameters like C (regularization)
- The optimal value depends on the dataset size and complexity
- Very large gamma values tend to turn most training points into support vectors, which increases model size and prediction cost
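To see the interaction between gamma and C directly, the sketch below (again my own illustration, reusing the synthetic data setup from the example above) prints cross-validated accuracy for a small grid of (C, gamma) pairs:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)

# Evaluate each (C, gamma) combination with 5-fold cross-validation to see
# how the two parameters interact
for C in [0.1, 1, 10]:
    for g in [0.01, 0.1, 1]:
        scores = cross_val_score(SVC(C=C, gamma=g, random_state=42), X, y, cv=5)
        print(f"C={C:>4}, gamma={g:>4}: mean accuracy = {scores.mean():.3f}")

In practice the two parameters are usually tuned together, for example by extending the grid search shown earlier to include both C and gamma.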