The gamma parameter in scikit-learn’s SVC class controls the influence of individual training examples when fitting the decision boundary.
Support Vector Machines (SVMs) are powerful classifiers that find an optimal hyperplane to separate classes in high-dimensional space. The gamma parameter, which applies to the RBF, polynomial, and sigmoid kernels, determines the “reach” of each training example.
Smaller gamma values consider more distant examples, resulting in a smoother decision boundary. Larger values consider only close examples, leading to a more complex boundary.
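To make this concrete, here is a small sketch (not part of the original example) that evaluates the RBF kernel value exp(-gamma * ||x - z||^2) between two fixed points for several gamma values, using sklearn.metrics.pairwise.rbf_kernel:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Two points a fixed Euclidean distance apart
x = np.array([[0.0, 0.0]])
z = np.array([[1.0, 1.0]])

# The kernel value exp(-gamma * ||x - z||^2) measures how much influence
# the training point z exerts at the location of x
for g in [0.01, 0.1, 1, 10, 100]:
    k = rbf_kernel(x, z, gamma=g)[0, 0]
    print(f"gamma={g:>6}: kernel value = {k:.6f}")

For small gamma the kernel value stays close to 1, so even distant examples influence the fit; for large gamma it collapses toward 0, so only very close examples matter.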
The default value for gamma is 'scale', which sets gamma to 1 / (n_features * X.var()). This scales the parameter to the number of features and the variance of the training data.
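As a quick sanity check (a sketch of my own, not from the original tutorial), you can compute that quantity directly and confirm it behaves the same as gamma='scale':

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# gamma='scale' resolves to 1 / (n_features * X.var())
explicit_gamma = 1.0 / (X.shape[1] * X.var())

clf_scale = SVC(gamma='scale').fit(X, y)
clf_explicit = SVC(gamma=explicit_gamma).fit(X, y)

# Both models use the same effective gamma, so their predictions should match
print(np.array_equal(clf_scale.predict(X), clf_explicit.predict(X)))

Because both classifiers end up with the same effective gamma, the comparison should print True.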
In practice, gamma values between 0.1 and 100 are commonly used, depending on the dataset’s characteristics.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different gamma values
gamma_values = [0.1, 1, 10, 100]
accuracies = []
for g in gamma_values:
    svc = SVC(gamma=g, random_state=42)
    svc.fit(X_train, y_train)
    y_pred = svc.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"gamma={g}, Accuracy: {accuracy:.3f}")
Running the example gives an output like:
gamma=0.1, Accuracy: 0.910
gamma=1, Accuracy: 0.750
gamma=10, Accuracy: 0.480
gamma=100, Accuracy: 0.480
The key steps in this example are:
- Generate a synthetic binary classification dataset
- Split the data into train and test sets
- Train SVC models with different gamma values
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting gamma:
- Smaller values consider more distant training examples; larger values consider only nearby ones
- Larger values can lead to overfitting; smaller values can lead to underfitting
- Use cross-validation to find the optimal value
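As an illustration of the last tip, a minimal grid search over gamma with 5-fold cross-validation might look like the sketch below (the candidate values are an arbitrary choice for demonstration, not a recommendation):

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)

# Search a logarithmic range of gamma values with 5-fold cross-validation
param_grid = {'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}
grid = GridSearchCV(SVC(random_state=42), param_grid, cv=5, scoring='accuracy')
grid.fit(X, y)

print(f"Best gamma: {grid.best_params_['gamma']}")
print(f"Best cross-validated accuracy: {grid.best_score_:.3f}")

Because refit=True by default, grid.best_estimator_ is already retrained on all of X with the best gamma and can be used directly.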
Issues to consider:
- gamma interacts with other parameters like C (regularization)
- The optimal value depends on the dataset size and complexity
- Very large gamma values tend to turn most training points into support vectors, which increases model size and prediction cost
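To see the interaction between gamma and C directly, the sketch below (again my own illustration, reusing the synthetic data setup from the example above) prints cross-validated accuracy for a small grid of (C, gamma) pairs:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, random_state=42)

# Evaluate each (C, gamma) combination with 5-fold cross-validation to see
# how the two parameters interact
for C in [0.1, 1, 10]:
    for g in [0.01, 0.1, 1]:
        scores = cross_val_score(SVC(C=C, gamma=g, random_state=42), X, y, cv=5)
        print(f"C={C:>4}, gamma={g:>4}: mean accuracy = {scores.mean():.3f}")

In practice the two parameters are usually tuned together, for example by extending the grid search shown earlier to include both C and gamma.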