The gamma parameter in scikit-learn's SVR (Support Vector Regression) controls the influence of individual training examples on the model's predictions.

SVR is a regression algorithm that tries to find a hyperplane in a high-dimensional space that fits the training data within a certain margin of tolerance. With the default RBF kernel, gamma appears in the kernel function K(x, x') = exp(-gamma * ||x - x'||^2), so it determines the reach of a single training example: it sets how quickly that example's influence decays with distance, which affects the model's ability to capture complex patterns. A smaller gamma leads to a smoother, less complex model, while a larger gamma allows the model to capture more intricate patterns but risks overfitting.
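To make the idea of "reach" concrete, the short sketch below (an illustrative addition, not part of the original example) evaluates the RBF kernel value between a training point and query points at increasing distances; the larger gamma is, the faster the influence drops off.

import numpy as np

# RBF kernel value between two points: exp(-gamma * ||x - x'||^2)
def rbf_influence(x, x_prime, gamma):
    return np.exp(-gamma * np.sum((x - x_prime) ** 2))

x_train = np.array([0.0])
for gamma in [0.1, 1, 10]:
    # Influence of x_train on query points at distances 0.5, 1, and 2
    influences = [rbf_influence(x_train, np.array([d]), gamma) for d in (0.5, 1.0, 2.0)]
    print(f"gamma={gamma}: " + ", ".join(f"{v:.4f}" for v in influences))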
The default value for gamma is 'scale', which sets gamma to 1 / (n_features * X.var()). In practice, gamma values between 0.1 and 100 are commonly used, depending on the scale and distribution of the data.
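As a quick check (a minimal sketch, not part of the original example), the value that 'scale' resolves to can be computed directly from the training data, and passing the equivalent float should give the same predictions:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.svm import SVR

X, y = make_regression(n_samples=200, n_features=10, random_state=0)

# gamma='scale' resolves to 1 / (n_features * X.var())
manual_gamma = 1.0 / (X.shape[1] * X.var())
print(f"gamma='scale' corresponds to {manual_gamma:.6f}")

# Both models should produce the same predictions on this dense array
pred_scale = SVR(gamma="scale").fit(X, y).predict(X[:5])
pred_manual = SVR(gamma=manual_gamma).fit(X, y).predict(X[:5])
print(np.allclose(pred_scale, pred_manual))

The full example below trains SVR with several explicit gamma values and compares test error.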
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error
import numpy as np
# Generate synthetic dataset with non-linear patterns
X, y = make_regression(n_samples=1000, n_features=10, noise=10, random_state=42)
y = np.exp((y - y.min()) / (y.max() - y.min()))
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different gamma values
gamma_values = [0.1, 1, 10, 100]
mse_scores = []
for gamma in gamma_values:
    svr = SVR(gamma=gamma)
    svr.fit(X_train, y_train)
    y_pred = svr.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"gamma={gamma}, MSE: {mse:.3f}")
Running the example gives an output like:
gamma=0.1, MSE: 0.006
gamma=1, MSE: 0.060
gamma=10, MSE: 0.066
gamma=100, MSE: 0.066
The key steps in this example are:
- Generate a synthetic regression dataset with non-linear patterns
- Split the data into train and test sets
- Train SVR models with different gamma values
- Evaluate the models using mean squared error (MSE)
Some tips and heuristics for setting gamma:
- Smaller gamma leads to a smoother, less complex model
- Larger gamma allows the model to capture more intricate patterns
- Cross-validation can help find the optimal gamma value, as shown in the sketch after this list
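A minimal cross-validation sketch (an addition to the original example; the candidate grid here is just an illustrative assumption) uses GridSearchCV to select gamma by mean squared error:

from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

X, y = make_regression(n_samples=500, n_features=10, noise=10, random_state=42)

# Search a log-spaced range of candidate gamma values with 5-fold CV
param_grid = {"gamma": [0.001, 0.01, 0.1, 1, 10, 100]}
search = GridSearchCV(SVR(), param_grid, scoring="neg_mean_squared_error", cv=5)
search.fit(X, y)

print("Best gamma:", search.best_params_["gamma"])
print("Best CV MSE:", -search.best_score_)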
Issues to consider:
- Setting gamma too small may result in underfitting
- Setting gamma too large may cause overfitting
- The optimal gamma depends on the scale and distribution of the data, as illustrated in the sketch below
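Because gamma multiplies squared distances between samples, rescaling the features changes which gamma values work well. The sketch below (an illustrative addition, not part of the original example; the x100 multiplier is artificial, chosen to exaggerate the scale mismatch) compares the same gamma on raw and standardized features; the standardized pipeline should reach a lower test MSE for the same gamma.

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1000, n_features=10, noise=10, random_state=42)
X = X * 100  # artificially inflate the feature scale
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for name, model in [
    ("raw features", SVR(gamma=0.1)),
    ("standardized", make_pipeline(StandardScaler(), SVR(gamma=0.1))),
]:
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: MSE={mse:.3f}")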