The degree parameter in scikit-learn's SVR (Support Vector Regression) controls the complexity of the regression function the model can fit in the kernel-induced feature space.
Support Vector Regression fits a function in a (possibly high-dimensional) feature space that is as flat as possible while keeping the prediction error for as many training points as possible within a tolerance margin (epsilon).
The degree parameter sets the degree of the polynomial kernel function used to implicitly map the input data into a higher-dimensional space. It only applies when kernel='poly' and is ignored by all other kernels. A higher degree allows the model to learn more complex, non-linear relationships.
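Concretely, the polynomial kernel of degree d computes (gamma * <x, x'> + coef0) ** d, so degree is the exponent applied to the (shifted, scaled) dot product. The following minimal sketch, which assumes sklearn.metrics.pairwise.polynomial_kernel and illustrative values for gamma and coef0, checks that formula against a manual computation:

import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

# Two small sets of 1-D feature vectors
X = np.array([[1.0], [2.0], [3.0]])
Y = np.array([[0.5], [1.5]])

degree, gamma, coef0 = 3, 1.0, 1.0

# Kernel matrix computed by scikit-learn
K_sklearn = polynomial_kernel(X, Y, degree=degree, gamma=gamma, coef0=coef0)

# Same matrix computed directly from the formula (gamma * <x, y> + coef0) ** degree
K_manual = (gamma * (X @ Y.T) + coef0) ** degree

print(np.allclose(K_sklearn, K_manual))  # True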
The default value for degree is 3.
In practice, values between 2 and 5 are commonly used depending on the complexity of the relationship in the data.
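Note that degree has no effect unless kernel='poly'. As a quick sanity check (a sketch separate from the example that follows), fitting the same data with two different degree values under the default rbf kernel produces identical predictions:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.svm import SVR

# Small dataset just for this check
X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=0)

# degree is ignored by every kernel except 'poly'
preds = [SVR(kernel='rbf', degree=d).fit(X, y).predict(X) for d in (2, 10)]
print(np.allclose(preds[0], preds[1]))  # True: degree made no difference

The complete example below evaluates several degree values with the polynomial kernel on a synthetic dataset.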
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error
# Generate a synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=1, n_informative=1, noise=10,
                       random_state=42, effective_rank=2)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different degree values
degree_values = [1, 2, 3, 4, 5]
mse_scores = []
for degree in degree_values:
    svr = SVR(kernel='poly', degree=degree, C=1.0, epsilon=0.1)
    svr.fit(X_train, y_train)
    y_pred = svr.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
    print(f"degree={degree}, MSE: {mse:.3f}")
Running the example gives an output like:
degree=1, MSE: 92.456
degree=2, MSE: 92.622
degree=3, MSE: 92.979
degree=4, MSE: 92.206
degree=5, MSE: 92.630
The key steps in this example are:
- Generate a synthetic regression dataset
- Split the data into train and test sets
- Train SVR models with different degree values
- Evaluate the mean squared error of each model on the test set
Some tips and heuristics for setting degree:
- Start with the default value of 3 and adjust based on model performance (see the grid search sketch after this list)
- Higher degree values can capture more complex relationships but may overfit
- Consider the computational cost of higher degree values
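As a sketch of that tuning loop, GridSearchCV can search a small range of degree values with cross-validation; the parameter grid and data below are illustrative assumptions, not part of the example above:

from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

X, y = make_regression(n_samples=500, n_features=1, noise=10, random_state=42)

# Search a small range of degree values with 5-fold cross-validation
param_grid = {'degree': [2, 3, 4, 5]}
grid = GridSearchCV(SVR(kernel='poly', C=1.0, epsilon=0.1), param_grid,
                    scoring='neg_mean_squared_error', cv=5)
grid.fit(X, y)
print(grid.best_params_, -grid.best_score_)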
Issues to consider:
- The optimal degree depends on the complexity of the relationship in the data
- Using a degree that is too low may result in underfitting
- Using a degree that is too high may lead to overfitting and increased computation time (a train-versus-test comparison, sketched below, is one way to spot this)
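One simple overfitting diagnostic, sketched below under the same synthetic-data assumptions as the main example, is to compare train and test MSE for each degree; a train error far below the test error suggests the degree is too high:

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1000, n_features=1, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for degree in [2, 3, 4, 5]:
    svr = SVR(kernel='poly', degree=degree, C=1.0, epsilon=0.1).fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, svr.predict(X_train))
    test_mse = mean_squared_error(y_test, svr.predict(X_test))
    # A train MSE much lower than the test MSE points to overfitting
    print(f"degree={degree}, train MSE: {train_mse:.3f}, test MSE: {test_mse:.3f}")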