Configure SVC "degree" Parameter

The degree parameter in scikit-learn’s SVC class controls the complexity of the decision boundary when using a polynomial kernel.

Support Vector Machines (SVMs) are powerful algorithms for classification and regression tasks. The SVC class in scikit-learn implements Support Vector Classification, which can handle non-linearly separable data by using kernel functions to transform the input space.

The degree parameter applies only to the polynomial kernel (kernel='poly') and is ignored by all other kernels. It sets the exponent of the polynomial kernel function, which determines the highest order of feature interactions the decision boundary can represent.
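To make the role of degree concrete, scikit-learn documents the polynomial kernel as K(x, x') = (gamma * <x, x'> + coef0)^degree. The sketch below writes that expression out with NumPy and compares it to sklearn.metrics.pairwise.polynomial_kernel. The toy matrix and the gamma and coef0 values are arbitrary choices made for illustration, not values taken from the example in this article.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

# Small toy matrix (illustrative values only)
X_demo = np.array([[1.0, 2.0],
                   [0.5, -1.0],
                   [3.0, 0.0]])

# Arbitrary kernel settings chosen for the demonstration
gamma, coef0, degree = 0.5, 1.0, 3

# Polynomial kernel written out by hand: (gamma * <x, x'> + coef0) ** degree
K_manual = (gamma * (X_demo @ X_demo.T) + coef0) ** degree

# The same kernel matrix computed by scikit-learn
K_sklearn = polynomial_kernel(X_demo, X_demo, degree=degree, gamma=gamma, coef0=coef0)

# Check that the two kernel matrices agree
print(np.allclose(K_manual, K_sklearn))
```

Note that SVC itself defaults to gamma='scale' and coef0=0.0, so the values above differ from what SVC(kernel='poly') uses unless you set them explicitly.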

The default value for degree is 3.
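As a quick sanity check, the following sketch reads the default from an SVC instance and confirms that degree has no effect when a non-polynomial kernel is used. The small dataset and the choice of the RBF kernel for comparison are assumptions made for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# The default degree, read from an unfitted estimator
default_degree = SVC(kernel="poly").degree
print(f"default degree: {default_degree}")

# Small synthetic dataset (sizes chosen only for this demonstration)
X_small, y_small = make_classification(n_samples=200, n_features=10, random_state=0)

# Fit two RBF models that differ only in degree
rbf_low = SVC(kernel="rbf", degree=2).fit(X_small, y_small)
rbf_high = SVC(kernel="rbf", degree=5).fit(X_small, y_small)

# The RBF kernel does not use degree, so the decision values should match
same_decision = np.allclose(rbf_low.decision_function(X_small),
                            rbf_high.decision_function(X_small))
print(f"RBF models identical: {same_decision}")
```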

In practice, values between 2 and 5 are commonly used depending on the complexity of the dataset.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, n_clusters_per_class=1, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train with different degree values
degree_values = [2, 3, 4, 5]
accuracies = []

for d in degree_values:
    svc = SVC(kernel='poly', degree=d, random_state=42)
    svc.fit(X_train, y_train)
    y_pred = svc.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"degree={d}, Accuracy: {accuracy:.3f}")

Running the example gives an output like:

degree=2, Accuracy: 0.960
degree=3, Accuracy: 0.970
degree=4, Accuracy: 0.935
degree=5, Accuracy: 0.965

The key steps in this example are:

Generate a synthetic binary classification dataset with 5 informative features and 5 noise features (n_redundant is set to 0)
Split the data into train and test sets
Train SVC models with different degree values using a polynomial kernel
Evaluate the accuracy of each model on the test set

Some tips and heuristics for setting degree:

Start with the default value of 3 and try values from 2 to 5
Higher degree leads to more complex decision boundaries, which can capture intricate patterns
Use cross-validation to select the optimal degree value for your dataset
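The cross-validation tip can be sketched with GridSearchCV, as shown below. This is one possible setup rather than a prescribed method: the candidate grid, the 5-fold split, and the accuracy scoring are assumptions, and the dataset simply mirrors the synthetic one used earlier.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Same style of synthetic dataset as in the main example
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=0, n_clusters_per_class=1, random_state=42)

# Candidate degree values to compare (an illustrative grid)
param_grid = {"degree": [2, 3, 4, 5]}

# 5-fold cross-validated search over degree for a polynomial-kernel SVC
search = GridSearchCV(SVC(kernel="poly"), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(f"Best degree: {search.best_params_['degree']}")
print(f"Mean CV accuracy: {search.best_score_:.3f}")
```

Because degree interacts with C, gamma and coef0, it may be worth adding those parameters to the same grid instead of tuning degree in isolation.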

Issues to consider:

Setting the degree too high can lead to overfitting, especially on small datasets
Training time can increase with higher degree values, since the optimization may take longer to converge
The polynomial kernel may not be suitable for very high-dimensional data
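One way to watch for the overfitting issue listed above is to compare training and test accuracy for each degree; a gap that widens as degree grows is a warning sign. The sketch below assumes a deliberately small synthetic dataset and an arbitrary set of degree values, and the actual numbers will vary with the data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Deliberately small dataset (illustrative sizes) so overfitting is easier to observe
X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           n_redundant=0, n_clusters_per_class=1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Record the train/test accuracy gap for each degree
gaps = {}
for d in [2, 3, 5, 8]:
    svc = SVC(kernel="poly", degree=d).fit(X_train, y_train)
    train_acc = svc.score(X_train, y_train)
    test_acc = svc.score(X_test, y_test)
    gaps[d] = train_acc - test_acc
    print(f"degree={d}, train={train_acc:.3f}, test={test_acc:.3f}, gap={gaps[d]:.3f}")
```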
