The decision_function_shape parameter in scikit-learn’s SVC class determines the shape of the decision function used for multi-class classification.
SVC (Support Vector Classification) is a powerful algorithm for classification tasks. It finds the optimal hyperplane that maximally separates the classes in the feature space.
The decision_function_shape parameter controls the shape of the decision function that SVC returns for multi-class problems. It takes the values ‘ovo’ for one-vs-one and ‘ovr’ for one-vs-rest.
The default value is ‘ovr’, which returns a decision function of shape (n_samples, n_classes), one column per class, consistent with other scikit-learn classifiers. ‘ovo’ returns the original one-vs-one decision function of libsvm, of shape (n_samples, n_classes * (n_classes - 1) / 2), one column per pair of classes. Internally, SVC always trains one binary classifier per pair of classes (one-vs-one); this parameter only changes how the decision values are reported, not how the model is fit or how it predicts.
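To make the difference concrete, here is a minimal sketch (the small 4-class dataset and its parameters are assumed purely for illustration, separate from the full example below) that prints the shape of decision_function under each setting:
from sklearn.datasets import make_classification
from sklearn.svm import SVC
# Small synthetic 4-class dataset, assumed only for this illustration
X_demo, y_demo = make_classification(n_samples=200, n_classes=4, n_features=10,
                                     n_informative=8, random_state=0)
for shape in ['ovr', 'ovo']:
    svc = SVC(decision_function_shape=shape).fit(X_demo, y_demo)
    print(shape, svc.decision_function(X_demo).shape)
# ovr -> (200, 4): one column per class
# ovo -> (200, 6): one column per pair of classes (4 * 3 / 2 = 6)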
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# Generate synthetic multi-class dataset
X, y = make_classification(n_samples=1000, n_classes=4, n_features=10,
                           n_informative=8, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different decision_function_shape values
shapes = ['ovo', 'ovr']
accuracies = []
for shape in shapes:
    svc = SVC(decision_function_shape=shape, random_state=42)
    svc.fit(X_train, y_train)
    y_pred = svc.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(accuracy)
    print(f"decision_function_shape='{shape}', Accuracy: {accuracy:.3f}")
Running the example gives an output like:
decision_function_shape='ovo', Accuracy: 0.835
decision_function_shape='ovr', Accuracy: 0.835
The key steps in this example are:
- Generate a synthetic multi-class classification dataset
- Split the data into train and test sets
- Train SVC models with different decision_function_shape values
- Evaluate the accuracy of each model on the test set
Some tips and heuristics for setting decision_function_shape:
- Use the default ‘ovr’ when you want one decision score per class, consistent with other scikit-learn classifiers
- Use ‘ovo’ only when you need the raw pairwise decision values produced by libsvm
- The setting does not change training cost, predictions, or accuracy, as the quick check after this list shows
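Because the setting changes neither training nor prediction, this is easy to verify; the sketch below reuses X_train, X_test, and y_train from the example above:
import numpy as np
# Fit the same model twice, changing only decision_function_shape
pred_ovr = SVC(decision_function_shape='ovr', random_state=42).fit(X_train, y_train).predict(X_test)
pred_ovo = SVC(decision_function_shape='ovo', random_state=42).fit(X_train, y_train).predict(X_test)
print(np.array_equal(pred_ovr, pred_ovo))  # True: predictions are identical for both settings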
Issues to consider:
- The choice of ‘ovo’ vs ‘ovr’ does not affect accuracy, but it does affect how you interpret the decision values: per-class scores for ‘ovr’, per-pair scores for ‘ovo’ (see the sketch after this list)
- One-vs-one voting, which SVC uses internally for prediction regardless of this setting, can leave ambiguous regions of the feature space where several classes receive the same number of votes
- The better choice depends on how the decision values are consumed downstream, for example by calibration, thresholding, or tools that expect one score per class
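If you do work with the ‘ovo’ decision values, each column corresponds to one pair of classes, in the order 0 vs 1, 0 vs 2, ..., 1 vs 2, ... described in the scikit-learn SVM user guide. The minimal sketch below (reusing X_train, y_train, and X_test from the example above) labels each column with its class pair:
from itertools import combinations
svc_ovo = SVC(decision_function_shape='ovo', random_state=42).fit(X_train, y_train)
pairs = list(combinations(svc_ovo.classes_, 2))   # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
scores = svc_ovo.decision_function(X_test[:1])    # shape (1, 6) for 4 classes
for (a, b), value in zip(pairs, scores[0]):
    print(f"class {a} vs class {b}: {value:+.3f}")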