The `probability` parameter in scikit-learn's `SVC` class determines whether the model enables probability estimates. When it is set to `True`, the classifier fits an additional calibration model to estimate class probabilities.
Support Vector Machines (SVMs) like `SVC` do not directly provide probability estimates. Instead, when `probability` is `True`, scikit-learn's `SVC` uses Platt scaling to map the decision function scores to probabilities. The sigmoid used for this mapping is fitted with an internal 5-fold cross-validation.
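To make the Platt scaling idea concrete, here is a minimal hand-rolled sketch. It reuses the synthetic dataset settings from the example in this section and fits `LogisticRegression` on out-of-fold decision scores as a stand-in for libsvm's own sigmoid-fitting routine, so its numbers will not match `SVC(probability=True)` exactly. The fitted sigmoid has the form P(y=1 | f) = 1 / (1 + exp(A*f + B)), where f is the SVM decision score; with `LogisticRegression`, A = -w and B = -b for its coefficient w and intercept b.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_classes=2, n_features=10,
                           n_informative=5, random_state=42)

# Step 1: out-of-fold decision scores, so the sigmoid is fitted on scores
# the SVM did not see while training (the same idea as the internal CV).
scores = cross_val_predict(SVC(), X, y, cv=5, method="decision_function")

# Step 2: fit a one-dimensional sigmoid mapping score -> P(y=1).
# LogisticRegression is an assumed stand-in for libsvm's sigmoid fit.
platt = LogisticRegression()
platt.fit(scores.reshape(-1, 1), y)

# Step 3: refit the SVM on all data and pass its scores through the sigmoid.
svm = SVC().fit(X, y)
proba = platt.predict_proba(svm.decision_function(X).reshape(-1, 1))
print("Sigmoid slope w:", platt.coef_[0, 0])
print(proba[:3])
```

Fitting the sigmoid on out-of-fold scores matters: scores on the SVM's own training points are overconfident, which would push the fitted probabilities toward 0 and 1.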
The default value for `probability` is `False` because fitting the calibration model adds computational cost. Set `probability=True` when you need probability estimates.
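A quick way to check the default and the resulting API surface is sketched below. The dataset size and `random_state=0` are arbitrary choices for illustration, and the timings are only indicative since they depend on hardware.

```python
import time

from sklearn.datasets import make_classification
from sklearn.svm import SVC

# The default is probability=False.
default_flag = SVC().get_params()["probability"]
print("Default probability:", default_flag)

# predict_proba is only exposed when probability=True.
print("probability=False exposes predict_proba:", hasattr(SVC(), "predict_proba"))
print("probability=True exposes predict_proba:", hasattr(SVC(probability=True), "predict_proba"))

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# Rough fit-time comparison; the extra cost comes from the calibration step.
timings = {}
for flag in (False, True):
    start = time.perf_counter()
    SVC(probability=flag, random_state=0).fit(X, y)
    timings[flag] = time.perf_counter() - start
    print(f"probability={flag}: {timings[flag]:.3f} s")
```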
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Generate synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, n_features=10,
                           n_informative=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVC with probability=False
svc_no_prob = SVC(probability=False)
svc_no_prob.fit(X_train, y_train)
y_pred_no_prob = svc_no_prob.predict(X_test)
accuracy_no_prob = accuracy_score(y_test, y_pred_no_prob)
print(f"SVC with probability=False, Accuracy: {accuracy_no_prob:.3f}")

# Train SVC with probability=True
svc_prob = SVC(probability=True)
svc_prob.fit(X_train, y_train)
y_pred_prob = svc_prob.predict(X_test)
accuracy_prob = accuracy_score(y_test, y_pred_prob)
print(f"SVC with probability=True, Accuracy: {accuracy_prob:.3f}")

# Get probability estimates
probabilities = svc_prob.predict_proba(X_test)
print("First 5 probability estimates:")
print(probabilities[:5])

# Attempting to get probabilities from svc_no_prob raises an AttributeError
try:
    svc_no_prob.predict_proba(X_test)
except AttributeError as e:
    print(f"Error when trying to get probabilities from svc_no_prob: {e}")
```
The output will look similar to the following (the probability values can vary slightly between runs because `random_state` is not set on the `SVC`):

```
SVC with probability=False, Accuracy: 0.920
SVC with probability=True, Accuracy: 0.920
First 5 probability estimates:
[[0.97766859 0.02233141]
 [0.10281995 0.89718005]
 [0.93834819 0.06165181]
 [0.96767281 0.03232719]
 [0.94002999 0.05997001]]
Error when trying to get probabilities from svc_no_prob: This 'SVC' has no attribute 'predict_proba'
```
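If the probability estimates need to be reproducible, `random_state` can be fixed: it controls the shuffling used to fit the calibration and is ignored when `probability=False`. A minimal sketch, reusing the example's data settings with an arbitrary seed of 0:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_classes=2, n_features=10,
                           n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Two independent fits with the same seed yield identical probability estimates.
proba_a = SVC(probability=True, random_state=0).fit(X_train, y_train).predict_proba(X_test)
proba_b = SVC(probability=True, random_state=0).fit(X_train, y_train).predict_proba(X_test)
print("Identical across runs:", np.allclose(proba_a, proba_b))
```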
The key steps in this example are:
- Generate a synthetic binary classification dataset
- Split the data into train and test sets
- Train two `SVC` models, one with `probability=False` and one with `probability=True`
- Use `predict_proba()` to get probability estimates from the model with `probability=True`
- Show that `predict_proba()` is not available for the model with `probability=False`
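A model trained with `probability=False` still exposes `decision_function()`, which is often enough when you only need to rank samples (for example, to compute ROC AUC) rather than obtain calibrated probabilities. The sketch below reuses the example's data settings:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_classes=2, n_features=10,
                           n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

svc_no_prob = SVC(probability=False).fit(X_train, y_train)

# Signed scores; for a binary SVC, a positive score corresponds to classes_[1].
scores = svc_no_prob.decision_function(X_test)
labels_from_scores = svc_no_prob.classes_[(scores > 0).astype(int)]
print("Matches predict():", np.array_equal(labels_from_scores, svc_no_prob.predict(X_test)))

# Ranking metrics only need an ordering, not probabilities.
auc = roc_auc_score(y_test, scores)
print(f"ROC AUC from decision scores: {auc:.3f}")
```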
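Some tips and heuristics for setting `probability`:
- Set `probability=True` when your application needs probability estimates.
- Using `probability=True` increases the computational cost and training time of the model, because the calibration runs an internal 5-fold cross-validation.
- Because the probabilities come from a separately fitted calibration model, they can be slightly inconsistent with `predict()`: the class with the highest `predict_proba()` score may differ from the predicted label. Some applications may also need further calibration.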
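The sketch below measures how often `predict()` and the argmax of `predict_proba()` disagree, and then shows one common alternative: wrapping a plain `SVC` in `CalibratedClassifierCV` with sigmoid (Platt-style) calibration. The data settings come from the example above; `random_state=0` and `cv=5` are illustrative choices, and the disagreement rate may well be zero on data this easy.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_classes=2, n_features=10,
                           n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

svc_prob = SVC(probability=True, random_state=0).fit(X_train, y_train)
proba = svc_prob.predict_proba(X_test)

# predict() uses the decision function; predict_proba() uses the fitted sigmoid.
argmax_labels = svc_prob.classes_[proba.argmax(axis=1)]
disagreement = np.mean(argmax_labels != svc_prob.predict(X_test))
print(f"predict vs argmax(predict_proba) disagreement: {disagreement:.3f}")

# Alternative: calibrate a plain SVC (probability=False) externally.
# CalibratedClassifierCV uses the SVC's decision_function as its input score.
calibrated = CalibratedClassifierCV(SVC(), method="sigmoid", cv=5)
calibrated.fit(X_train, y_train)
cal_proba = calibrated.predict_proba(X_test)
print(cal_proba[:3])
```

`CalibratedClassifierCV` also offers `method="isotonic"`, which is more flexible but generally needs more data than the sigmoid to avoid overfitting.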
Issues to consider:
- Enabling probabilities stores additional calibration parameters (`probA_` and `probB_`), although this memory overhead is small compared with the stored support vectors
- `SVC` does not estimate probabilities directly, so its calibrated estimates may be less reliable than those from inherently probabilistic models
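One way to check how reliable the estimates are on your own data is to compare predicted probabilities with observed frequencies. The sketch below, assuming the example's data settings and an arbitrary `random_state=0`, computes the Brier score and a coarse reliability curve with `calibration_curve`; 5 bins is an illustrative choice for a 200-sample test set.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_classes=2, n_features=10,
                           n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

svc_prob = SVC(probability=True, random_state=0).fit(X_train, y_train)
p_pos = svc_prob.predict_proba(X_test)[:, 1]  # probability of class 1

# Brier score: mean squared error of the probabilities (lower is better).
brier = brier_score_loss(y_test, p_pos)
print(f"Brier score: {brier:.3f}")

# Reliability curve: well-calibrated bins have observed ~= predicted.
prob_true, prob_pred = calibration_curve(y_test, p_pos, n_bins=5)
for pred, true in zip(prob_pred, prob_true):
    print(f"mean predicted {pred:.2f} -> observed {true:.2f}")
```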