The shrinking parameter in scikit-learn’s Support Vector Regression (SVR) controls whether the shrinking heuristic is used during the optimization process.
SVR is a regression algorithm that fits a function to the data while tolerating errors inside a margin (set by epsilon) around the targets. The shrinking heuristic is an optimization shortcut intended to reduce the work the solver does at each iteration.
When shrinking is set to True (default), the solver tries to identify variables that are unlikely to change any further, typically those stuck at their bounds, and temporarily removes them so that it works on a smaller problem. This can lead to faster training times, especially on large datasets that need many iterations.
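However, shrinking does not always help. Both settings target the same optimization problem, so the fitted model and its accuracy are usually nearly identical; the main difference is training time. In some cases, setting shrinking to False is faster, for example when the problem is solved only loosely with a large stopping tolerance (tol).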
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error
import time
# Generate synthetic dataset
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train with different shrinking values
for shrinking in [True, False]:
    svr = SVR(shrinking=shrinking)
    # Time only the fit call so the figure reflects training, not prediction
    start_time = time.perf_counter()
    svr.fit(X_train, y_train)
    train_time = time.perf_counter() - start_time
    y_pred = svr.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    print(f"shrinking={shrinking}, MSE: {mse:.3f}, Training time: {train_time:.3f}s")
Running the example gives an output like:
shrinking=True, MSE: 12758.146, Training time: 0.037s
shrinking=False, MSE: 12758.146, Training time: 0.037s
The key steps in this example are:
- Generate a synthetic regression dataset with noise
- Split the data into train and test sets
- Train SVR models with shrinking set to True and False
- Evaluate the mean squared error and training time for each setting
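The matching MSE values in the output above suggest that both settings arrive at essentially the same model. A direct way to check this is to compare the predictions themselves. The following is a minimal sketch that assumes a small synthetic dataset and default hyperparameters; the helper fit_predict and the name max_gap are illustrative and not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.svm import SVR

# Small synthetic dataset (sizes chosen only for illustration)
X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)

def fit_predict(shrinking):
    """Fit an SVR with the given shrinking setting and predict on the training data."""
    return SVR(shrinking=shrinking).fit(X, y).predict(X)

pred_on = fit_predict(True)
pred_off = fit_predict(False)

# Largest absolute difference between the two sets of predictions
max_gap = float(np.max(np.abs(pred_on - pred_off)))
print(f"largest prediction difference: {max_gap:.6f}")
```

Any small gap that does appear is expected to come from the solver stopping within its tolerance rather than from a different objective.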
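Some tips and heuristics for setting shrinking:
- Keep shrinking enabled (default) for large datasets, where the solver tends to need many iterations and the heuristic is most likely to speed up training
- Try disabling shrinking and compare timings when training with a loose stopping tolerance (large tol); accuracy is usually unaffected, so pick whichever setting fits faster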
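Whether shrinking actually saves time can be measured rather than assumed. The sketch below times fit for both settings at two stopping tolerances and reports the median of a few runs. The helper median_fit_time, the repeat count, and the tolerance values are assumptions chosen for illustration, and the timings will differ from machine to machine.

```python
import time
from sklearn.datasets import make_regression
from sklearn.svm import SVR

X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)

def median_fit_time(shrinking, tol, repeats=3):
    """Return the median wall-clock time of several SVR fits."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        SVR(shrinking=shrinking, tol=tol).fit(X, y)
        times.append(time.perf_counter() - start)
    return sorted(times)[len(times) // 2]

results = {}
for tol in (1e-3, 1e-1):  # default tolerance and a much looser one
    for shrinking in (True, False):
        results[(shrinking, tol)] = median_fit_time(shrinking, tol)
        print(f"tol={tol}, shrinking={shrinking}: {results[(shrinking, tol)]:.4f}s")
```

Using the median of repeated runs reduces the influence of one-off delays, which matters when individual fits take only a few milliseconds.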
Issues to consider:
- The impact of shrinking may vary depending on the dataset characteristics
- Disabling shrinking may require more computational resources and longer training times
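Because the effect depends on the data, one option is to time both settings at a few dataset sizes before settling on a value. This is a rough sketch under assumed sizes and default hyperparameters; the helper time_fit is hypothetical, and single-run timings like these are noisy.

```python
import time
from sklearn.datasets import make_regression
from sklearn.svm import SVR

def time_fit(n_samples, shrinking):
    """Time a single SVR fit on a synthetic dataset of the given size."""
    X, y = make_regression(n_samples=n_samples, n_features=10, noise=0.1, random_state=42)
    start = time.perf_counter()
    SVR(shrinking=shrinking).fit(X, y)
    return time.perf_counter() - start

timings = {}
for n_samples in (250, 500, 1000):  # illustrative sizes only
    for shrinking in (True, False):
        timings[(n_samples, shrinking)] = time_fit(n_samples, shrinking)
        print(f"n_samples={n_samples}, shrinking={shrinking}: "
              f"{timings[(n_samples, shrinking)]:.4f}s")
```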