
Scikit-Learn GridSearchCV LinearSVC

Hyperparameter tuning is essential for getting the best performance out of a machine learning model. In this example, we'll demonstrate how to use scikit-learn's GridSearchCV to perform hyperparameter tuning for Linear Support Vector Classification (LinearSVC), a popular algorithm for both binary and multi-class classification tasks.

Grid search is a method for evaluating different combinations of model hyperparameters to find the best performing configuration. It exhaustively searches through a specified parameter grid, trains and evaluates the model for each combination using cross-validation, and selects the hyperparameters that yield the best performance metric.
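
To make the exhaustive nature of the search concrete, here is a minimal sketch using scikit-learn's ParameterGrid (the grid values are purely illustrative) that enumerates every combination and counts the fits a 5-fold search would perform:

from sklearn.model_selection import ParameterGrid

# Illustrative grid: 3 values of C x 2 loss functions = 6 combinations
grid = {'C': [0.1, 1, 10], 'loss': ['hinge', 'squared_hinge']}

combinations = list(ParameterGrid(grid))
for params in combinations:
    print(params)

# With 5-fold cross-validation, grid search fits 6 * 5 = 30 models
print(f"Total fits with cv=5: {len(combinations) * 5}")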

Linear Support Vector Classification (LinearSVC) is a linear model that finds the hyperplane that best separates the classes. Because it is implemented with the liblinear solver rather than libsvm, it scales well to large numbers of samples, which makes it particularly useful for large-scale classification tasks.
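
As a minimal sketch of what "linear" means here (using a small two-feature dataset purely for illustration), the fitted model exposes the separating hyperplane through its coef_ and intercept_ attributes:

from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Small illustrative dataset with 2 features so the hyperplane is easy to read
X, y = make_classification(n_samples=100, n_features=2, n_informative=2,
                           n_redundant=0, random_state=42)

model = LinearSVC(random_state=42, max_iter=10000)
model.fit(X, y)

# The decision boundary is the hyperplane coef_ . x + intercept_ = 0
print(f"Coefficients: {model.coef_}")
print(f"Intercept: {model.intercept_}")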

The key hyperparameters for LinearSVC include the regularization parameter (C), which controls the trade-off between achieving a low error on the training data and minimizing the norm of the weights; the loss function (loss), which can be hinge or squared_hinge; and the penalty type (penalty), which determines the norm used in the penalization (l1 or l2). Note that not all combinations are supported: scikit-learn only allows penalty='l1' together with loss='squared_hinge', which is why the grid below restricts penalty to 'l2'.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import LinearSVC

# Generate synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5, random_state=42)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define parameter grid
param_grid = {
    'C': [0.01, 0.1, 1, 10],
    'loss': ['hinge', 'squared_hinge'],
    'penalty': ['l2']
}

# Perform grid search
grid_search = GridSearchCV(estimator=LinearSVC(random_state=42, max_iter=10000),
                           param_grid=param_grid,
                           cv=5,
                           scoring='accuracy')
grid_search.fit(X_train, y_train)

# Report best score and parameters
print(f"Best score: {grid_search.best_score_:.3f}")
print(f"Best parameters: {grid_search.best_params_}")

# Evaluate on test set
best_model = grid_search.best_estimator_
accuracy = best_model.score(X_test, y_test)
print(f"Test set accuracy: {accuracy:.3f}")

Running the example gives an output like:

Best score: 0.805
Best parameters: {'C': 0.1, 'loss': 'hinge', 'penalty': 'l2'}
Test set accuracy: 0.805

The steps are as follows:

  1. Generate a synthetic binary classification dataset using make_classification with 20 features.
  2. Split the dataset into train and test sets using train_test_split.
  3. Define the parameter grid with different values for C, loss, and penalty hyperparameters.
  4. Perform grid search using GridSearchCV, specifying the LinearSVC model, parameter grid, 5-fold cross-validation, and accuracy scoring metric.
  5. Report the best cross-validation score and best set of hyperparameters found by grid search.
  6. Evaluate the best model on the hold-out test set and report the accuracy.

By using GridSearchCV, we can systematically explore different hyperparameter settings for LinearSVC and identify the configuration that performs best under cross-validation, giving us a principled, reproducible way to tune the model for our classification task.
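
Beyond the single best configuration, the fitted search object records a cross-validated score for every combination it tried in its cv_results_ attribute. Here is a minimal sketch of inspecting those results, continuing from the grid_search object above and assuming pandas is available:

import pandas as pd

# Full cross-validation results for every combination the search tried
results = pd.DataFrame(grid_search.cv_results_)
summary = results[['params', 'mean_test_score', 'std_test_score', 'rank_test_score']]
print(summary.sort_values('rank_test_score').to_string(index=False))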


