
How to Choose the Best Kernel Function for SVMs

Support Vector Machines (SVMs) are powerful supervised learning models that can be used for classification, regression, and outlier detection tasks.

What are kernels in SVM?

Kernels in Support Vector Machines (SVMs) are functions that measure the similarity between pairs of data points. They allow SVMs to discover complex, non-linear patterns in data by implicitly mapping the input data to a higher-dimensional feature space where the data can be linearly separated.
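To make this "kernel trick" concrete, the minimal sketch below (with arbitrary example vectors) shows that a degree-2 polynomial kernel evaluated in the original 2-D space yields the same similarity as an explicit dot product in a 3-D feature space:

import numpy as np

# Arbitrary example vectors in the original 2-D input space
x = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])

# Explicit degree-2 feature map: phi(v) = (v1^2, sqrt(2)*v1*v2, v2^2)
def phi(v):
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

explicit = np.dot(phi(x), phi(y))   # dot product in the 3-D feature space
implicit = np.dot(x, y) ** 2        # kernel (x . y)^2, computed entirely in 2-D

print(explicit, implicit)           # both print 25.0 -- the same similarity

The kernel never builds the 3-D vectors, yet it produces exactly the similarity the higher-dimensional space would give.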



Different Types of Kernel Function in SVMs

Given an arbitrary dataset, you typically don't know in advance which kernel will work best. A good rule of thumb is to start with the simplest hypothesis space (a linear kernel), since you usually don't know much about your data, and work your way up towards more complex hypothesis spaces only if the simpler ones underperform.

Different SVM models use different kernel functions. The most common ones are listed below.



1. Linear Kernel

The linear kernel is the simplest and most straightforward kernel function.

It is defined as K(x, y) = x · y, where x and y are the input vectors and x · y is their dot product.
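As a quick sanity check (the vectors are arbitrary examples), the snippet below computes the linear kernel by hand and with scikit-learn's linear_kernel helper:

import numpy as np
from sklearn.metrics.pairwise import linear_kernel

x = np.array([[1.0, 2.0]])
y = np.array([[3.0, 4.0]])

print(np.dot(x, y.T))        # [[11.]] -- plain dot product
print(linear_kernel(x, y))   # [[11.]] -- scikit-learn equivalent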

2. Polynomial kernel

The polynomial kernel is a more flexible generalization of the linear kernel.

It is defined as

K(x, y) = (x · y + c)^d, where x and y are the input vectors, c is a constant term, and d is the degree of the polynomial.
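The snippet below (parameter values chosen arbitrarily for illustration) evaluates the polynomial kernel by hand and with scikit-learn's polynomial_kernel, where coef0 plays the role of c:

import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

x = np.array([[1.0, 2.0]])
y = np.array([[3.0, 4.0]])

# c = 1, d = 2: (x . y + 1)^2 = (11 + 1)^2 = 144
print((np.dot(x, y.T) + 1) ** 2)
# gamma=1 disables scikit-learn's extra scaling of the dot product
print(polynomial_kernel(x, y, degree=2, coef0=1, gamma=1))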

3. Radial Basis Function kernel (RBF)

The radial basis function (RBF) kernel is one of the most popular and widely used kernel functions for SVMs.

It is defined as

K(x, y) = exp(-gamma * ||x - y||^2), where x and y are the input vectors, gamma is a positive parameter, and ||x - y|| is the Euclidean distance between x and y.
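The snippet below (with arbitrary points and gamma values) shows how gamma controls how quickly the RBF similarity decays with distance, checked against scikit-learn's rbf_kernel:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[0.0, 0.0]])
y = np.array([[1.0, 1.0]])   # squared Euclidean distance ||x - y||^2 = 2

for gamma in (0.1, 1.0, 10.0):
    manual = np.exp(-gamma * np.sum((x - y) ** 2))
    print(gamma, manual, rbf_kernel(x, y, gamma=gamma)[0, 0])
# Larger gamma -> similarity drops off faster -> wigglier decision boundaries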

4. Sigmoid kernel

Another popular and adaptable kernel function for SVMs is the sigmoid kernel.

It is defined as

K(x, y) = tanh(alpha * x · y + beta), where x and y are the input vectors, alpha and beta are parameters, and tanh is the hyperbolic tangent function.
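In scikit-learn, alpha corresponds to the gamma parameter and beta to coef0. The snippet below (with arbitrary illustrative values) checks the formula against sigmoid_kernel:

import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

x = np.array([[1.0, 2.0]])
y = np.array([[3.0, 4.0]])

alpha, beta = 0.01, -1.0   # arbitrary illustrative values
print(np.tanh(alpha * np.dot(x, y.T) + beta))
print(sigmoid_kernel(x, y, gamma=alpha, coef0=beta))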

How to Select the Best Kernel?

It’s crucial to remember that each kernel performs differently depending on the dataset. It is generally a good idea to compare several kernels using cross-validation or another evaluation technique and choose the one that performs best.
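A minimal sketch of such a comparison on the Iris dataset (the kernel list and fold count are chosen for illustration):

from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

# 5-fold cross-validation per kernel; scaling lives inside the pipeline,
# so each fold is standardized using only its own training data
for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{kernel:8s} mean accuracy: {scores.mean():.3f}")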

The best hyperparameters for a given kernel can be found via grid search, random search, or Bayesian optimization. It is also important to consider each kernel's memory and computational cost, particularly for large datasets with many features.
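As one example, a grid search over the kernel and its main hyperparameters could look like the sketch below (the grid values are arbitrary illustrations, not recommendations):

from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

# Each dict searches one kernel together with its relevant hyperparameters
param_grid = [
    {'kernel': ['linear'], 'C': [0.1, 1, 10]},
    {'kernel': ['rbf'], 'C': [0.1, 1, 10], 'gamma': ['scale', 0.1, 1]},
    {'kernel': ['poly'], 'C': [0.1, 1, 10], 'degree': [2, 3]},
]
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)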

Stepwise Implementation: Choosing the Best Kernel Function for SVMs

The code below trains an SVM with each kernel on the Iris dataset and visualizes the resulting decision boundaries.

Import necessary libraries

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score


Load the Iris dataset.

# Load Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target


Select the first two features for visualization, split the data into training and testing sets, and standardize the features.

# Choose the first two features for visualization (2D)
X = X[:, :2]
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
 
# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)


Define a list of different kernel functions.

# Define different kernel functions
kernel_functions = ['linear', 'poly', 'rbf', 'sigmoid']


Train an SVM classifier with each kernel function and plot its decision boundary.

# Create a figure large enough for the 2x2 grid of subplots
plt.figure(figsize=(10, 8))

for i, kernel in enumerate(kernel_functions, 1):
    # Create SVM classifier with the specified kernel
    svm_classifier = SVC(kernel=kernel)
 
    # Train the classifier
    svm_classifier.fit(X_train, y_train)
 
    # Evaluate accuracy
    y_pred = svm_classifier.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
 
    # Plot decision boundary and data points
    plt.subplot(2, 2, i)
    h = .02  # Step size in the mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = svm_classifier.predict(np.c_[xx.ravel(), yy.ravel()])
 
    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
    plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train,
                cmap=plt.cm.Paired, edgecolors='k')
 
    plt.title(f"SVM with {kernel} kernel\nAccuracy: {accuracy:.2f}")
 
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.tight_layout()
plt.show()


Output:

The output of the provided code consists of a 2×2 grid of subplots, each representing the decision boundaries created by a Support Vector Machine (SVM) classifier with a different kernel function on the Iris dataset. Here’s a short explanation of each subplot:

Linear Kernel: produces straight-line decision boundaries; it works well when the classes are close to linearly separable.

Polynomial Kernel: produces curved decision boundaries whose flexibility grows with the degree d.

RBF (Radial Basis Function) Kernel: produces smooth, highly flexible non-linear boundaries; it is often the strongest default choice.

Sigmoid Kernel: produces tanh-shaped decision boundaries; it often performs worse than the other kernels on this dataset.

Additional Points

Overall, to pick a kernel for an SVM, first understand the nature of the problem, which might be linear or non-linear classification, anomaly detection, or regression, and then match the kernel's flexibility to it.

