
How to Choose the Best Kernel Function for SVMs

Support Vector Machines (SVMs) are powerful supervised learning models that can be used for classification, regression, and outlier detection tasks.

What are kernels in SVM?

Kernels in Support Vector Machines (SVMs) are functions that measure the similarity between pairs of data points. They allow SVMs to discover complex, non-linear patterns in data by implicitly mapping the input data to a higher-dimensional feature space where the data can be linearly separated.
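To make this "kernel trick" concrete, the minimal sketch below (with arbitrary example vectors) shows that a degree-2 polynomial kernel evaluated in the original 2-D space yields the same similarity as an explicit dot product in a 3-D feature space:

import numpy as np

# Arbitrary example vectors in the original 2-D input space
x = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])

# Explicit degree-2 feature map: phi(v) = (v1^2, sqrt(2)*v1*v2, v2^2)
def phi(v):
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

explicit = np.dot(phi(x), phi(y))   # dot product in the 3-D feature space
implicit = np.dot(x, y) ** 2        # kernel (x . y)^2, computed entirely in 2-D

print(explicit, implicit)           # both print 25.0 -- the same similarity

The kernel never builds the 3-D vectors, yet it produces exactly the similarity the higher-dimensional space would give.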



Different Types of Kernel Function in SVMs

Given an arbitrary dataset, you typically don't know in advance which kernel will work best. A good rule of thumb is to start with the simplest hypothesis space (a linear kernel), since you usually don't know much about your data, and work your way up towards more complex hypothesis spaces only if the simpler ones underperform.

Different SVM models use different kernel functions. The most common ones are listed below.



1. Linear Kernel

The linear kernel is the simplest and most straightforward kernel function.

It is defined as K(x, y) = x · y, where x and y are the input vectors and x · y is their dot product.
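As a quick sanity check (the vectors are arbitrary examples), the snippet below computes the linear kernel by hand and with scikit-learn's linear_kernel helper:

import numpy as np
from sklearn.metrics.pairwise import linear_kernel

x = np.array([[1.0, 2.0]])
y = np.array([[3.0, 4.0]])

print(np.dot(x, y.T))        # [[11.]] -- plain dot product
print(linear_kernel(x, y))   # [[11.]] -- scikit-learn equivalent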

2. Polynomial kernel

The polynomial kernel is a more flexible generalization of the linear kernel.

It is defined as

K(x, y) = (x · y + c)^d, where x and y are the input vectors, c is a constant term, and d is the degree of the polynomial.
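The snippet below (parameter values chosen arbitrarily for illustration) evaluates the polynomial kernel by hand and with scikit-learn's polynomial_kernel, where coef0 plays the role of c:

import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

x = np.array([[1.0, 2.0]])
y = np.array([[3.0, 4.0]])

# c = 1, d = 2: (x . y + 1)^2 = (11 + 1)^2 = 144
print((np.dot(x, y.T) + 1) ** 2)
# gamma=1 disables scikit-learn's extra scaling of the dot product
print(polynomial_kernel(x, y, degree=2, coef0=1, gamma=1))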

3. Radial Basis Function kernel (RBF)

The radial basis function (RBF) kernel is one of the most popular and widely used kernel functions for SVMs.

It is defined as

K(x, y) = exp(-gamma * ||x - y||^2), where x and y are the input vectors, gamma is a positive parameter, and ||x - y|| is the Euclidean distance between x and y.
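The snippet below (with arbitrary points and gamma values) shows how gamma controls how quickly the RBF similarity decays with distance, checked against scikit-learn's rbf_kernel:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[0.0, 0.0]])
y = np.array([[1.0, 1.0]])   # squared Euclidean distance ||x - y||^2 = 2

for gamma in (0.1, 1.0, 10.0):
    manual = np.exp(-gamma * np.sum((x - y) ** 2))
    print(gamma, manual, rbf_kernel(x, y, gamma=gamma)[0, 0])
# Larger gamma -> similarity drops off faster -> wigglier decision boundaries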

4. Sigmoid kernel

Another popular and adaptable kernel function for SVMs is the sigmoid kernel.

It is defined as

K(x, y) = tanh(alpha * x · y + beta), where x and y are the input vectors, alpha and beta are parameters, and tanh is the hyperbolic tangent function.
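In scikit-learn, alpha corresponds to the gamma parameter and beta to coef0. The snippet below (with arbitrary illustrative values) checks the formula against sigmoid_kernel:

import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

x = np.array([[1.0, 2.0]])
y = np.array([[3.0, 4.0]])

alpha, beta = 0.01, -1.0   # arbitrary illustrative values
print(np.tanh(alpha * np.dot(x, y.T) + beta))
print(sigmoid_kernel(x, y, gamma=alpha, coef0=beta))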

How to Select the Best Kernel?

It’s crucial to remember that each kernel performs differently depending on the dataset. It is generally a good idea to compare several kernels using cross-validation or another evaluation technique and choose the one that performs best.
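A minimal sketch of such a comparison on the Iris dataset (the kernel list and fold count are chosen for illustration):

from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

# 5-fold cross-validation per kernel; scaling lives inside the pipeline,
# so each fold is standardized using only its own training data
for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{kernel:8s} mean accuracy: {scores.mean():.3f}")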

The best hyperparameters for a given kernel can be found via grid search, random search, or Bayesian optimization. It is also important to consider each kernel's memory and computational cost, particularly for large datasets with many features.
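As one example, a grid search over the kernel and its main hyperparameters could look like the sketch below (the grid values are arbitrary illustrations, not recommendations):

from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

# Each dict searches one kernel together with its relevant hyperparameters
param_grid = [
    {'kernel': ['linear'], 'C': [0.1, 1, 10]},
    {'kernel': ['rbf'], 'C': [0.1, 1, 10], 'gamma': ['scale', 0.1, 1]},
    {'kernel': ['poly'], 'C': [0.1, 1, 10], 'degree': [2, 3]},
]
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)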

Stepwise Implementation: Choosing the Best Kernel Function for SVMs

The code below trains an SVM with each kernel on the Iris dataset and visualizes the resulting decision boundaries.

Import necessary libraries

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score


Load the Iris dataset.

# Load Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target


Select the first two features for visualization, split the data into training and testing sets, and standardize the features.

# Choose the first two features for visualization (2D)
X = X[:, :2]
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
 
# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)


Define a list of different kernel functions.

# Define different kernel functions
kernel_functions = ['linear', 'poly', 'rbf', 'sigmoid']


Train an SVM classifier with each kernel function and plot its decision boundary.

# Create a figure large enough for the 2x2 grid of subplots
plt.figure(figsize=(10, 8))

for i, kernel in enumerate(kernel_functions, 1):
    # Create SVM classifier with the specified kernel
    svm_classifier = SVC(kernel=kernel)
 
    # Train the classifier
    svm_classifier.fit(X_train, y_train)
 
    # Evaluate accuracy
    y_pred = svm_classifier.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
 
    # Plot decision boundary and data points
    plt.subplot(2, 2, i)
    h = .02  # Step size in the mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = svm_classifier.predict(np.c_[xx.ravel(), yy.ravel()])
 
    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
    plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train,
                cmap=plt.cm.Paired, edgecolors='k')
 
    plt.title(f"SVM with {kernel} kernel\nAccuracy: {accuracy:.2f}")
 
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.tight_layout()
plt.show()


Output:

The output of the provided code consists of a 2×2 grid of subplots, each representing the decision boundaries created by a Support Vector Machine (SVM) classifier with a different kernel function on the Iris dataset. Here’s a short explanation of each subplot:

Linear Kernel: produces straight-line decision boundaries; it works well when the classes are close to linearly separable.

Polynomial Kernel: produces curved decision boundaries whose flexibility grows with the degree d.

RBF (Radial Basis Function) Kernel: produces smooth, highly flexible non-linear boundaries; it is often the strongest default choice.

Sigmoid Kernel: produces tanh-shaped decision boundaries; it often performs worse than the other kernels on this dataset.

Additional Points

Overall, to pick a kernel for an SVM, first understand the nature of the problem, which might be linear or non-linear classification, anomaly detection, or regression, and then match the kernel's flexibility to it.

