
RBF SVM Parameters in Scikit Learn

Last Updated : 31 Jul, 2023

Scikit Learn is a popular machine learning library in Python, and it provides a powerful implementation of Support Vector Machines (SVMs) with the Radial Basis Function (RBF) kernel. The RBF kernel is a popular choice for SVMs because it can handle non-linear decision boundaries, making it suitable for a wide range of classification tasks.

When using an RBF SVM in Scikit Learn, there are several important parameters that can be tuned to optimize the performance of the model. The most important are C and gamma. The C parameter controls the trade-off between a low training error and a simple, large-margin decision boundary: a high value of C penalizes misclassified training points heavily, so the model fits the training data closely at the risk of generalizing poorly. The gamma parameter controls the shape of the decision boundary, with a high value of gamma resulting in a more complex and wiggly boundary.
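
As a minimal sketch of how these two parameters are passed to scikit-learn's SVC estimator (the values below are illustrative, not recommendations):

Python3

from sklearn.svm import SVC

# A large C tolerates few training errors, and a large gamma gives each
# training point a very local influence; both push the model toward
# a complex, potentially overfitted boundary
flexible_svm = SVC(kernel='rbf', C=100.0, gamma=10.0)

# A small C and a small gamma favour a smoother, more regularized boundary
smooth_svm = SVC(kernel='rbf', C=0.1, gamma=0.01)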

Other important parameters include kernel, degree, and coef0. The kernel parameter determines the type of kernel function used for the SVM, with 'rbf' being the default choice. The degree parameter is used when the kernel is set to 'poly', and it controls the degree of the polynomial function. The coef0 parameter is used when the kernel is set to 'poly' or 'sigmoid', and it controls the independent term in the kernel function.
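
As a quick sketch, these parameters map directly onto SVC's constructor arguments; degree and coef0 are simply ignored by kernels that do not use them:

Python3

from sklearn.svm import SVC

poly_svm = SVC(kernel='poly', degree=3, coef0=1.0)  # polynomial kernel
sigmoid_svm = SVC(kernel='sigmoid', coef0=0.5)      # sigmoid kernel
rbf_svm = SVC(kernel='rbf')                         # the default kernel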

To find the optimal values for these parameters, a grid search or randomized search can be performed over a range of values. Cross-validation can also be used to evaluate the performance of the model for different parameter values. It is important to note that selecting the right combination of parameters is a crucial step in building an accurate and robust SVM model with the RBF kernel.

Concepts related to SVMs:

  • Support Vector Machines (SVMs): SVMs are supervised machine learning models used for classification and regression tasks. They are widely used due to their ability to handle high-dimensional data and non-linearly separable datasets.
  • Radial Basis Function (RBF) kernel: The RBF kernel is a commonly used kernel function in SVMs. It is used to transform the input data into a higher-dimensional space to find a hyperplane that separates the data points.
  • C parameter: The C parameter in SVMs is a hyperparameter that determines the trade-off between a low training error and a simple, large-margin decision boundary. It controls the level of misclassification that can be tolerated in the training dataset.
  • Gamma parameter: The gamma parameter in SVMs is a hyperparameter that controls the shape of the decision boundary. It determines the flexibility of the model and the level of overfitting or underfitting of the training data.
  • Regularization: Regularization is a technique used to prevent overfitting in machine learning models. It involves adding a penalty term to the objective function, which discourages the model from fitting the noise in the training data.
  • GridSearchCV: GridSearchCV is a technique used to find the optimal hyperparameters for a machine learning model. It involves exhaustively searching through a pre-defined range of hyperparameter values to find the combination that gives the best performance.
  • RandomizedSearchCV: RandomizedSearchCV is a technique used to find the optimal hyperparameters for a machine learning model. It involves randomly sampling hyperparameter values from pre-defined ranges or distributions and evaluating their performance to find the best combination (see the sketch of both search strategies after this list).
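
As a minimal sketch of both search strategies (the parameter ranges and the use of scipy's loguniform distribution are illustrative assumptions, not part of the article's later example):

Python3

from scipy.stats import loguniform
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

# GridSearchCV tries every combination in an explicit grid
grid = GridSearchCV(SVC(kernel='rbf'),
                    {'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1]},
                    cv=5)

# RandomizedSearchCV samples a fixed number of combinations from
# distributions, which scales better to large search spaces
rand = RandomizedSearchCV(SVC(kernel='rbf'),
                          {'C': loguniform(1e-2, 1e3),
                           'gamma': loguniform(1e-4, 1e1)},
                          n_iter=20, cv=5, random_state=0)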

Radial Basis Function

The RBF (Radial Basis Function) kernel function is a popular kernel function used in SVM (Support Vector Machine) classification algorithms. It is widely used for its ability to handle non-linearly separable data by mapping the data to higher dimensions.

The RBF kernel function is defined as:

K(x, xi) = exp(-gamma * ||x - xi||^2)

where x and xi are input vectors, ||x - xi||^2 is the squared Euclidean distance between the two vectors, and gamma is the kernel parameter. The gamma parameter determines the influence of each training example on the decision boundary. A smaller value of gamma produces a smoother, wider decision boundary, while a larger value produces a narrower boundary that is more focused on individual data points.
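
As a small numeric illustration of the formula (the vectors and gamma values are arbitrary):

Python3

import numpy as np

def rbf_kernel(x, xi, gamma):
    # K(x, xi) = exp(-gamma * ||x - xi||^2)
    return np.exp(-gamma * np.sum((x - xi) ** 2))

x = np.array([1.0, 2.0])
xi = np.array([2.0, 3.0])             # squared distance ||x - xi||^2 = 2

print(rbf_kernel(x, xi, gamma=0.1))   # ~0.819: distant points still interact
print(rbf_kernel(x, xi, gamma=10.0))  # ~2e-9: influence decays very quickly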

The RBF kernel function has several advantages. It is a universal kernel, meaning that it can approximate any continuous function to arbitrary precision given sufficient data. It is also computationally efficient because, via the kernel trick, the SVM never has to compute the high-dimensional mapping explicitly, and it is relatively simple to use because it requires the choice of only a single parameter, gamma.

However, the RBF kernel function can be sensitive to the choice of gamma. If gamma is too small, the model may underfit the data, while if gamma is too large, the model may overfit the data. Therefore, it is important to carefully choose the value of gamma based on the specific dataset and problem at hand.

Significance: In Support Vector Machines (SVM), the kernel function plays a vital role in the classification of data. The kernel function maps the original input data into a higher dimensional feature space where it becomes easier to classify the data into different classes using a linear decision boundary. This is called the kernel trick.

The significance of the kernel function is that it allows SVM to classify non-linearly separable data by mapping it into a higher dimensional space where it becomes linearly separable. SVM works by finding the maximum margin hyperplane that separates the data into different classes. The kernel function helps to project the data into a higher dimensional space where this maximum margin hyperplane can be found using a linear classifier.

There are different types of kernel functions in SVM such as linear, polynomial, sigmoid, and radial basis functions (RBF). Each kernel function has its own characteristics and is used for different types of data. The RBF kernel is particularly popular because it can handle complex data distributions and can model non-linear decision boundaries effectively.

Overall, the kernel function in SVM is a powerful tool that allows SVM to classify complex data distributions and achieve high classification accuracy. It helps SVM to overcome the limitations of linear classifiers and to handle non-linearly separable data effectively.
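
To make this concrete, here is a small sketch comparing a linear kernel with the RBF kernel on a synthetic, non-linearly separable dataset (scikit-learn's make_circles; the exact scores depend on the noise level and random seed):

Python3

from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Two concentric circles: no straight line can separate the classes
X, y = make_circles(n_samples=300, noise=0.1, factor=0.4, random_state=0)

linear_score = cross_val_score(SVC(kernel='linear'), X, y, cv=5).mean()
rbf_score = cross_val_score(SVC(kernel='rbf'), X, y, cv=5).mean()

print("linear kernel:", linear_score)  # close to chance level (~0.5)
print("RBF kernel:", rbf_score)        # close to 1.0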

Here are the general steps needed to tune RBF SVM parameters in Scikit Learn:

Step 1: Import the necessary libraries: First, import the required libraries, including Scikit Learn, NumPy, and Pandas.

Step 2: Load and preprocess the data: Load the dataset you want to use for training and testing the SVM model. Preprocess the data as necessary, including feature scaling, feature selection, and data normalization.

Step 3: Split the data into training and testing sets: Divide the data into two sets: a training set and a testing set. The training set is used to train the model, while the testing set is used to evaluate the model’s performance.

Step 4: Define the parameter grid: Define a grid of hyperparameter values to search over. Specify the range of values for the C and gamma parameters, and the method to generate parameter combinations.

Step 5: Perform grid search or randomized search: Use the GridSearchCV or RandomizedSearchCV classes to perform hyperparameter tuning. These classes fit the SVM model on the training data for each sampled combination of hyperparameters and evaluate the model’s performance using cross-validation.

Step 6: Repeat if necessary: If the performance of the model is not satisfactory, repeat the tuning process with different hyperparameter values until you obtain the desired level of performance.

Overall, tuning RBF SVM parameters involves selecting appropriate values for the C and gamma parameters through a systematic search process. This can be achieved using various techniques such as GridSearchCV or RandomizedSearchCV, and involves evaluating the performance of the model on a separate testing set.
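
The grid search in the first implementation below is fitted on the full (scaled) dataset for simplicity. A sketch of the final evaluation step on a separate testing set might look like this, with the scaler fitted on the training data only to avoid leaking test-set statistics into the model (parameter ranges are illustrative):

Python3

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# fit the scaler on the training split only
scaler = StandardScaler().fit(X_train)

param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
search.fit(scaler.transform(X_train), y_train)

# GridSearchCV refits the best estimator on the whole training split,
# so it can be scored directly on the held-out test set
print(search.best_params_)
print(search.score(scaler.transform(X_test), y_test))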

Implementations

Python3




# Import the necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
import seaborn as sns
 
# load and preprocess the data
data = load_iris()
X = data.data
y = data.target
 
# Scale the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
 
# define the parameter grid
param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000],
              'gamma':[0.0001, 0.001, 0.01, 1, 10, 100, 1000]}
 
# perform grid search
svm = SVC(kernel='rbf')
grid_search = GridSearchCV(svm,
                           param_grid,
                           cv=3,
                           n_jobs=-1)
grid_search.fit(X_scaled, y)
 
print(
    "Best parameters are {} \nScore : {}%".format(
        grid_search.best_params_, grid_search.best_score_*100)
)
 
# Reshape for heatmap: cv_results_ varies gamma fastest for each C,
# so rows correspond to C values and columns to gamma values
scores = grid_search.cv_results_["mean_test_score"].reshape(
    len(param_grid['C']),
    len(param_grid['gamma']))
 
# Heatmap
sns.heatmap(scores,
            cmap=plt.cm.hot,
            annot=True,
            cbar=True,
            square=True)
 
plt.xlabel("gamma")
plt.ylabel("C")
plt.xticks(np.arange(len(param_grid['gamma'])), param_grid['gamma'], rotation=45)
plt.yticks(np.arange(len(param_grid['C'])), param_grid['C'], rotation=0)
 
plt.title("Accuracy for different parameters")
plt.show()
 
## Plot accuracy vs C parameter
plt.figure(figsize=(10, 6))
plt.title("Accuracy vs C parameter")
plt.xlabel("C")
plt.ylabel("Accuracy")
# one curve per gamma value: scores[:, i] is the accuracy across all
# C values for the i-th gamma
n = len(param_grid['gamma'])
for i in range(n):
    plt.plot(param_grid['C'],
             scores[:, i],
             'o-', label='gamma=' + str(param_grid['gamma'][i]))
 
plt.legend()
plt.xscale('log')
plt.show()


Output:

Best parameters are {'C': 100, 'gamma': 0.001} 
Score : 98.0%

Figure: C vs gamma w.r.t. accuracy

Figure: Accuracy vs C parameter

The resulting heatmap will show how the classification accuracy changes as we vary the C and gamma values. We can use this visualization to determine the optimal hyperparameters for our SVM model.

Loop through the parameter values and train an SVM for each combination

Python3




# split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled,
                                                    y, test_size=0.2,
                                                    random_state=23)
# define the parameter grid
param_grid = {'C': [0.01, 1, 10, 100],
              'gamma': [0.001, 0.01, 0.1, 10]}
 
 
# Create a figure to plot the results
fig, axs = plt.subplots(len(param_grid['C']),
                        len(param_grid['gamma']),
                        figsize=(16,16),
                        sharey=True)
 
# Loop through the parameter values and train an SVM for each combination
for i, C in enumerate(param_grid['C']):
    for j, gamma in enumerate(param_grid['gamma']):
        clf = SVC(kernel='rbf', C=C, gamma=gamma)
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        # plot the first two (scaled) features, coloured by predicted class
        axs[i, j].scatter(X_test[:, 0], X_test[:, 1], c=y_pred)
        axs[i, j].set_xticks(())
        axs[i, j].set_yticks(())
        axs[i, j].set_title('C = {}, gamma = {}\nAccuracy = {:.2f}'.format(
            C, gamma, accuracy))
 
plt.show()


Output:

Figure: Accuracy for different C and gamma

Note: The accuracy may vary slightly due to the random train/test split, but the general trend should remain the same.

This code performs a grid search to find the best combination of parameters (C and gamma) for an SVM model with an RBF kernel on the Iris dataset. It then plots the accuracy of the model against the different values of C and gamma, showing how accuracy changes with C for each value of gamma.


