How to Make Better Models in Python using SVM Classifier and RBF Kernel

Last Updated : 30 Jan, 2023

As machine learning models become more popular and widespread, it is important for data scientists and developers to understand how to build the best models possible. One powerful tool for improving the accuracy and performance of machine learning models is the support vector machine (SVM) classifier. In its basic form it is a linear classifier, but with kernels it also handles data that is not linearly separable, so it works well for a variety of data types. In this article, we will focus on how to use the SVM classifier and the radial basis function (RBF) kernel in Python to build better models for your data.

A support vector machine is a supervised learning algorithm that can be used for classification or regression tasks. It works by finding the hyperplane in a high-dimensional space that separates the classes in the data with the largest possible margin. The points closest to the hyperplane, called support vectors, have the greatest influence on the position of the hyperplane and on the classification of new data points. SVMs can handle both linear and non-linear classification problems by using different types of kernels.

RBF Kernel in SVM

The RBF kernel is a kernel function that lets the SVM classifier behave as if the data had been mapped into a higher-dimensional space, where a separating boundary is easier to find, without computing that mapping explicitly. It has a single parameter, gamma, which determines the width of the kernel. The RBF kernel function is defined as:

K(x, y) = exp(-gamma * ||x-y||^2)

The value of gamma controls the width of the kernel and thus the complexity of the model. A small gamma value will result in a wide kernel, leading to a simpler model with low variance and high bias, while a large gamma value will result in a narrow kernel, leading to a more complex model with high variance and low bias.
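
To make the formula concrete, here is a minimal sketch that evaluates the kernel by hand with NumPy and compares it with scikit-learn's rbf_kernel helper. The helper function rbf and the two sample points are illustrative assumptions, not part of the article's dataset.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel


def rbf(x, y, gamma):
    """Evaluate K(x, y) = exp(-gamma * ||x - y||^2) for two 1-D vectors."""
    return np.exp(-gamma * np.sum((x - y) ** 2))


# Two hypothetical points; their squared distance is 1 + 4 = 5
x = np.array([1.0, 2.0])
y = np.array([2.0, 4.0])

for gamma in (0.01, 0.1, 1.0):
    manual = rbf(x, y, gamma)
    # rbf_kernel expects 2-D arrays of shape (n_samples, n_features)
    library = rbf_kernel(x.reshape(1, -1), y.reshape(1, -1), gamma=gamma)[0, 0]
    print(f"gamma={gamma}: manual={manual:.4f}, sklearn={library:.4f}")
```

As gamma grows, the kernel value for the same pair of points shrinks toward zero, which is what "a narrow kernel" means in practice: only very close points are treated as similar.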

The other important hyperparameter is C, which controls the trade-off between maximizing the margin and minimizing the misclassification error on the training data. A large value of C penalizes misclassified training points heavily, resulting in a smaller margin and fewer training misclassifications, while a small value of C tolerates more training misclassifications in exchange for a larger margin.
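
One way to see the effect of C is to fit the same classifier with several values and compare the number of support vectors and the training accuracy. The sketch below assumes a synthetic two-class dataset from make_moons, chosen only for illustration; the exact numbers will differ on other data.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Synthetic, noisy two-class data (an assumption made for illustration)
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

results = {}
for C in (0.01, 1, 100):
    clf = SVC(kernel="rbf", C=C, gamma="scale").fit(X, y)
    results[C] = {
        # n_support_ holds the number of support vectors per class
        "n_support_vectors": int(clf.n_support_.sum()),
        "train_accuracy": clf.score(X, y),
    }
    print(f"C={C}: {results[C]}")
```

A small C typically leaves many points inside the wide margin, so many of them become support vectors. Training accuracy alone says nothing about generalization, so the choice of C should still be validated on held-out data.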

Now that we have a basic understanding of the SVM classifier and the RBF kernel, let’s go through the steps for using these tools in Python to build a model using a toy dataset.

Importing Libraries and Dataset

First, you will need to load your data into a Pandas dataframe and prepare it for modeling. This may include tasks such as splitting the data into training and testing sets, standardizing the features, and handling missing values.

Python3

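import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the toy dataset into a DataFrame
iris = datasets.load_iris()
iris_df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                       columns=iris['feature_names'] + ['target'])

# Keep only two classes (drop class 2) to get a binary problem;
# .copy() avoids pandas' SettingWithCopyWarning on the next assignment
iris_df = iris_df[iris_df["target"] != 2].copy()

# Relabel: original class 1 becomes 0, original class 0 becomes 1
iris_df["target"] = iris_df["target"].apply(lambda x: 0 if x == 1 else 1)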

Now we will split the complete dataset into training and testing parts, so that we can train the model on the training set and use the held-out test set for evaluation.

Python3

# Split data into training and testing sets
X = iris_df.drop('target', axis=1)
y = iris_df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Standardize features (the scaler is fit on the training set only)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

Model Training

Depending on the characteristics of your data, you may want to use a different kernel with the SVM classifier. In this case, we will be using the RBF kernel, which is well-suited for data that is not linearly separable. Once you have prepared your data and chosen the appropriate kernel, you can use the scikit-learn library to fit the SVM model to your data. The model is trained with the fit() method, which takes the training data and the labels as arguments. The values of C and gamma are set when the SVC object is created.

Python3

from sklearn.svm import SVC 
  
# Create an SVM classifier with an RBF 
# kernel and set values of C and gamma 
model = SVC(kernel='rbf', C=1, gamma=1) 
  
# Fit the model to the training data 
model.fit(X_train_scaled, y_train) 
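
As an alternative to scaling and fitting in separate steps, the two can be bundled into a single scikit-learn pipeline, so the scaler is always applied before the classifier. This is a minimal, self-contained sketch rather than the article's own method. It reloads the two-class iris subset with the original labels (no relabeling), and the random_state and stratify settings are assumptions added for reproducibility.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two-class subset of iris (classes 0 and 1), original labels kept
X, y = load_iris(return_X_y=True)
mask = y != 2
X, y = X[mask], y[mask]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# The pipeline fits the scaler on the training data only, then fits the SVC
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1, gamma=1))
pipe.fit(X_train, y_train)

# score() scales X_test with the already-fitted scaler before predicting
test_accuracy = pipe.score(X_test, y_test)
print("Test accuracy:", test_accuracy)
```

With a pipeline, new data can be passed straight to pipe.predict() without a separate call to the scaler.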

Model Evaluation

After fitting the model to the training data, it is important to evaluate its performance on the testing data. This can be done using a variety of metrics, such as accuracy, precision, and recall.

Python3

# Calculate the accuracy of the model on the test data 
from sklearn.metrics import accuracy_score 
y_pred = model.predict(X_test_scaled) 
accuracy = accuracy_score(y_test, y_pred) 
print('Accuracy:', accuracy) 
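
Accuracy is only one of the metrics mentioned above. The following self-contained sketch shows how precision, recall, and the confusion matrix could be computed for the same kind of model. It rebuilds the two-class iris subset with the original labels, and the fixed random_state is an assumption added so the split is reproducible.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two-class subset of iris (classes 0 and 1)
X, y = load_iris(return_X_y=True)
mask = y != 2
X, y = X[mask], y[mask]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Fit the scaler on the training set only, then train the model
scaler = StandardScaler().fit(X_train)
model = SVC(kernel="rbf", C=1, gamma=1).fit(scaler.transform(X_train), y_train)
y_pred = model.predict(scaler.transform(X_test))

# Rows of the confusion matrix are true classes, columns are predictions
cm = confusion_matrix(y_test, y_pred)
precision = precision_score(y_test, y_pred)  # positive class is label 1
recall = recall_score(y_test, y_pred)

print("Confusion matrix:\n", cm)
print("Precision:", precision)
print("Recall:", recall)
```

Precision and recall are most informative when the classes are imbalanced or when the two kinds of error have different costs; on a balanced toy dataset they will usually track accuracy closely.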

Now that you have trained and evaluated your model, you can use it to make predictions on new data. You can do this using the predict() method, which takes in a matrix of data and returns a corresponding array of predictions.

Python3

# Make predictions on new data 
new_data = ... # new data that you want to predict on 
new_data_scaled = scaler.transform(new_data) 
predictions = model.predict(new_data_scaled) 

Hyperparameter Tuning using GridSearchCV

Depending on the results of your model evaluation, you may want to fine-tune the model by adjusting the hyperparameters or using a different kernel. For example, you can use the GridSearchCV class from the scikit-learn library to run a cross-validated grid search over different combinations of hyperparameters and choose the best-performing model.

Python3

from sklearn.model_selection import GridSearchCV 
  
# Define the parameter grid 
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001]} 
  
# Create a grid search object 
grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=2) 
  
# Fit the grid search object to the training data 
grid.fit(X_train_scaled, y_train) 
  
# Get the best parameters 
best_params = grid.best_params_ 
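
Once the search has finished, the fitted object exposes more than the best parameters. The self-contained sketch below shows one way to inspect the cross-validated score, evaluate the refit model on the test set, and rank all parameter combinations. The explicit cv=5 and the fixed random_state are assumptions added for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two-class subset of iris (classes 0 and 1)
X, y = load_iris(return_X_y=True)
mask = y != 2
X, y = X[mask], y[mask]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1, 0.1, 0.01, 0.001]}
grid = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, refit=True)
grid.fit(X_train_scaled, y_train)

print("Best parameters:", grid.best_params_)
print("Best cross-validated accuracy:", grid.best_score_)
# With refit=True, score() uses the best model retrained on all training data
print("Test accuracy of refit model:", grid.score(X_test_scaled, y_test))

# One row per parameter combination, best-ranked first
cv_table = pd.DataFrame(grid.cv_results_)[
    ["param_C", "param_gamma", "mean_test_score", "rank_test_score"]]
print(cv_table.sort_values("rank_test_score").head())
```

Looking at the full table, rather than only the winner, helps to see whether several settings perform equally well, in which case the simpler model (smaller C or gamma) is often a reasonable choice.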

With these steps, you can use the SVM classifier and the RBF kernel in Python to build better models for your data. It’s important to keep in mind that the choice of kernel and the value of the hyperparameters can have a significant impact on the performance of the model and should be chosen carefully based on the characteristics of your data.
