Prerequisite: Support Vector Machines
Definition of a hyperplane and SVM classifier:
For a linearly separable dataset having n features (thereby needing n dimensions for representation), a hyperplane is an (n – 1)-dimensional subspace (more precisely, an affine subspace, since it need not pass through the origin) that separates the dataset into two sets, each containing the data points of a different class. For example, for a dataset having two features X and Y (therefore lying in a 2-dimensional space), the separating hyperplane is a line (a 1-dimensional subspace). Similarly, for a dataset having 3 features, we have a 2-dimensional separating hyperplane, and so on.
In machine learning, a Support Vector Machine (SVM) is a non-probabilistic, linear, binary classifier that classifies data by learning a hyperplane separating the two classes.
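As a small illustration of the definition above, a hyperplane can be written as the set of points x satisfying w · x + b = 0, and a linear classifier assigns a class according to the sign of w · x + b. The sketch below is not a trained SVM: the weights `w`, the bias `b`, and the helper name `side_of_hyperplane` are made-up values chosen only for illustration.

```python
# A hyperplane in n dimensions is the set of points x with w . x + b = 0.
# The weights and bias used here are hypothetical, not learned from data.

def side_of_hyperplane(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# In 2 dimensions the hyperplane is a line: here x + y - 1 = 0.
w, b = [1.0, 1.0], -1.0
labels = [side_of_hyperplane(w, b, p) for p in [(2.0, 2.0), (0.0, 0.0)]]
print(labels)
```

An SVM differs from this sketch only in how `w` and `b` are obtained: it learns them from the training data so that the margin between the two classes is as large as possible.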
Classifying a non-linearly separable dataset using an SVM – a linear classifier:
As mentioned above, an SVM is a linear classifier that learns an (n – 1)-dimensional hyperplane for classifying data into two classes. However, it can also be used to classify a dataset that is not linearly separable. This is done by projecting the dataset into a higher-dimensional space in which it is linearly separable!
To get a better understanding, let’s consider the circles dataset.
The dataset is clearly not linearly separable and consists of two features (say, X and Y).
In order to use an SVM for classifying this data, introduce another feature Z = X² + Y² into the dataset, thus projecting the 2-dimensional data into 3-dimensional space. The first dimension represents the feature X, the second represents Y, and the third represents Z (which, mathematically, is the square of the radius of the origin-centred circle on which the point (x, y) lies). Now, in the circles dataset, the ‘yellow’ data points lie on a circle of smaller radius and the ‘purple’ data points lie on a circle of larger radius, so the Z values of the two classes fall in two distinct ranges. Thus, the data becomes linearly separable along the Z-axis.
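The projection described above can be sketched with the standard library alone. The radii (0.3 and 1.0), the jitter, the point counts, and the helper name `make_ring` are assumptions made for this illustration and do not come from the original dataset.

```python
import math
import random

def make_ring(radius, n_points, label, rng, jitter=0.05):
    """Sample n_points near a circle of the given radius centred at the origin."""
    points = []
    for _ in range(n_points):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        r = radius + rng.uniform(-jitter, jitter)
        points.append((r * math.cos(angle), r * math.sin(angle), label))
    return points

rng = random.Random(0)
# Class 0 lies on the smaller circle, class 1 on the larger one (assumed radii).
data = make_ring(0.3, 100, 0, rng) + make_ring(1.0, 100, 1, rng)

# Lift each point from (x, y) to (x, y, z) with z = x^2 + y^2.
lifted = [(x, y, x * x + y * y, label) for x, y, label in data]

# Along the new Z-axis the two classes occupy disjoint ranges.
inner_max_z = max(z for _, _, z, label in lifted if label == 0)
outer_min_z = min(z for _, _, z, label in lifted if label == 1)
print(inner_max_z < outer_min_z)
```

Because every inner point has a smaller Z than every outer point, any threshold on Z between `inner_max_z` and `outer_min_z` separates the classes, which is exactly a flat plane in the 3-dimensional space.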
Now, we can use an SVM (or, for that matter, any other linear classifier) to learn a 2-dimensional separating hyperplane. Since the classes differ in their Z values, this hyperplane is a roughly horizontal plane that cuts the Z-axis between the two circles.
Thus, using a linear classifier we can separate a non-linearly separable dataset.
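A minimal sketch of this step, assuming scikit-learn and NumPy are installed. It uses `make_circles` as a stand-in for the circles dataset; the parameter values (`n_samples`, `factor`, `noise`, `random_state`, `C`) are choices made for this illustration, not values taken from the article.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings in 2-D; the parameter values are illustrative assumptions.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Add the third feature Z = X^2 + Y^2 as an extra column.
Z = (X ** 2).sum(axis=1, keepdims=True)
X3 = np.hstack([X, Z])

# A linear SVM on the 3-D data; a large C approximates a hard margin.
clf = SVC(kernel="linear", C=1000.0).fit(X3, y)
accuracy = clf.score(X3, y)

# coef_ and intercept_ describe the learned plane w . x + b = 0.
print("training accuracy:", accuracy)
print("w =", clf.coef_[0], "b =", clf.intercept_[0])
```

In the learned weight vector, the component for Z should dominate the components for X and Y, reflecting that the separation happens along the Z-axis.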
A brief introduction to kernels in machine learning:
In machine learning, a technique known as the “kernel trick” lets a linear classifier handle a dataset that is not linearly separable. Conceptually, the linearly inseparable data is mapped into a higher-dimensional space in which it becomes linearly separable. In practice, the mapping is never computed explicitly: a kernel function takes a pair of data points and returns their inner product in that higher-dimensional space, which is all an SVM needs to learn the separating hyperplane there.
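To make this concrete, the sketch below compares an explicit mapping with a kernel for one simple case: the degree-2 polynomial kernel k(p, q) = (p · q)². The feature map `phi` and the sample points are chosen for illustration; other kernels correspond to other (possibly infinite-dimensional) mappings.

```python
import math

def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def phi(p):
    """Explicit map from 2-D to 3-D: (x, y) -> (x^2, sqrt(2)*x*y, y^2)."""
    x, y = p
    return (x * x, math.sqrt(2.0) * x * y, y * y)

def poly2_kernel(p, q):
    """Degree-2 polynomial kernel, computed entirely in the original 2-D space."""
    return dot(p, q) ** 2

a, b = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(a), phi(b))   # inner product after mapping to 3-D
implicit = poly2_kernel(a, b)    # same value without ever leaving 2-D
print(math.isclose(explicit, implicit))
```

Note that the feature Z = X² + Y² used for the circles dataset is the sum of the first and third components of `phi`, so a plane in this mapped space can express the same circular boundary.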