**Prerequisite:** Support Vector Machines

**Definition of a hyperplane and SVM classifier:**

For a linearly separable dataset with n features (and therefore represented in n dimensions), a hyperplane is an (n – 1)-dimensional flat subspace that separates the dataset into two sets, each containing the data points of one class. For example, for a dataset with two features X and Y (lying in a 2-dimensional space), the separating hyperplane is a line (a 1-dimensional subspace). Similarly, for a dataset with 3 features, the separating hyperplane is a 2-dimensional plane, and so on.
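To make the definition concrete, here is a minimal sketch of a hyperplane in 2 dimensions, written as the set of points where w · x + b = 0. The values of `w`, `b` and `points` are made up for illustration and do not come from any trained model.

```python
import numpy as np

# Hypothetical hyperplane in 2-D: the line x + y - 1 = 0
w = np.array([1.0, 1.0])   # normal vector (assumed for illustration)
b = -1.0                   # offset (assumed for illustration)

# Three sample points: one on each side of the line, one exactly on it
points = np.array([[2.0, 2.0], [0.0, 0.0], [0.5, 0.5]])

# The sign of w . x + b tells us on which side of the hyperplane a point lies:
# +1 and -1 mark the two sides, 0 means the point is on the hyperplane itself
side = np.sign(points @ w + b)
print(side)
```

A linear classifier such as an SVM assigns a class to each point based on this sign.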

In machine learning, Support Vector Machine (SVM) is a non-probabilistic, linear, binary classifier used for classifying data by learning a hyperplane separating the data.

**Classifying a non-linearly separable dataset using an SVM – a linear classifier:**

As mentioned above, SVM is a linear classifier that learns an (n – 1)-dimensional hyperplane for classifying data into two classes. However, it can also be used to classify a dataset that is not linearly separable, by projecting the dataset into a higher dimension in which it becomes linearly separable.

To get a better understanding, let’s consider the circles dataset.

```python
# importing libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles
from mpl_toolkits.mplot3d import Axes3D

# generating data
X, Y = make_circles(n_samples=500, noise=0.02)

# visualizing data
plt.scatter(X[:, 0], X[:, 1], c=Y, marker='.')
plt.show()
```


The dataset is clearly not linearly separable and consists of two features (say, X and Y).

In order to use SVM for classifying this data, introduce another feature Z = X^{2} + Y^{2} into the dataset, thereby projecting the 2-dimensional data into 3-dimensional space. The first dimension represents the feature X, the second represents Y, and the third represents Z (which, mathematically, is the squared radius of the circle centred at the origin on which the point (x, y) lies). For the data shown above, the ‘yellow’ data points belong to a circle of smaller radius and the ‘purple’ data points belong to a circle of larger radius. Thus, the data becomes linearly separable along the Z-axis.

```python
# adding a new dimension to X
X1 = X[:, 0].reshape((-1, 1))
X2 = X[:, 1].reshape((-1, 1))
X3 = (X1**2 + X2**2)
X = np.hstack((X, X3))

# visualizing data in higher dimension
fig = plt.figure()
axes = fig.add_subplot(111, projection='3d')
axes.scatter(X1, X2, X1**2 + X2**2, c=Y, depthshade=True)
plt.show()
```
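As a rough numerical check of the claim that the classes separate along the Z-axis, the sketch below compares the largest Z value on the inner circle with the smallest Z value on the outer circle. The `random_state=0` argument and the names `inner_max`, `outer_min` and `threshold` are assumptions added here for reproducibility and illustration. They are not part of the original example.

```python
import numpy as np
from sklearn.datasets import make_circles

# random_state is an assumption added so the run is repeatable
X, Y = make_circles(n_samples=500, noise=0.02, random_state=0)

# the extra feature: squared distance of each point from the origin
Z = X[:, 0]**2 + X[:, 1]**2

# make_circles labels the outer circle 0 and the inner circle 1
inner_max = Z[Y == 1].max()
outer_min = Z[Y == 0].min()
print(inner_max, outer_min)

# any threshold between the two values splits the classes along Z
threshold = (inner_max + outer_min) / 2
predictions = (Z < threshold).astype(int)
print((predictions == Y).mean())
```

If the largest inner value is below the smallest outer value, a single cut along Z is enough to tell the two classes apart, which is exactly what a linear classifier can learn.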


Now, we can use SVM (or, for that matter, any other linear classifier) to learn a 2-dimensional separating hyperplane. This is what the hyperplane looks like:

```python
# create support vector classifier using a linear kernel
from sklearn import svm

svc = svm.SVC(kernel='linear')
svc.fit(X, Y)
w = svc.coef_
b = svc.intercept_

# plotting the separating hyperplane:
# solve w0*x1 + w1*x2 + w2*x3 + b = 0 for x3 over a grid of (x1, x2)
x1 = X[:, 0].reshape((-1, 1))
x2 = X[:, 1].reshape((-1, 1))
x1, x2 = np.meshgrid(x1, x2)
x3 = -(w[0][0]*x1 + w[0][1]*x2 + b) / w[0][2]

fig = plt.figure()
axes2 = fig.add_subplot(111, projection='3d')
axes2.scatter(X1, X2, X1**2 + X2**2, c=Y, depthshade=True)
# draw the plane on the same 3D axes; recent Matplotlib versions
# no longer accept keyword arguments such as projection in fig.gca()
axes2.plot_surface(x1, x2, x3, alpha=0.01)
plt.show()
```


Thus, using a linear classifier we can separate a non-linearly separable dataset.
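One way to see the effect of the extra dimension is to compare the training accuracy of a linear SVM on the original 2-dimensional data with its accuracy on the 3-dimensional data. The following is a minimal sketch under a few assumptions: `random_state=0` is added for reproducibility, the names `X2d`, `X3d`, `acc_2d` and `acc_3d` are illustrative, and accuracy is measured on the training data itself rather than on a held-out set.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_circles

# random_state is an assumption added so the run is repeatable
X2d, Y = make_circles(n_samples=500, noise=0.02, random_state=0)

# append the feature Z = X^2 + Y^2 as a third column
Z = (X2d[:, 0]**2 + X2d[:, 1]**2).reshape(-1, 1)
X3d = np.hstack((X2d, Z))

# fit the same linear SVM on both representations and score on the training data
acc_2d = svm.SVC(kernel='linear').fit(X2d, Y).score(X2d, Y)
acc_3d = svm.SVC(kernel='linear').fit(X3d, Y).score(X3d, Y)
print("accuracy in 2-D:", acc_2d)
print("accuracy in 3-D:", acc_3d)
```

The 2-dimensional score should stay close to chance, because no straight line can split concentric circles, while the 3-dimensional score should be close to 1.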

**A brief introduction to kernels in machine learning:**

In machine learning, a technique known as the “kernel trick” is used to learn a linear classifier for a dataset that is not linearly separable. The idea is to work with the data as if it had been projected into a higher dimension in which it becomes linearly separable. Rather than computing this projection explicitly for each data instance, a kernel function returns the inner product that a pair of data points would have in the higher-dimensional space, which is all an SVM needs in order to learn the separating hyperplane there.
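The idea can be illustrated with a small sketch. It uses the degree-2 polynomial kernel k(a, b) = (a · b)^2 as an example and checks that it agrees with an inner product in a 3-dimensional feature space. It then fits scikit-learn's `SVC` with that kernel directly on the 2-dimensional circles data. The choice of kernel, the helper `phi`, the sample vectors `a` and `b`, and `random_state=0` are all assumptions made for illustration.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_circles


def phi(p):
    """Explicit 3-D feature map matching the kernel k(a, b) = (a . b)**2 in 2-D."""
    return np.array([p[0]**2, np.sqrt(2) * p[0] * p[1], p[1]**2])


# two arbitrary 2-D points (assumed for illustration)
a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

kernel_value = np.dot(a, b) ** 2          # computed in the original 2-D space
explicit_value = np.dot(phi(a), phi(b))   # computed in the 3-D feature space
print(np.isclose(kernel_value, explicit_value))

# the same kernel used inside an SVM, with no explicit extra feature added
X, Y = make_circles(n_samples=500, noise=0.02, random_state=0)
svc = svm.SVC(kernel='poly', degree=2, gamma=1.0, coef0=0.0)
svc.fit(X, Y)
kernel_accuracy = svc.score(X, Y)
print(kernel_accuracy)
```

With `gamma=1.0` and `coef0=0.0`, scikit-learn's polynomial kernel reduces to (a · b)^2, so the classifier works in the feature space of `phi` without ever constructing it.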
