# ML | Using SVM to perform classification on a non-linear dataset

Prerequisite: Support Vector Machines

Definition of a hyperplane and SVM classifier:
For a linearly separable dataset with n features (and therefore represented in n dimensions), a hyperplane is an (n – 1)-dimensional flat subspace that splits the data into two sets, each containing the data points of one class. For example, for a dataset with two features X and Y (lying in a 2-dimensional space), the separating hyperplane is a line (a 1-dimensional subspace). Similarly, for a dataset with three features, the separating hyperplane is a 2-dimensional plane, and so on.
In machine learning, a Support Vector Machine (SVM) is a non-probabilistic, linear, binary classifier that classifies data by learning a hyperplane separating the two classes.
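
As a minimal sketch of how a learned hyperplane is used, a point can be classified by the sign of w · x + b. The weight vector `w`, the offset `b`, and the sample points below are hypothetical values chosen for illustration, not values learned from data.

```python
import numpy as np

# hypothetical hyperplane in 2-D: the line x - y = 0
w = np.array([1.0, -1.0])
b = 0.0

# two illustrative points, one on each side of the line
points = np.array([[2.0, 1.0],
                   [1.0, 3.0]])

# the sign of w . x + b tells which side of the hyperplane each point lies on
sides = np.sign(points @ w + b)
print(sides)
```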

Classifying a non-linearly separable dataset using an SVM, a linear classifier:
As mentioned above, an SVM is a linear classifier that learns an (n – 1)-dimensional hyperplane to separate the data into two classes. However, it can still be used on a dataset that is not linearly separable, by projecting the dataset into a higher-dimensional space in which it becomes linearly separable.

To get a better understanding, let’s consider the circles dataset.

```python
# importing libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_circles
from mpl_toolkits.mplot3d import Axes3D

# generating data
X, Y = make_circles(n_samples=500, noise=0.02)

# visualizing data
plt.scatter(X[:, 0], X[:, 1], c=Y, marker='.')
plt.show()
```

The dataset is clearly not linearly separable and consists of two features (say, X and Y).
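
To see why a linear classifier struggles here, the sketch below fits a linear-kernel SVM directly on the two original features. The variable names and the fixed `random_state` are assumptions added for reproducibility. Since no straight line separates two concentric circles, the training accuracy is expected to stay well below 100%.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# same kind of data as above, with a fixed seed (an assumption, for reproducibility)
X_2d, y = make_circles(n_samples=500, noise=0.02, random_state=0)

# fit a linear SVM on the raw 2-D features and measure training accuracy
linear_2d = SVC(kernel='linear').fit(X_2d, y)
acc_2d = linear_2d.score(X_2d, y)
print(acc_2d)
```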

In order to use an SVM to classify this data, introduce another feature Z = X² + Y² into the dataset, thereby projecting the 2-dimensional data into 3-dimensional space. The first dimension represents the feature X, the second represents Y, and the third represents Z (which, mathematically, is the squared radius of the origin-centered circle on which the point (x, y) lies). For the data shown above, the ‘yellow’ data points lie on a circle of smaller radius and the ‘purple’ data points lie on a circle of larger radius. Thus, the data becomes linearly separable along the Z-axis.

```python
# adding a new dimension to X
X1 = X[:, 0].reshape((-1, 1))
X2 = X[:, 1].reshape((-1, 1))
X3 = (X1**2 + X2**2)
X = np.hstack((X, X3))

# visualizing data in higher dimension
fig = plt.figure()
axes = fig.add_subplot(111, projection='3d')
axes.scatter(X1, X2, X1**2 + X2**2, c=Y, depthshade=True)
plt.show()
```
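
A quick numerical check of the claim that the classes separate along Z: the sketch below compares the range of Z for each class. The variable names and `random_state` are illustrative assumptions. In `make_circles`, label 1 is the inner circle and label 0 is the outer one.

```python
from sklearn.datasets import make_circles

X_2d, y = make_circles(n_samples=500, noise=0.02, random_state=0)

# the new feature: squared distance of each point from the origin
z = X_2d[:, 0]**2 + X_2d[:, 1]**2

# largest Z on the inner circle versus smallest Z on the outer circle
inner_max = z[y == 1].max()
outer_min = z[y == 0].min()
print(inner_max, outer_min)
```

If `inner_max` is smaller than `outer_min`, any threshold on Z between the two values separates the classes, which is exactly what a horizontal plane in the 3-dimensional space does.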

Now we can use an SVM (or, for that matter, any other linear classifier) to learn a 2-dimensional separating hyperplane. This is what the hyperplane looks like:

```python
# create a support vector classifier using a linear kernel
from sklearn import svm

svc = svm.SVC(kernel='linear')
svc.fit(X, Y)
w = svc.coef_
b = svc.intercept_

# build a regular grid over the X-Y plane and solve
# w[0]*x1 + w[1]*x2 + w[2]*x3 + b = 0 for x3
g1 = np.linspace(X[:, 0].min(), X[:, 0].max(), 30)
g2 = np.linspace(X[:, 1].min(), X[:, 1].max(), 30)
x1, x2 = np.meshgrid(g1, g2)
x3 = -(w[0][0]*x1 + w[0][1]*x2 + b[0]) / w[0][2]

# plotting the data and the separating hyperplane on the same 3D axes
fig = plt.figure()
axes2 = fig.add_subplot(111, projection='3d')
axes2.scatter(X1, X2, X1**2 + X2**2, c=Y, depthshade=True)
axes2.plot_surface(x1, x2, x3, alpha=0.3)
plt.show()
```

Thus, using a linear classifier, we can separate a dataset that is not linearly separable in its original space.
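
The result can also be checked numerically instead of visually. The following self-contained sketch (variable names and `random_state` are assumptions, not part of the original code) adds the squared-radius feature and reports the training accuracy of a linear SVM in the 3-dimensional space.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X_2d, y = make_circles(n_samples=500, noise=0.02, random_state=0)

# lift the data to 3-D by appending z = x^2 + y^2 as a third column
z = (X_2d ** 2).sum(axis=1, keepdims=True)
X_3d = np.hstack((X_2d, z))

# a linear SVM in the lifted space
linear_3d = SVC(kernel='linear').fit(X_3d, y)
acc_3d = linear_3d.score(X_3d, y)
print(acc_3d)
```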

A brief introduction to kernels in machine learning:
In machine learning, a technique known as the “kernel trick” lets a linear classifier handle data that is not linearly separable. Conceptually, the data is projected into a higher-dimensional space in which it becomes linearly separable. A kernel function computes the inner product between two data points as if they had already been mapped into that higher-dimensional space, so the classifier can be trained there without computing the mapping explicitly.
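
As a hedged illustration, the degree-2 polynomial kernel k(a, b) = (a · b)² equals an ordinary inner product after the explicit map φ(x, y) = (x², √2·xy, y²), and X² + Y² is a linear combination of those features. The helper `phi` and the sample vectors are hypothetical. The second half uses scikit-learn's `SVC` with `kernel='poly'` on the raw 2-dimensional circles data, assuming the default `gamma` and `coef0`.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

def phi(p):
    """Explicit feature map matching the kernel (a . b)**2 in 2-D."""
    x, y = p
    return np.array([x**2, np.sqrt(2) * x * y, y**2])

# two hypothetical points
a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])

kernel_value = np.dot(a, b) ** 2          # computed in the original 2-D space
explicit_value = np.dot(phi(a), phi(b))   # computed in the 3-D feature space
print(kernel_value, explicit_value)

# the same idea applied by the SVM itself: no manual third feature needed
X_2d, labels = make_circles(n_samples=500, noise=0.02, random_state=0)
poly_svc = SVC(kernel='poly', degree=2).fit(X_2d, labels)
acc_poly = poly_svc.score(X_2d, labels)
print(acc_poly)
```

The two printed kernel values should agree, which is the point of the trick: the classifier only ever needs the inner products, never the mapped coordinates.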
