**Principal Component Analysis (PCA)** is an unsupervised technique for dimensionality reduction and visualisation. It is called a linear technique because the new features are obtained by multiplying the original features by the matrix of PCA eigenvectors. It works by identifying the hyperplane that lies closest to the data and projecting the data onto it, choosing the directions that preserve as much of the variance as possible. Because the approach is simple, PCA is widely used in data mining, bioinformatics, psychology, and other fields. Less well known is that several variants of the algorithm exist, each of which can outperform the conventional approach in particular situations. Let’s look at them one by one.
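The linear mapping described above can be sketched in a few lines of NumPy. This is only an illustration, not how any particular library implements PCA: the toy array, the variable names, and the choice of `d = 1` are assumptions made for the example.

```python
import numpy as np

# Toy data: 6 samples, 2 features (illustrative values)
X = np.array([[-1, -1], [-2, -1], [-3, -2],
              [1, 1], [2, 1], [3, 2]], dtype=float)

# PCA works on centred data
X_centered = X - X.mean(axis=0)

# Eigen-decomposition of the covariance matrix
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order

# Sort directions by decreasing variance
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the top d eigenvectors and project: a plain matrix multiplication
d = 1
W = eigvecs[:, :d]
X_reduced = X_centered @ W

# Fraction of the total variance carried by each direction
explained_ratio = eigvals / eigvals.sum()
print(X_reduced.shape, explained_ratio)
```

The projection is a single matrix product, which is why PCA is described as linear. The sign of each eigenvector is arbitrary, so the projected values may come out negated on a different machine.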

**Randomized PCA:**

This is an extension of PCA that uses an approximate Singular Value Decomposition (SVD) of the data. Conventional PCA runs in O(n·p^{2}) + O(p^{3}) time, where *n* is the number of data points and *p* is the number of features, whereas the randomized version runs in O(n·d^{2}) + O(d^{3}), where *d* is the number of principal components. It is therefore much faster when *d* is much smaller than *p*.

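sklearn provides a function `randomized_svd` in **sklearn.utils.extmath** which can be used to do randomized PCA (PCA additionally centres the data first). For an *m x n* input and *k* requested components, it returns three arrays: *U*, an *m x k* matrix; Sigma, returned as a 1-D array holding the *k* largest singular values; and *V^T*, a *k x n* matrix, where T denotes the transpose. Another way is to use **sklearn.decomposition.PCA** and change the `svd_solver` hyperparameter from ‘auto’ to ‘randomized’ (‘full’ forces the exact SVD instead). With ‘auto’, Scikit-learn may choose the randomized solver on its own: according to its documentation, this happens when the data is larger than 500 x 500 and the number of principal components is less than 80% of the smaller of *p* and *n*. The exact selection rule has changed between releases, so check the documentation for your version.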

**Code:**

```python
# Python3 program to show the working of
# randomized PCA

# importing libraries
import numpy as np
from sklearn.decomposition import PCA
from sklearn.utils.extmath import randomized_svd

# dummy data (m = 6 samples, n = 2 features)
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

# creates instance of PCA with randomized svd_solver
pca = PCA(n_components=2, svd_solver='randomized')

# This function takes a matrix and returns the
# U, Sigma and V ^ T elements, truncated to k = n_components
U, S, VT = randomized_svd(X, n_components=2)

# arrays returned by randomized_svd
print(f"Matrix U of size m * k: {U}\n")
print(f"Singular values S (k values): {S}\n")
print(f"Matrix V ^ T of size k * n: {VT}\n")

# fitting the pca model
pca.fit(X)

# printing the explained variance ratio
print("Explained Variance using PCA with randomized svd_solver:", pca.explained_variance_ratio_)
```

**Output:**

```
Matrix U of size m * k: [[ 0.21956688 -0.53396977]
 [ 0.35264795  0.45713538]
 [ 0.57221483 -0.07683439]
 [-0.21956688  0.53396977]
 [-0.35264795 -0.45713538]
 [-0.57221483  0.07683439]]

Singular values S (k values): [6.30061232 0.54980396]

Matrix V ^ T of size k * n: [[-0.83849224 -0.54491354]
 [-0.54491354  0.83849224]]

Explained Variance using PCA with randomized svd_solver: [0.99244289 0.00755711]
```
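The toy array above is too small to show why the randomized solver is attractive. As a rough sketch, the snippet below builds a larger synthetic matrix and checks that the randomized and full solvers report nearly the same explained variance. The data shape, the noise level, and the seed are assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data: 2000 samples, 100 features, with 5 dominant directions
latent = rng.normal(size=(2000, 5))
mixing = rng.normal(size=(5, 100))
X = latent @ mixing + 0.01 * rng.normal(size=(2000, 100))

# Exact SVD versus the randomized approximation, both keeping d = 5 components
full = PCA(n_components=5, svd_solver="full").fit(X)
rand = PCA(n_components=5, svd_solver="randomized", random_state=0).fit(X)

# Largest disagreement between the two explained-variance profiles
max_diff = np.max(np.abs(full.explained_variance_ratio_
                         - rand.explained_variance_ratio_))
print(max_diff)
```

On data with a few dominant directions like this, the two solvers should agree closely. How much time the randomized solver saves depends on the data size and the machine, so it is worth timing both on your own data.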

**Incremental PCA:**

The major problem with PCA and most other dimensionality reduction algorithms is that they require the whole dataset to fit in memory at once, which becomes difficult when the dataset is very large.

Fortunately, there is an algorithm called Incremental PCA that is useful for large training datasets: the data is split into mini-batches, which are fed to Incremental PCA one batch at a time. This is called on-the-fly learning. Because only one batch is held in memory at a time, memory usage stays under control.

Scikit-Learn provides a class called `sklearn.decomposition.IncrementalPCA` with which we can implement this.

**Code:**

```python
# Python3 program to show the working of
# incremental PCA

# importing libraries
import numpy as np
from sklearn.decomposition import IncrementalPCA

# dummy data
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

# specify the number of batches
no_of_batches = 3

# create an instance of IncrementalPCA
incremental_pca = IncrementalPCA(n_components=2)

# fit the data in batches; partial_fit updates the model with each batch,
# whereas fit would start from scratch and keep only the last batch
for batch in np.array_split(X, no_of_batches):
    incremental_pca.partial_fit(batch)

# transform the data with the fitted model
final = incremental_pca.transform(X)

# prints a 2d-array (as n_components = 2)
print(final)
```

**Output:**

Approximately the following (values rounded to four decimals; the sign of each column may differ between scikit-learn versions):

```
[[-1.3834 -0.2936]
 [-2.2219  0.2513]
 [-3.6053 -0.0422]
 [ 1.3834  0.2936]
 [ 2.2219 -0.2513]
 [ 3.6053  0.0422]]
```
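As a further sketch of the mini-batch idea, `IncrementalPCA` can also do the splitting itself through its `batch_size` parameter. The snippet below compares that with a manual loop over `partial_fit`. The random data, the batch size of 20, and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))   # illustrative data: 100 samples, 4 features

# Manual mini-batches: 5 batches of 20 rows, one partial_fit call per batch
manual = IncrementalPCA(n_components=2)
for batch in np.array_split(X, 5):
    manual.partial_fit(batch)

# Same batching done internally: fit() slices X into chunks of batch_size rows
auto = IncrementalPCA(n_components=2, batch_size=20).fit(X)

# Both routes process identical batches, so the components should coincide
same = np.allclose(manual.components_, auto.components_)
print(same)
```

The manual loop is the form to use when the batches come from disk or a stream and the full array never exists in memory. The `batch_size` form is convenient when the array is available, for example as a memory-mapped file.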

**Kernel PCA:**

Kernel PCA is yet another extension of PCA, this time using a kernel. A kernel is a mathematical technique that implicitly maps instances into a very high-dimensional space called the feature space. It is the same trick that enables non-linear classification and regression with Support Vector Machines (SVMs). Applied to PCA, it makes non-linear projections possible. Kernel PCA is commonly employed in novelty detection and image de-noising.

Scikit-Learn provides a class `KernelPCA` in `sklearn.decomposition` which can be used to perform Kernel PCA.

**Code:**

```python
# Python3 program to show the working of
# Kernel PCA

# importing libraries
import numpy as np
from sklearn.decomposition import KernelPCA

# dummy data
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

# creating an instance of KernelPCA using rbf kernel
kernel_pca = KernelPCA(n_components=2, kernel="rbf", gamma=0.03)

# fit and transform the data
final = kernel_pca.fit_transform(X)

# prints a 2d-array (as n_components = 2)
print(final)
```

**Output:**

```
[[-0.3149893  -0.17944928]
 [-0.46965347 -0.0475298 ]
 [-0.62541667  0.22697909]
 [ 0.3149893  -0.17944928]
 [ 0.46965347 -0.0475298 ]
 [ 0.62541667  0.22697909]]
```
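To make the role of the kernel more concrete, here is a hand-rolled sketch of Kernel PCA with an RBF kernel, checked against `KernelPCA`. It is a simplified illustration, not the library's implementation: the random data, `gamma = 0.5`, and all variable names are assumptions, and the comparison ignores the arbitrary sign of each component.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))    # illustrative data
gamma = 0.5                     # illustrative RBF width

# RBF kernel matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-gamma * sq_dists)

# Centre the kernel matrix, i.e. centre the data in feature space
n = K.shape[0]
one_n = np.full((n, n), 1.0 / n)
K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n

# Eigen-decomposition; keep the two largest eigenvalues
eigvals, eigvecs = np.linalg.eigh(K_centered)
top = np.argsort(eigvals)[::-1][:2]

# Projected training points: eigenvectors scaled by sqrt(eigenvalue)
manual = eigvecs[:, top] * np.sqrt(eigvals[top])

# Reference result from scikit-learn
reference = KernelPCA(n_components=2, kernel="rbf", gamma=gamma).fit_transform(X)

# Compare absolute values because component signs are arbitrary
agree = np.allclose(np.abs(manual), np.abs(reference), atol=1e-6)
print(agree)
```

The high-dimensional feature space never appears explicitly. Everything is computed from the *n x n* kernel matrix, which is also why Kernel PCA becomes expensive as the number of samples grows.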

KernelPCA is unsupervised, so there is no obvious performance measure for selecting the best kernel. However, dimensionality reduction is usually a preparation step for a supervised learning algorithm, so we can put both steps in a pipeline and use GridSearchCV to find the hyperparameters (kernel and gamma) that give the best classification accuracy.
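A minimal sketch of that pipeline approach follows. The dataset (`make_moons`), the logistic regression classifier, the candidate kernels, and the gamma grid are all assumptions picked for illustration. Substitute your own data, estimator, and search ranges.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Illustrative non-linear classification data
X, y = make_moons(n_samples=200, noise=0.1, random_state=0)

# Step 1: reduce with Kernel PCA; step 2: classify the reduced features
pipeline = Pipeline([
    ("kpca", KernelPCA(n_components=2)),
    ("clf", LogisticRegression()),
])

# Candidate hyperparameters, addressed as <step name>__<parameter>
param_grid = {
    "kpca__kernel": ["rbf", "poly"],
    "kpca__gamma": np.linspace(0.5, 5.0, 4),
}

# Cross-validated search over every kernel/gamma combination
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

After fitting, `search.best_estimator_` holds the whole pipeline refit with the winning kernel and gamma, so it can be used directly for prediction.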