# Difference between PCA and t-SNE

**Principal Component Analysis (PCA):** PCA is an unsupervised, linear dimensionality reduction and data visualization technique for very high-dimensional data. High-dimensional data is hard to gain insights from, and processing it is computationally intensive. The main idea behind this technique is to reduce the dimensionality of highly correlated data by transforming the original set of vectors into a new set known as the **principal components**.

PCA tries to preserve the global structure of the data, i.e., when converting d-dimensional data to d'-dimensional data it tries to map the clusters as a whole, so local structure might get lost. Applications of this technique include noise filtering, feature extraction, stock market prediction, and gene data analysis.
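The projection described above can be sketched with plain NumPy: center the data, eigendecompose the covariance matrix, and keep the directions with the largest eigenvalues. The synthetic correlated dataset and variable names below are illustrative assumptions, not from the original article:

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 3-D data: the third feature is a noisy mix of the first two,
# so the data lies close to a 2-D plane
base = rng.normal(size=(200, 2))
X = np.column_stack([base, base @ [0.7, 0.3] + 0.05 * rng.normal(size=200)])

Xc = X - X.mean(axis=0)                 # center the data
cov = np.cov(Xc, rowvar=False)          # 3x3 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigendecomposition (ascending order)
order = np.argsort(eigvals)[::-1]       # sort components by variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

X2 = Xc @ eigvecs[:, :2]                # project onto the top-2 principal components
explained = eigvals / eigvals.sum()     # fraction of variance per component
print(explained)
```

Because the third feature is nearly a linear combination of the other two, the first two principal components capture almost all of the variance, which is exactly the redundancy PCA is designed to remove.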

**t-distributed stochastic neighbor embedding (t-SNE):** t-SNE is also an unsupervised, but non-linear, dimensionality reduction and data visualization technique. The math behind t-SNE is quite complex, but the idea is simple: it embeds points from a higher dimension into a lower dimension while trying to preserve each point's neighborhood.

Unlike PCA, it tries to preserve the local structure of the data by minimizing the **Kullback–Leibler divergence (KL divergence)** between the two distributions with respect to the locations of the points in the map. This technique finds application in computer security research, music analysis, cancer research, bioinformatics, and biomedical signal processing.
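A minimal sketch of this neighborhood-preserving embedding, using scikit-learn's `TSNE`. The two-cluster synthetic data and the perplexity value are assumptions chosen for illustration; in practice perplexity must be smaller than the number of samples and is usually tuned per dataset:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Two well-separated clusters in 10-D space (30 points each)
X = np.vstack([rng.normal(0, 0.5, size=(30, 10)),
               rng.normal(5, 0.5, size=(30, 10))])

# perplexity roughly controls how many neighbors each point "considers";
# small values emphasize very local structure
emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(X)
print(emb.shape)  # (60, 2)
```

Points that were neighbors in the 10-dimensional space remain neighbors in the 2-D map, so the two clusters stay clearly separated in the embedding.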

## Table of Differences between PCA and t-SNE

| S.NO. | PCA | t-SNE |
|---|---|---|
| 1. | It is a linear dimensionality reduction technique. | It is a non-linear dimensionality reduction technique. |
| 2. | It tries to preserve the global structure of the data. | It tries to preserve the local structure (clusters) of the data. |
| 3. | As a linear method, it can miss non-linear structure in the data. | It is one of the best dimensionality reduction techniques for visualization. |
| 4. | It does not involve hyperparameters. | It involves hyperparameters such as perplexity, learning rate, and number of steps. |
| 5. | It is highly affected by outliers. | It can handle outliers. |
| 6. | PCA is a deterministic algorithm. | It is a non-deterministic (randomized) algorithm. |
| 7. | It works by rotating the axes to the directions that preserve maximum variance. | It works by minimizing the KL divergence between pairwise-similarity distributions in the original and embedded spaces. |
| 8. | We can decide how much variance to preserve using the eigenvalues. | We cannot control preserved variance; instead, neighborhood preservation is controlled through the hyperparameters. |
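The last row's point, deciding how much variance to preserve from the eigenvalues, can be sketched with scikit-learn's `PCA`, which accepts a float `n_components` as a variance threshold. The digits dataset here is just an illustrative choice:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data          # 8x8 digit images flattened to 64 features

# A float n_components asks PCA to keep the smallest number of
# components whose cumulative explained variance reaches 95%
pca = PCA(n_components=0.95)
Xr = pca.fit_transform(X)
print(Xr.shape[1], pca.explained_variance_ratio_.sum())
```

t-SNE offers no such knob: its hyperparameters (perplexity, learning rate, number of iterations) shape which neighborhoods are preserved, not how much variance survives.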

