Principal Component Analysis (PCA): PCA is an unsupervised, linear technique for dimensionality reduction and data visualization of high-dimensional data. High-dimensional data is hard to gain insight from and is also computationally expensive to process. The main idea behind this technique is to reduce the dimensionality of highly correlated data by transforming the original set of variables into a new set of uncorrelated variables, known as principal components, ordered by how much of the data's variance each one captures.
PCA tries to preserve the global structure of the data: when converting d-dimensional data to d'-dimensional data (with d' < d), it maps the data as a whole along the directions of greatest variance, so local structure may be lost. Applications of this technique include noise filtering, feature extraction, stock market prediction, and gene data analysis.
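The transformation described above can be sketched with NumPy as an eigendecomposition of the covariance matrix. This is a minimal illustration, not a production implementation: the function name `pca` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components."""
    # Center each feature so the covariance is measured around the mean
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    # Rotate the centered data onto the chosen directions
    return X_centered @ components, eigvals[order]

# Hypothetical data: two strongly correlated features plus one noise feature
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2.0 * t + 0.1 * rng.normal(size=200),
    0.1 * rng.normal(size=200),
])

Z, variances = pca(X, n_components=2)
print(Z.shape)      # 200 samples, now described by 2 components
print(variances)    # variance captured by each kept component
```

Each column of `Z` is one principal component, and the columns are uncorrelated with each other, which is what makes the new set of variables more compact than the original correlated features.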
t-distributed stochastic neighbor embedding (t-SNE): t-SNE is also an unsupervised technique for dimensionality reduction and data visualization, but it is non-linear. The math behind t-SNE is fairly involved, but the idea is simple: it embeds points from a higher-dimensional space into a lower-dimensional one while trying to preserve the neighborhood of each point.
Unlike PCA, it tries to preserve the local structure of the data by minimizing the Kullback–Leibler divergence (KL divergence) between two distributions, one describing pairwise similarities between points in the original space and one describing pairwise similarities in the low-dimensional map, with respect to the locations of the points in the map. This technique finds application in computer security research, music analysis, cancer research, bioinformatics, and biomedical signal processing.
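To make the objective concrete, the sketch below computes the KL divergence that t-SNE minimizes for two candidate maps of the same data. It is deliberately simplified: it assumes a single fixed Gaussian bandwidth `sigma` for all points, whereas real t-SNE tunes a per-point bandwidth to match the chosen perplexity, and it only evaluates the cost rather than optimizing it. All function names and the synthetic data are assumptions made for the example.

```python
import numpy as np

def pairwise_sq_dists(A):
    """Squared Euclidean distance between every pair of rows in A."""
    diff = A[:, None, :] - A[None, :, :]
    return (diff ** 2).sum(axis=-1)

def gaussian_affinities(X, sigma=1.0):
    """Similarities in the original space (assumption: one fixed sigma)."""
    P = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(P, 0.0)   # a point is not its own neighbor
    return P / P.sum()

def student_t_affinities(Y):
    """Similarities in the map, using a Student's t kernel (1 degree of freedom)."""
    Q = 1.0 / (1.0 + pairwise_sq_dists(Y))
    np.fill_diagonal(Q, 0.0)
    return Q / Q.sum()

def kl_divergence(P, Q):
    """KL(P || Q), summed over the pairs where P is non-zero."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

# Hypothetical data: two well-separated clusters in 5 dimensions
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(20, 5)),
    rng.normal(5.0, 0.5, size=(20, 5)),
])
P = gaussian_affinities(X)

# Map 1 keeps neighbors together; map 2 assigns the same positions to the wrong points
Y_good = X[:, :2]
Y_shuffled = Y_good[rng.permutation(len(X))]

kl_good = kl_divergence(P, student_t_affinities(Y_good))
kl_shuffled = kl_divergence(P, student_t_affinities(Y_shuffled))
print(kl_good, kl_shuffled)
```

The map that keeps neighboring points together gets the lower cost. An actual t-SNE run starts from a random or PCA-based layout and moves the map points by gradient descent to reduce this quantity.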
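Table of Differences between PCA and t-SNE
|S.No.||PCA||t-SNE|
|1.||It is a linear dimensionality reduction technique.||It is a non-linear dimensionality reduction technique.|
|2.||It tries to preserve the global structure of the data.||It tries to preserve the local structure (clusters) of the data.|
|3.||It is fast, but it cannot capture non-linear structure, so it often separates clusters less clearly than t-SNE in visualizations.||It is one of the most widely used techniques for visualizing high-dimensional data in two or three dimensions, but it is computationally more expensive.|
|4.||It involves essentially no hyperparameters other than the number of components to keep.||It involves hyperparameters such as perplexity, learning rate, and number of steps.|
|5.||It is highly affected by outliers.||It is less affected by outliers, since it focuses on local neighborhoods.|
|6.||It is a deterministic algorithm.||It is a non-deterministic (randomized) algorithm, so different runs can produce different embeddings.|
|7.||It works by rotating the coordinate axes onto the directions of maximum variance.||It works by converting distances between points into similarities (Gaussian in the original space, Student's t in the map) and minimizing the mismatch between them.|
|8.||We can decide how much variance to preserve using the eigenvalues.||It does not preserve variance; instead, it preserves neighborhood relationships, controlled by hyperparameters such as perplexity.|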
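Row 8 of the table can be illustrated with a short sketch: each eigenvalue of the covariance matrix, divided by the sum of all eigenvalues, gives the fraction of variance captured by that component, so a target fraction translates directly into a number of components. The helper name `components_for_variance`, the 95% target, and the synthetic data are assumptions made for the example.

```python
import numpy as np

def components_for_variance(X, target=0.95):
    """Smallest number of principal components whose variance share reaches target."""
    X_centered = X - X.mean(axis=0)
    # eigvalsh returns ascending eigenvalues; reverse to get largest first
    eigvals = np.linalg.eigvalsh(np.cov(X_centered, rowvar=False))[::-1]
    eigvals = np.clip(eigvals, 0.0, None)   # guard against tiny negative round-off
    ratio = eigvals / eigvals.sum()
    cumulative = np.cumsum(ratio)
    # First position where the cumulative share reaches the target
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(eigvals)), ratio

# Hypothetical data: two high-variance directions and three low-variance ones,
# rotated by a random orthogonal matrix so the raw features are correlated
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 5)) * np.array([5.0, 3.0, 0.1, 0.1, 0.1])
rotation, _ = np.linalg.qr(rng.normal(size=(5, 5)))
X = latent @ rotation

k, ratio = components_for_variance(X, target=0.95)
print(k)       # number of components needed for 95% of the variance
print(ratio)   # share of variance per component, largest first
```

t-SNE has no equivalent of this calculation: its output dimensionality is chosen up front (usually 2 or 3), and the quality of the map is tuned through perplexity and the other hyperparameters instead.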