
How to Make a t-SNE Plot in R

Last Updated : 12 Jan, 2023

tSNE (t-SNE) is an acronym for t-distributed Stochastic Neighbor Embedding. It is a statistical method used mainly to visualize high-dimensional data in two or three dimensions. In R, tSNE plots can be made with the Rtsne package, which computes the embedding, and the ggplot2 package, which draws it.

Syntax: Rtsne(X, dims, theta, pca, verbose, perplexity)

where,

  • X – the data matrix to embed (one row per observation, one column per variable).
  • dims – the number of output dimensions of the embedding (default 2).
  • theta – the speed/accuracy trade-off of the Barnes-Hut approximation (default 0.5; 0 gives exact t-SNE).
  • pca – whether an initial PCA step is performed (TRUE by default).
  • verbose – set this to TRUE to print progress updates.
  • perplexity – roughly the effective number of neighbours considered for each point (default 30). It must not exceed (nrow(X) - 1) / 3.
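For reference, the call below passes every parameter listed above explicitly, using Rtsne's default values. The helper max_perplexity() is a hypothetical name introduced for this sketch and is not part of Rtsne. It encodes the perplexity limit described above, which we assume matches the check Rtsne performs. The set.seed(42) call and the seed value are also our own additions.

```r
library(Rtsne)

# Hypothetical helper (not part of Rtsne): the largest perplexity
# allowed for n observations, assuming the rule 3 * perplexity <= n - 1
max_perplexity <- function(n) floor((n - 1) / 3)

# Numeric columns of iris, with the duplicated row removed
X <- as.matrix(unique(iris)[, 1:4])

set.seed(42)  # t-SNE is stochastic; fixing the seed makes runs repeatable
tsne_out <- Rtsne(X,
                  dims = 2,          # number of output dimensions
                  theta = 0.5,       # 0 = exact t-SNE; larger = faster, less accurate
                  pca = TRUE,        # run PCA on X before t-SNE
                  verbose = FALSE,   # TRUE prints progress updates
                  perplexity = min(30, max_perplexity(nrow(X))))

# One row per observation, one column per output dimension
dim(tsne_out$Y)
```

Capping perplexity with min() keeps the same call valid for smaller datasets. For iris, the default of 30 is already well under the limit.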

The steps to make a tSNE plot in R are:

  1. Install and load the required packages.
  2. Load the built-in iris dataset.
  3. Remove duplicate rows from the dataset.
  4. Compute the tSNE embedding.
  5. Plot the embedding.

Installing Packages

This article requires the Rtsne and ggplot2 packages.

R




# Install all the required packages
install.packages("Rtsne")
install.packages("ggplot2")
 
# Load the required packages
library(Rtsne)
library(ggplot2)
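Running install.packages() reinstalls a package every time the script runs. A common alternative is to install only the packages that are missing, as sketched below. ensure_packages() is a hypothetical helper name and does not come from any package. The CRAN mirror URL is an assumption, added so the script also works in non-interactive sessions where no mirror has been chosen.

```r
# Hypothetical helper: install only the packages that are not yet
# available, then attach all of them. Returns (invisibly) the names it installed.
ensure_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!installed]
  if (length(to_install) > 0) {
    # Assumed mirror; any CRAN mirror works
    install.packages(to_install, repos = "https://cloud.r-project.org")
  }
  for (pkg in pkgs) library(pkg, character.only = TRUE)
  invisible(to_install)
}

ensure_packages(c("Rtsne", "ggplot2"))
```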


Loading Dataset

We will use the built-in iris dataset. It has four numeric measurements and a Species label for each of 150 flowers.

R




# Load the default dataset
data(iris)
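Rtsne() needs numeric input, so it helps to check the structure of the data before going further. This quick inspection uses only base R. It shows which columns are numeric measurements and which column is the Species label.

```r
data(iris)

dim(iris)            # number of rows (flowers) and columns
str(iris)            # four numeric columns plus the Species factor
table(iris$Species)  # number of flowers per species
```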


Removing duplicates from the dataset

Duplicate rows must be removed before calling Rtsne(), because by default it checks the input for duplicates and stops with an error if it finds any. We then keep only the four numeric columns and convert them to a matrix, so that the Species label is not used as an input feature.

R




# Remove duplicate rows from iris
# (otherwise Rtsne() stops with an error)
remove_iris_dup <- unique(iris)
 
# Build a matrix from the first four columns of iris;
# the fifth column (Species) is a factor, not a measurement
iris_matrix <- as.matrix(remove_iris_dup[,1:4])
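The sketch below uses base R's duplicated() to show which rows would trigger the error and how many rows unique() drops. It also keeps the Species labels from the same de-duplicated data frame, so they stay aligned with the rows of the matrix passed to Rtsne(). The variable names dup_rows and species are our own. Rtsne also has a check_duplicates argument (TRUE by default), but its documentation recommends removing duplicates yourself rather than turning the check off.

```r
data(iris)

# Rows that repeat an earlier row
dup_rows <- which(duplicated(iris))
dup_rows

# unique() keeps the first occurrence of each row
remove_iris_dup <- unique(iris)
nrow(iris) - nrow(remove_iris_dup)  # number of rows dropped

# Take the labels from the same de-duplicated data frame
# so they line up with the rows of iris_matrix
iris_matrix <- as.matrix(remove_iris_dup[, 1:4])
species <- remove_iris_dup$Species
stopifnot(nrow(iris_matrix) == length(species))
```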


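Computing the tSNE embedding

The Rtsne() function computes pairwise similarities between the rows of iris_matrix. It then finds a two-dimensional arrangement of points that preserves those similarities as closely as possible. The coordinates are returned in the Y element of the result, a matrix with one row per observation.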
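The "t-distributed" part of the name refers to how similarity is measured in the low-dimensional map. For points y_i and y_j, t-SNE uses a Student-t kernel with one degree of freedom: q_ij = (1 + ||y_i - y_j||^2)^-1 / sum over k != l of (1 + ||y_k - y_l||^2)^-1. The base-R function below computes this matrix for a tiny embedding. tsne_q() is a hypothetical name and not an Rtsne function. With theta > 0, Rtsne approximates these sums (Barnes-Hut), so the code illustrates the formula rather than Rtsne's internals.

```r
# Hypothetical helper: Student-t (1 degree of freedom) similarities q_ij
# that t-SNE uses between points of the low-dimensional map Y
tsne_q <- function(Y) {
  d2 <- as.matrix(dist(Y))^2  # squared Euclidean distances
  num <- 1 / (1 + d2)         # heavy-tailed kernel
  diag(num) <- 0              # a point is not its own neighbour
  num / sum(num)              # normalise over all pairs i != j
}

# Three points in 2-D: points 1 and 2 are close, point 3 is far away
Y <- matrix(c(0, 0,
              1, 0,
              5, 5), ncol = 2, byrow = TRUE)
Q <- tsne_q(Y)
round(Q, 3)
```

Nearby points receive a large share of the total similarity, and distant points receive a small one. Because the kernel's tails are heavy, moderately distant points can sit far apart in the map, which helps clusters separate visibly.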

R




# Compute the tSNE embedding with Rtsne()
# (results vary between runs unless the seed is fixed)
set.seed(42)
tsne_out <- Rtsne(iris_matrix)


Plotting the tSNE plot

Finally, we draw the result with ggplot(). Because ggplot() takes a data frame as input, we first convert the coordinate matrix stored in tsne_out$Y to a data frame. We also add the Species column so the points can be coloured by species.

R




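# Convert the coordinate matrix to a data frame and
# attach the species labels (rows match remove_iris_dup)
tsne_plot <- data.frame(x = tsne_out$Y[,1],
                        y = tsne_out$Y[,2],
                        Species = remove_iris_dup$Species)
 
# Plot the embedding, colouring points by species;
# "+" must end the line so R continues the expression
ggplot(tsne_plot, aes(x = x, y = y, colour = Species)) +
  geom_point()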
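The layout depends noticeably on the perplexity parameter described earlier. The sketch below runs Rtsne() once for each of several perplexity values and shows the runs side by side with facet_wrap(). The perplexity values and the seed are arbitrary example choices. Each value must stay within the (nrow(X) - 1) / 3 limit.

```r
library(Rtsne)
library(ggplot2)

remove_iris_dup <- unique(iris)
iris_matrix <- as.matrix(remove_iris_dup[, 1:4])

# Example perplexity values (arbitrary); each must be <= (nrow - 1) / 3
perplexities <- c(5, 15, 40)

# Run t-SNE once per perplexity and stack the coordinates
runs <- lapply(perplexities, function(p) {
  set.seed(42)  # same seed for every run
  out <- Rtsne(iris_matrix, perplexity = p)
  data.frame(x = out$Y[, 1], y = out$Y[, 2],
             Species = remove_iris_dup$Species,
             perplexity = p)
})
all_runs <- do.call(rbind, runs)

# One panel per perplexity value, points coloured by species
perp_plot <- ggplot(all_runs, aes(x = x, y = y, colour = Species)) +
  geom_point() +
  facet_wrap(~ perplexity, labeller = label_both)
print(perp_plot)
```

Small perplexities emphasise very local structure, and larger ones take more neighbours into account. Comparing a few values like this is usually more informative than relying on a single run.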


Output:

[Figure: tSNE plot of the iris dataset]

 


