How to find the difference between two dataframes in R?

Last Updated: 21 Apr, 2021

This article shows how to compare two data frames in the R programming language and find the difference between them: which columns they have in common, and which columns appear in only one of them.

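Method 1: Using the intersect() function

The intersect() function returns the elements that two vectors have in common. Applied to the column names of two data frames, it gives the columns that both data frames share.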

Syntax:

intersect(names(data_short), names(data_long))

Example:

R




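# Names such as "1" are not syntactic R names, so check.names = FALSE
# is needed to stop data.frame() from renaming them to X1, X2, X3.
first <- data.frame(
  "1" = c('0.44', '0.554', '0.67', '0.64'),
  "2" = c('0.124', '0.22', '0.82', '0.994'),
  "3" = c('0.82', '1.22', '0.73', '1.23'),
  check.names = FALSE
)

second <- data.frame(
  "1" = runif(4),
  "2" = runif(4),
  "3" = runif(4),
  "d" = runif(4),
  "e" = runif(4),
  check.names = FALSE
)

# Keep only the columns of `second` whose names also appear in `first`
second[intersect(names(first), names(second))]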

Output (the values come from runif(), so they change on every run):

            1         2         3
1 0.562627228 0.9391250 0.6437934
2 0.003867576 0.7131200 0.9313777
3 0.129852760 0.2657934 0.9291285
4 0.325867139 0.2367633 0.1211350
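
Because the example above relies on runif(), its output cannot be reproduced exactly. The following is a minimal sketch with fixed values that shows what intersect() returns on its own before it is used for subsetting. The data frames `a` and `b` and their column names are hypothetical and are not part of the original example.

```r
# Hypothetical data frames with fixed values, so the result is reproducible
a <- data.frame(id = 1:3, score = c(0.4, 0.5, 0.6))
b <- data.frame(id = 1:3, score = c(0.1, 0.2, 0.3), grade = c("x", "y", "z"))

# intersect() compares the two name vectors and returns the shared names
common <- intersect(names(a), names(b))
print(common)

# The shared names can then be used to subset either data frame
b_common <- b[common]
print(b_common)
```

Here `common` is a plain character vector, so it can also be used to subset `a`, which makes it easy to line up the shared columns of both data frames side by side.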

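Method 2: Using setdiff()

Unlike intersect(), setdiff() returns the elements of its first argument that are not present in its second. Applied to column names, it shows the columns of the second data frame that are missing from the first.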

Syntax:

setdiff(names(dataframe2), names(dataframe1))

Example:

R




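# Names such as "1" are not syntactic R names, so check.names = FALSE
# is needed to stop data.frame() from renaming them to X1, X2, X3.
first <- data.frame(
  "1" = c('0.44', '0.554', '0.67', '0.64'),
  "2" = c('0.124', '0.22', '0.82', '0.994'),
  "3" = c('0.82', '1.22', '0.73', '1.23'),
  check.names = FALSE
)

second <- data.frame(
  "1" = runif(4),
  "2" = runif(4),
  "3" = runif(4),
  "d" = runif(4),
  "e" = runif(4),
  check.names = FALSE
)

# Keep only the columns of `second` whose names do not appear in `first`
second[setdiff(names(second), names(first))]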

Output:

          d          e
1 0.7899783 0.04363003
2 0.9167861 0.39865991
3 0.3314494 0.13963663
4 0.7005957 0.73401069
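
The order of the arguments matters, because setdiff() is not symmetric. The sketch below illustrates this with two hypothetical data frames, `a` and `b`, that each have one column the other lacks. The names and values are assumptions made for illustration only.

```r
# Hypothetical data frames: `note` exists only in a, `grade` only in b
a <- data.frame(id = 1:3, score = c(0.4, 0.5, 0.6), note = c("p", "q", "r"))
b <- data.frame(id = 1:3, score = c(0.1, 0.2, 0.3), grade = c("x", "y", "z"))

# Names in b that are missing from a
only_in_b <- setdiff(names(b), names(a))
print(only_in_b)

# Names in a that are missing from b (arguments swapped)
only_in_a <- setdiff(names(a), names(b))
print(only_in_a)
```

Running both directions gives the full picture of how the column sets differ, which a single call cannot show.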

Method 3: Using colnames() and dplyr

Here, select() from the dplyr package picks the columns of the second data frame whose names, as returned by colnames(), do not appear among the column names of the first data frame.

Example:

R




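library("dplyr")

# Names such as "1" are not syntactic R names, so check.names = FALSE
# is needed to stop data.frame() from renaming them to X1, X2, X3.
first <- data.frame(
  "1" = c('0.44', '0.554', '0.67', '0.64'),
  "2" = c('0.124', '0.22', '0.82', '0.994'),
  "3" = c('0.82', '1.22', '0.73', '1.23'),
  check.names = FALSE
)

second <- data.frame(
  "1" = runif(4),
  "2" = runif(4),
  "3" = runif(4),
  "d" = runif(4),
  "e" = runif(4),
  check.names = FALSE
)

# which() gives the positions of the columns of `second` that are not in
# `first`; select() then keeps the columns at those positions
second %>% select(which(!(colnames(second) %in% colnames(first))))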

Output:

          d          e
1 0.7899783 0.04363003
2 0.9167861 0.39865991
3 0.3314494 0.13963663
4 0.7005957 0.73401069
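
The same selection can also be written with dplyr's tidyselect helpers instead of which() and %in%. This is an alternative sketch, not the method used above; it assumes dplyr 1.0.0 or later (where any_of() and all_of() are available) and uses hypothetical data frames `a` and `b` with fixed values.

```r
library(dplyr)

# Hypothetical data frames with fixed values
a <- data.frame(id = 1:3, score = c(0.4, 0.5, 0.6))
b <- data.frame(id = 1:3, score = c(0.1, 0.2, 0.3), grade = c("x", "y", "z"))

# Drop from b every column whose name also occurs in a
extra_cols <- b %>% select(-any_of(names(a)))
print(extra_cols)

# Equivalent: compute the differing names first, then select them
extra_cols2 <- b %>% select(all_of(setdiff(names(b), names(a))))
print(extra_cols2)
```

any_of() ignores names that are absent from `b`, so the first form does not fail when `a` has columns that `b` lacks.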

