How to find the difference between two dataframes in R?

  • Last Updated : 21 Apr, 2021

In this article, we will discuss how to find the difference between two data frames in the R programming language by comparing their columns: which columns the two data frames share, and which appear in only one of them.

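Method 1: Using the intersect() function

The intersect() function returns the elements that two vectors have in common. Applied to the column names of two data frames, it returns the names of the columns present in both.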

Syntax:

intersect(names(data_short), names(data_long))

Example:



R




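# 'first' has three columns holding numbers stored as character strings
first <- data.frame(
  "1" = c('0.44', '0.554', '0.67', '0.64'),
  "2" = c('0.124', '0.22', '0.82', '0.994'),
  "3" = c('0.82', '1.22', '0.73', '1.23')
)

# 'second' has the same three columns plus "d" and "e", filled with random numbers
second <- data.frame(
  "1" = runif(4),
  "2" = runif(4),
  "3" = runif(4),
  "d" = runif(4),
  "e" = runif(4)
)

# Keep only the columns of 'second' whose names also appear in 'first'.
# With the default check.names = TRUE, data.frame() renames "1", "2", "3" to X1, X2, X3.
second[intersect(names(first), names(second))]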

Output (the values change on every run because runif() draws random numbers):

           X1        X2        X3
1 0.562627228 0.9391250 0.6437934
2 0.003867576 0.7131200 0.9313777
3 0.129852760 0.2657934 0.9291285
4 0.325867139 0.2367633 0.1211350
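To see only which column names are shared, without subsetting the data, intersect() can be called on the names directly. The following is a minimal sketch that reuses the example data; the set.seed(42) call and the variable names common and shared_cols are assumptions added here for illustration and are not part of the original example.

```r
set.seed(42)  # assumption: fixed seed so the random columns are reproducible

first <- data.frame(
  "1" = c("0.44", "0.554", "0.67", "0.64"),
  "2" = c("0.124", "0.22", "0.82", "0.994"),
  "3" = c("0.82", "1.22", "0.73", "1.23")
)
second <- data.frame(
  "1" = runif(4), "2" = runif(4), "3" = runif(4),
  "d" = runif(4), "e" = runif(4)
)

# Column names present in both data frames.
# data.frame() has already converted "1", "2", "3" to X1, X2, X3.
common <- intersect(names(first), names(second))
print(common)
# [1] "X1" "X2" "X3"

# Subset 'second' to the shared columns: 4 rows, 3 columns
shared_cols <- second[common]
print(dim(shared_cols))
# [1] 4 3
```

Storing the shared names in a variable first makes it easy to inspect them before using them to subset either data frame.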

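Method 2: Using setdiff()

Unlike intersect(), setdiff() returns the elements of its first argument that are not in its second. Applied to column names, it shows the columns of the second data frame that are missing from the first.

Syntax:

setdiff(names(dataframe2), names(dataframe1))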

Example:

R




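# 'first' has three columns holding numbers stored as character strings
first <- data.frame(
  "1" = c('0.44', '0.554', '0.67', '0.64'),
  "2" = c('0.124', '0.22', '0.82', '0.994'),
  "3" = c('0.82', '1.22', '0.73', '1.23')
)

# 'second' has the same three columns plus "d" and "e", filled with random numbers
second <- data.frame(
  "1" = runif(4),
  "2" = runif(4),
  "3" = runif(4),
  "d" = runif(4),
  "e" = runif(4)
)

# Keep only the columns of 'second' whose names do not appear in 'first'
second[setdiff(names(second), names(first))]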

Output:

          d          e
1 0.7899783 0.04363003
2 0.9167861 0.39865991
3 0.3314494 0.13963663
4 0.7005957 0.73401069
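Note that setdiff() is not symmetric: the order of its arguments decides which side's extra columns are reported. The sketch below illustrates this with plain character vectors standing in for the column names of the two data frames; the variable names are hypothetical and chosen only for this illustration.

```r
# Column names as they appear after data.frame() has cleaned them up
first_names  <- c("X1", "X2", "X3")
second_names <- c("X1", "X2", "X3", "d", "e")

# Columns that exist in 'second' but not in 'first'
only_in_second <- setdiff(second_names, first_names)
print(only_in_second)
# [1] "d" "e"

# Swapping the arguments asks the opposite question;
# here 'first' has no columns that 'second' lacks
only_in_first <- setdiff(first_names, second_names)
print(only_in_first)
# character(0)

# Names that appear in exactly one of the two (symmetric difference)
in_one_only <- union(only_in_second, only_in_first)
print(in_one_only)
# [1] "d" "e"
```

Running setdiff() in both directions is a quick way to confirm whether either data frame has columns the other lacks.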



Method 3: Using colnames() and dplyr

We will use select() from the dplyr package together with colnames() to keep only those columns of the second data frame whose names do not appear in the first.

Example:

R




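library("dplyr")

# 'first' has three columns holding numbers stored as character strings
first <- data.frame(
  "1" = c('0.44', '0.554', '0.67', '0.64'),
  "2" = c('0.124', '0.22', '0.82', '0.994'),
  "3" = c('0.82', '1.22', '0.73', '1.23')
)

# 'second' has the same three columns plus "d" and "e", filled with random numbers
second <- data.frame(
  "1" = runif(4),
  "2" = runif(4),
  "3" = runif(4),
  "d" = runif(4),
  "e" = runif(4)
)

# which() gives the positions of the columns of 'second' that are absent from 'first';
# select() then keeps the columns at those positions
second %>% select(which(!(colnames(second) %in% colnames(first))))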

Output:

          d          e
1 0.7899783 0.04363003
2 0.9167861 0.39865991
3 0.3314494 0.13963663
4 0.7005957 0.73401069
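The same %in% test can also be used without dplyr. As a rough sketch, assuming small stand-in data frames (the values and the names keep and extra_cols below are made up for illustration), base R logical indexing gives the same set of columns:

```r
# Small stand-in data frames; the values are illustrative only
first <- data.frame(X1 = c(0.44, 0.554), X2 = c(0.124, 0.22))
second <- data.frame(
  X1 = c(0.1, 0.2),
  X2 = c(0.3, 0.4),
  d  = c(0.5, 0.6),
  e  = c(0.7, 0.8)
)

# TRUE for each column of 'second' that is absent from 'first'
keep <- !(colnames(second) %in% colnames(first))

# drop = FALSE keeps the result a data frame even if only one column remains
extra_cols <- second[, keep, drop = FALSE]
print(names(extra_cols))
# [1] "d" "e"
```

This avoids loading an extra package when the comparison is the only dplyr feature needed.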



