How to find the difference between two dataframes in R?

In this article, we will discuss how to compare two data frames in the R programming language by finding the columns they have in common and the columns that one data frame has but the other lacks.

Method 1: Using intersect()

The intersect() function in R returns the elements that two vectors have in common. Applied to the column names of two data frames, it gives the columns that both data frames share.



Syntax:

intersect(names(data_short), names(data_long))
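As a small illustration, the sketch below uses two made-up vectors of column names (short_names and long_names are hypothetical stand-ins for names(data_short) and names(data_long)). It shows that intersect() keeps only the shared names and returns them in the order of its first argument, which is why second[intersect(names(first), names(second))] lists the columns in the order they have in first.

```r
# Hypothetical column-name vectors, standing in for names(data_short)
# and names(data_long)
short_names <- c("id", "name", "age")
long_names  <- c("age", "id", "score", "name", "city")

# intersect() keeps the elements of the first vector that also occur
# in the second, in the order they appear in the first vector
common <- intersect(short_names, long_names)   # c("id", "name", "age")

# Swapping the arguments keeps the same names but follows long_names' order
common_rev <- intersect(long_names, short_names)  # c("age", "id", "name")

print(common)
print(common_rev)
```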



Example:




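first <- data.frame(
  "1" = c(0.44, 0.554, 0.67, 0.64),
  "2" = c(0.124, 0.22, 0.82, 0.994),
  "3" = c(0.82, 1.22, 0.73, 1.23),
  # keep "1", "2", "3" as column names (the default would rename them X1, X2, X3)
  check.names = FALSE
)

second <- data.frame(
  "1" = runif(4),
  "2" = runif(4),
  "3" = runif(4),
  "d" = runif(4),
  "e" = runif(4),
  check.names = FALSE
)

# Keep only the columns of second that also exist in first
second[intersect(names(first), names(second))]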

Output (your values will differ, because second is filled with random numbers from runif()):

            1         2         3
1 0.562627228 0.9391250 0.6437934
2 0.003867576 0.7131200 0.9313777
3 0.129852760 0.2657934 0.9291285
4 0.325867139 0.2367633 0.1211350

Method 2: Using setdiff()

Unlike intersect(), setdiff(x, y) returns the elements of x that are not in y. Applied to column names, it lists the columns of the second data frame that are missing from the first.

Syntax:

setdiff(names(dataframe2), names(dataframe1))
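Because setdiff() is not symmetric, swapping its arguments answers a different question. The sketch below uses hypothetical name vectors (names1 and names2 are illustrative, not taken from the example) to show both directions, and then combines them with union() to get the columns that appear in only one of the two data frames.

```r
# Hypothetical column-name vectors, standing in for names(dataframe1)
# and names(dataframe2)
names1 <- c("1", "2", "3", "x")
names2 <- c("1", "2", "3", "d", "e")

# setdiff(x, y) keeps the elements of x that are NOT in y,
# so the argument order decides which side of the difference you get
only_in_2 <- setdiff(names2, names1)   # c("d", "e")
only_in_1 <- setdiff(names1, names2)   # "x"

# Columns present in exactly one of the two data frames
either_side <- union(only_in_2, only_in_1)  # c("d", "e", "x")

print(either_side)
```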

Example:




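first <- data.frame(
  "1" = c(0.44, 0.554, 0.67, 0.64),
  "2" = c(0.124, 0.22, 0.82, 0.994),
  "3" = c(0.82, 1.22, 0.73, 1.23),
  # keep "1", "2", "3" as column names (the default would rename them X1, X2, X3)
  check.names = FALSE
)

second <- data.frame(
  "1" = runif(4),
  "2" = runif(4),
  "3" = runif(4),
  "d" = runif(4),
  "e" = runif(4),
  check.names = FALSE
)

# Keep only the columns of second that are missing from first
second[setdiff(names(second), names(first))]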

Output:

          d          e
1 0.7899783 0.04363003
2 0.9167861 0.39865991
3 0.3314494 0.13963663
4 0.7005957 0.73401069

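Method 3: Using colnames() and dplyr

Here, select() from dplyr picks the columns of second whose names, obtained with colnames(), do not appear among the column names of first. The %in% operator marks each column of second that also exists in first, ! negates that result, and which() turns it into the column positions passed to select().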
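The selection expression in the example can be read in three steps. The sketch below replays them on plain character vectors that mirror the column names used in this article, so it runs without dplyr; first_cols and second_cols are illustrative stand-ins for colnames(first) and colnames(second).

```r
# Illustrative stand-ins for colnames(first) and colnames(second)
first_cols  <- c("1", "2", "3")
second_cols <- c("1", "2", "3", "d", "e")

# Step 1: %in% gives one TRUE/FALSE per column of second
in_first <- second_cols %in% first_cols      # TRUE TRUE TRUE FALSE FALSE

# Step 2: ! flips it, so TRUE now marks columns missing from first
missing_from_first <- !in_first              # FALSE FALSE FALSE TRUE TRUE

# Step 3: which() converts the logical vector into column positions,
# the form that select() receives in the example
positions <- which(missing_from_first)       # 4 5

print(positions)
```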

Example:




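library("dplyr")

first <- data.frame(
  "1" = c(0.44, 0.554, 0.67, 0.64),
  "2" = c(0.124, 0.22, 0.82, 0.994),
  "3" = c(0.82, 1.22, 0.73, 1.23),
  # keep "1", "2", "3" as column names (the default would rename them X1, X2, X3)
  check.names = FALSE
)

second <- data.frame(
  "1" = runif(4),
  "2" = runif(4),
  "3" = runif(4),
  "d" = runif(4),
  "e" = runif(4),
  check.names = FALSE
)

# Select the columns of second whose names do not appear in first
second %>% select(which(!(colnames(second) %in% colnames(first))))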

Output:

          d          e
1 0.7899783 0.04363003
2 0.9167861 0.39865991
3 0.3314494 0.13963663
4 0.7005957 0.73401069
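All three methods come down to comparing column names. If you want every answer at once, a small wrapper like the one below may help. compare_columns() is a hypothetical helper written here for illustration, not a function from base R or dplyr, and the data frames a and b are made-up sample data.

```r
# compare_columns() is a hypothetical helper, not part of base R or dplyr.
# It bundles intersect() and setdiff() on column names into one report.
compare_columns <- function(df1, df2) {
  n1 <- names(df1)
  n2 <- names(df2)
  list(
    common         = intersect(n1, n2),  # columns present in both
    only_in_first  = setdiff(n1, n2),    # columns of df1 missing from df2
    only_in_second = setdiff(n2, n1)     # columns of df2 missing from df1
  )
}

# Usage with small made-up data frames
a <- data.frame(id = 1:3, name = c("p", "q", "r"), age = c(30, 41, 25))
b <- data.frame(id = 1:3, age = c(30, 41, 25), city = c("x", "y", "z"))

report <- compare_columns(a, b)
str(report)
```

Returning a named list keeps the three answers together, so a caller can check, for example, length(report$only_in_second) == 0 before combining the two data frames.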

