How to find the difference between two dataframes in R?

In this article, we will discuss how to compare two data frames in the R programming language by finding the columns they share and the columns that appear in only one of them.

Method 1: Using intersect()

The intersect() function returns the elements common to two vectors. Applied to the column names of two data frames, it returns the names of the columns present in both.

Syntax:

intersect(names(data_short), names(data_long))

Example:

R




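# Data frame with three columns of character values.
# data.frame() turns the names "1", "2", "3" into X1, X2, X3.
first <- data.frame(
  "1" = c('0.44', '0.554', '0.67', '0.64'),
  "2" = c('0.124', '0.22', '0.82', '0.994'),
  "3" = c('0.82', '1.22', '0.73', '1.23')
)

# Data frame with the same three columns plus two extra ones.
second <- data.frame(
  "1" = runif(4),
  "2" = runif(4),
  "3" = runif(4),
  "d" = runif(4),
  "e" = runif(4)
)

# Keep only the columns of `second` whose names also occur in `first`.
second[intersect(names(first), names(second))]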


Output (values vary between runs because runif() draws random numbers):

           X1        X2        X3
1 0.562627228 0.9391250 0.6437934
2 0.003867576 0.7131200 0.9313777
3 0.129852760 0.2657934 0.9291285
4 0.325867139 0.2367633 0.1211350
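
To see what intersect() returns on its own, here is a minimal sketch. The data frames data_short and data_long and their columns are hypothetical, chosen only to mirror the names used in the syntax line above.

```r
# Hypothetical data frames, used only for illustration.
data_short <- data.frame(id = 1:3, score = c(0.4, 0.5, 0.6))
data_long  <- data.frame(id = 1:3, score = c(0.1, 0.2, 0.3),
                         grade = c("a", "b", "c"))

# intersect() on the column names returns a character vector
# holding the names that occur in both data frames.
common <- intersect(names(data_short), names(data_long))
print(common)

# Indexing with that vector keeps only the shared columns.
shared_cols <- data_long[common]
print(shared_cols)
```

Because intersect() returns a plain character vector, the same result can be used to index either data frame.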

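Method 2: Using setdiff()

Unlike intersect(), setdiff() returns the elements of its first argument that are absent from its second. Applied to column names, it lists the columns of the second data frame that are missing from the first.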

Syntax:

setdiff(names(dataframe2), names(dataframe1))

Example:

R




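# Data frame with three columns of character values.
# data.frame() turns the names "1", "2", "3" into X1, X2, X3.
first <- data.frame(
  "1" = c('0.44', '0.554', '0.67', '0.64'),
  "2" = c('0.124', '0.22', '0.82', '0.994'),
  "3" = c('0.82', '1.22', '0.73', '1.23')
)

# Data frame with the same three columns plus two extra ones.
second <- data.frame(
  "1" = runif(4),
  "2" = runif(4),
  "3" = runif(4),
  "d" = runif(4),
  "e" = runif(4)
)

# Keep only the columns of `second` whose names do not occur in `first`.
second[setdiff(names(second), names(first))]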


Output:

          d          e
1 0.7899783 0.04363003
2 0.9167861 0.39865991
3 0.3314494 0.13963663
4 0.7005957 0.73401069
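
The order of the arguments to setdiff() matters, which is easy to overlook. The following sketch uses two hypothetical data frames, a and b, to show that swapping the arguments changes the result.

```r
# Hypothetical data frames, used only for illustration.
a <- data.frame(x = 1:2, y = 3:4)
b <- data.frame(x = 1:2, y = 3:4, z = 5:6)

# Names that are in b but not in a.
only_in_b <- setdiff(names(b), names(a))
print(only_in_b)

# Names that are in a but not in b; there are none here,
# so the result is an empty character vector.
only_in_a <- setdiff(names(a), names(b))
print(only_in_a)
```

Running setdiff() in both directions gives a full picture of which columns are unique to each data frame.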

Method 3: Using colnames() and dplyr

Here we use select() from dplyr together with colnames() and the %in% operator to keep only the columns of the second data frame whose names do not appear in the first.

Example:

R




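library("dplyr")

# Data frame with three columns of character values.
# data.frame() turns the names "1", "2", "3" into X1, X2, X3.
first <- data.frame(
  "1" = c('0.44', '0.554', '0.67', '0.64'),
  "2" = c('0.124', '0.22', '0.82', '0.994'),
  "3" = c('0.82', '1.22', '0.73', '1.23')
)

# Data frame with the same three columns plus two extra ones.
second <- data.frame(
  "1" = runif(4),
  "2" = runif(4),
  "3" = runif(4),
  "d" = runif(4),
  "e" = runif(4)
)

# Select the positions of the columns of `second` that are not in `first`.
second %>% select(which(!(colnames(second) %in% colnames(first))))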


Output:

          d          e
1 0.7899783 0.04363003
2 0.9167861 0.39865991
3 0.3314494 0.13963663
4 0.7005957 0.73401069
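
The logical mask built with %in% does not depend on dplyr. As a minimal sketch, assuming only base R is available, the same selection can be written with bracket indexing. The data frames first_df and second_df are hypothetical and exist only for this illustration.

```r
# Hypothetical data frames, used only for illustration.
first_df  <- data.frame(a = 1:2, b = 3:4)
second_df <- data.frame(a = 5:6, b = 7:8, d = 9:10, e = 11:12)

# TRUE for every column of second_df whose name is absent from first_df.
keep <- !(colnames(second_df) %in% colnames(first_df))
print(keep)

# drop = FALSE keeps the result a data frame even if one column remains.
extra <- second_df[, keep, drop = FALSE]
print(extra)
```

This avoids loading a package for a one-off comparison, while the dplyr version fits more naturally into an existing pipeline.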



Last Updated : 21 Apr, 2021