Calculate difference between dataframe rows by group in R

  • Last Updated : 29 Jun, 2021

In this article, we will see how to calculate the difference between consecutive rows within each group of a data frame in the R programming language.

Method 1: Using dplyr package

The group_by() function splits the data into groups based on the values in one or more columns. The columns to group by are passed as arguments; more than one column name may be given.

Syntax:

group_by(col1, col2, …)

This is followed by mutate(), which adds new columns or modifies existing ones while keeping every row. The new column is given as a name = expression pair. The difference from the previous row can be calculated using the lag() function of this package, which returns the previous values in a vector.



Syntax:

lag(x, n = 1L, default = NA)

Parameters:

  • x – A vector of values
  • n – Number of positions to lag by
  • default – The value used for non-existent rows (NA by default)
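
To make these parameters concrete, here is a small sketch of lag() on a plain vector. The vector x and its values are made up for illustration, and dplyr::lag() is called with its namespace so that it is not confused with stats::lag().

```r
library(dplyr)

# illustrative vector (hypothetical values, not from the article's data)
x <- c(1, 4, 5, 9)

lag_one  <- dplyr::lag(x)               # NA 1 4 5
lag_two  <- dplyr::lag(x, n = 2)        # NA NA 1 4
lag_zero <- dplyr::lag(x, default = 0)  # 0 1 4 5

# subtracting the lagged vector gives the row-to-row difference
print(x - lag_one)                      # NA 3 1 4
```

The first element has no predecessor, so it takes whatever default supplies.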

The new column is computed by subtracting the lagged value of a column from the current row's value. In the example below, default is set to first(col3), the first value of each group, so the first row of every group gets a difference of 0 (or NA when that first value is itself NA).

Example:

R




# loading the required library
library("dplyr")
  
# creating a data frame
data_frame <- data.frame(col1 = sample(6:9, 9 , replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1,4,5,1,NA,NA,2,NA,2))
  
print ("Original DataFrame")
print (data_frame)
  
print ("Modified DataFrame")
  
# computing the difference within each group
data_frame %>%
  group_by(col1) %>%
  mutate(diff = col3 - lag(col3, default = first(col3)))

Output

[1] "Original DataFrame" 
  col1 col2 col3 
1    6    a    1 
2    9    b    4 
3    7    c    5 
4    6    a    1 
5    6    b   NA 
6    9    c   NA 
7    6    a    2 
8    8    b   NA 
9    7    c    2 
[1] "Modified DataFrame" 
# A tibble: 9 x 4
# Groups:   col1 [4]
   col1 col2   col3  diff   
  <int> <chr> <dbl> <dbl> 
1     6 a         1     0 
2     9 b         4     0 
3     7 c         5     0 
4     6 a         1     0 
5     6 b        NA    NA 
6     9 c        NA    NA 
7     6 a         2    NA 
8     8 b        NA    NA
9     7 c         2    -3
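
Because col1 is drawn with sample(), the output above changes from run to run. The following is a reproducible sketch with fixed groups; the data frame df and the column names grp, val, diff_na and diff_first are hypothetical and chosen only for illustration. It contrasts the default NA fill with the first() fill used above.

```r
library(dplyr)

# fixed, made-up data so the result is the same on every run
df <- data.frame(grp = c("a", "a", "b", "b", "a"),
                 val = c(1, 4, 5, 2, 9))

res <- df %>%
  group_by(grp) %>%
  mutate(
    # first row of each group has no predecessor, so it is NA
    diff_na = val - lag(val),
    # first row of each group is compared with itself, so it is 0
    diff_first = val - lag(val, default = first(val))
  ) %>%
  ungroup()

print(res)
# diff_na:    NA  3 NA -3  5
# diff_first:  0  3  0 -3  5
```

Which fill to prefer depends on whether a first-row difference of 0 or a missing value is more meaningful for the analysis.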

Method 2: Using data.table package

The indexing syntax of data.table can also be used to calculate the difference between rows by group. The by argument specifies the column to group the data by. The := operator adds the new column by reference, so all rows are retained and the table is modified in place. Within each group, the difference is the current value of the specified column minus the previous value, which is obtained with shift(). The shift() function lags (or leads) vectors and lists.



Syntax:

data_frame[ , new-col-name := reqd-col - shift(reqd-col), by = grouping-col]

Because shift() fills the missing position with NA by default, the first row of each group gets NA in the new column.
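
As a minimal sketch of how shift() behaves, the snippet below applies it to a plain vector and then to a small grouped table. The names x, dt, grp and val are hypothetical and the values are made up so that the result is reproducible.

```r
library(data.table)

# illustrative vector (hypothetical values)
x <- c(1, 4, 5, 9)

lagged   <- shift(x)                 # NA 1 4 5
lagged_0 <- shift(x, fill = 0)       # 0 1 4 5
led      <- shift(x, type = "lead")  # 4 5 9 NA

# fixed, made-up grouped data
dt <- data.table(grp = c("a", "a", "b", "b", "a"),
                 val = c(1, 4, 5, 2, 9))

# add the per-group difference by reference; row order is preserved
dt[, diff := val - shift(val), by = grp]
print(dt)
# diff: NA 3 NA -3 5
```

Passing fill = 0 would not give a first-row difference of 0; it would subtract 0 from the first value of each group.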

Example:

R




# loading the required library
library("data.table")
  
# creating a data frame
data_frame <- data.table(col1 = sample(6:9, 9 , replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1,4,5,1,9,11,2,7,2))
  
print ("Original DataFrame")
print (data_frame)
  
# computing the difference within each group
data_frame[ , diff := col3 - shift(col3), by = col1]
print ("Modified DataFrame")
print (data_frame)

Output

[1] "Original DataFrame" 
   col1 col2 col3 
1:    8    a    1 
2:    8    b    4 
3:    7    c    5 
4:    6    a    1 
5:    6    b    9 
6:    8    c   11 
7:    8    a    2 
8:    9    b    7 
9:    7    c    2 
[1] "Modified DataFrame" 
   col1 col2 col3 diff 
1:    8    a    1   NA 
2:    8    b    4    3 
3:    7    c    5   NA 
4:    6    a    1   NA 
5:    6    b    9    8 
6:    8    c   11    7 
7:    8    a    2   -9 
8:    9    b    7   NA 
9:    7    c    2   -3

Method 3: Using ave() function

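The ave() function in base R applies a function to the subsets of a vector defined by the level combinations of one or more grouping factors, and returns a result of the same length as the input. By default it computes group averages.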

Syntax:

ave(x, ..., FUN = mean)

Parameters:

  • x – A numeric vector, here the required data frame column
  • ... – The grouping variables, each of the same length as x
  • FUN – The function to apply for each factor level combination

Here the function computes, within each group, the difference between each value and the previous one using diff(). Since diff() returns one element fewer than its input, NA is prepended so that the result has the same length as the group and the first row of each group gets NA in the new column.
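
The padding step can be seen in isolation in the sketch below, which uses only base R. The data frame df and the column names grp and val are hypothetical, with fixed values so that the result is reproducible.

```r
# diff() returns one element fewer than its input
d <- diff(c(1, 4, 9))   # 3 5
padded <- c(NA, d)      # NA 3 5, same length as the input

# fixed, made-up grouped data
df <- data.frame(grp = c("a", "a", "b", "b", "a"),
                 val = c(1, 4, 5, 2, 9))

# apply the padded diff within each group; ave() keeps the original row order
df$diff <- ave(df$val, df$grp, FUN = function(x) c(NA, diff(x)))
print(df)
# diff: NA 3 NA -3 5
```

Additional grouping vectors can be passed before FUN, which must always be given by name.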

Example:

R




# creating a data frame
data_frame <- data.frame(col1 = sample(6:9, 9 , replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1,4,5,1,9,11,2,7,2))
  
print ("Original DataFrame")
print (data_frame)
  
# computing the difference within each group
data_frame$diff <- ave(data_frame$col3, factor(data_frame$col1), 
                       FUN=function(x) c(NA,diff(x)))
                         
print ("Modified DataFrame")
print (data_frame)

Output

[1] "Original DataFrame" 
  col1 col2 col3 
1    9    a    1 
2    9    b    4 
3    6    c    5 
4    7    a    1 
5    6    b    9 
6    7    c   11
7    9    a    2 
8    9    b    7 
9    9    c    2
[1] "Modified DataFrame" 
  col1 col2 col3 diff 
1    9    a    1   NA 
2    9    b    4    3 
3    6    c    5   NA 
4    7    a    1   NA 
5    6    b    9    4 
6    7    c   11   10 
7    9    a    2   -2 
8    9    b    7    5 
9    9    c    2   -5


