How to find the difference in value between every two consecutive rows of an R DataFrame?

  • Last Updated : 30 May, 2021

In this article, we will discuss how to find the difference in value between every two consecutive rows of a DataFrame in the R programming language.

Method 1 : Using diff() method

The diff() method in base R computes the difference between each pair of consecutive elements of a vector, such as a column of an R dataframe. With the default lag of 1, it returns a vector whose length is the length of the input column minus 1. Each output element is the element at the nth index minus the element at the (n-1)th index. No value is returned for the first element, since it has no preceding element to subtract. This method is applicable only to columns whose values can be subtracted, such as integer or numeric columns.

Syntax:

diff(vec, lag = 1)

Parameters:

vec – Vector to compute the differences of

lag – (Default: 1) How many positions back the subtracted value lies; with lag = n, each element is compared with the nth previous value

Example:

R




# creating a dataframe
data_frame <- data.frame(col1 = rep(c(1:3), each = 3),
                         col2 = letters[1:3],
                         col3 = c(1,4,1,2,3,2,1,2,2))
  
print ("Original DataFrame")
print (data_frame)
  
print ("Difference in col3 successive values")
diff(data_frame$col3)

Output

[1] "Original DataFrame"
  col1 col2 col3
1    1    a    1
2    1    b    4
3    1    c    1
4    2    a    2
5    2    b    3
6    2    c    2
7    3    a    1
8    3    b    2
9    3    c    2
[1] "Difference in col3 successive values"
[1]  3 -3  1  1 -1 -1  1  0
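
The effect of the lag parameter described above can be illustrated with a short sketch. It reuses the col3 values from the example; the variable names d1 and d2 are introduced here for illustration only.

```r
# col3 values from the example above
col3 <- c(1, 4, 1, 2, 3, 2, 1, 2, 2)

# lag = 1 (default): each element minus the one directly before it
d1 <- diff(col3)

# lag = 2: each element minus the one two positions before it
d2 <- diff(col3, lag = 2)

print(d1)
print(d2)

# the result is shorter than the input by the value of lag
print(length(col3) - length(d2))
```

With lag = 2 the result has two fewer elements than the input, because the first two elements have no value two positions back to compare with.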

Method 2 : Using dplyr package

The “dplyr” package in R programming language can be used to carry out data modifications or enhancements. It provides a large variety of functions to perform data manipulation and extraction operations. 

The mutate() method is used for the creation, deletion, and updating of the columns of the dataframe. It takes name-value pairs as arguments, where the name is the new column name and the value is the expression used to compute that column. 

Syntax:



mutate(new_col_name = col_name - lag(col_name))

The lag() method of the dplyr package is used to return the previous value of the specified column. It returns NA if there is no preceding row for that column. Any column name can be assigned to the difference column. Unlike the other methods, this one returns the original dataframe with the difference column appended, rather than the differences alone.

Example:

R




library("dplyr")
  
# creating a dataframe
data_frame <- data.frame(col1 = rep(c(1:3), each = 3),
                         col2 = letters[1:3],
                         col3 = c(1,4,1,2,2,2,1,2,2))
  
print ("Original DataFrame")
print (data_frame)
  
print ("Modified DataFrame")
# difference in rows of col3
data_frame %>%  mutate(col3_diff = col3 - lag(col3))

Output

[1] "Original DataFrame" 
  col1 col2 col3 
1    1    a    1 
2    1    b    4 
3    1    c    1 
4    2    a    2 
5    2    b    2 
6    2    c    2 
7    3    a    1 
8    3    b    2 
9    3    c    2 
[1] "Modified DataFrame" 
  col1 col2 col3 col3_diff 
1    1    a    1        NA 
2    1    b    4         3 
3    1    c    1        -3 
4    2    a    2         1 
5    2    b    2         0 
6    2    c    2         0 
7    3    a    1        -1 
8    3    b    2         1 
9    3    c    2         0
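
As a small sketch of how lag() behaves on its own, the snippet below applies it to a plain vector. It assumes the dplyr package is installed; the n and default arguments are part of dplyr's lag(), while the variable names (x, prev1, prev2, prev_fill) are introduced here for illustration.

```r
library(dplyr)

# first five col3 values from the example above
x <- c(1, 4, 1, 2, 2)

# previous value; the first position has no predecessor, so it is NA
prev1 <- dplyr::lag(x)

# value two positions back; the first two positions are NA
prev2 <- dplyr::lag(x, n = 2)

# fill the leading gap with 0 instead of NA
prev_fill <- dplyr::lag(x, default = 0)

print(x - prev1)
print(x - prev2)
print(x - prev_fill)
```

Calling dplyr::lag() with the package prefix avoids confusion with the lag() function from the stats package, which works on time series rather than shifting a vector.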

Method 3 : Using nrow() method

This approach computes the difference between every pair of consecutive rows for all the columns of the dataframe at once. nrow() gives the number of rows; the dataframe without its last row is then subtracted from the dataframe without its first row, so each output value is the row at the nth index minus the row at the (n-1)th index. Non-numeric columns cannot be subtracted meaningfully: a factor column yields missing values (NA) along with a warning, while in R 4.0 and later, where data.frame() keeps strings as character by default, a character column raises an error instead. 

The output dataframe has no entry for the first row, so its row names begin at 2. 

Example:

R




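# creating a dataframe
# stringsAsFactors = TRUE stores col2 as a factor (the default before R 4.0),
# so subtracting it gives NA with a warning instead of an error
data_frame <- data.frame(col1 = rep(c(1:3), each = 3),
                         col2 = letters[1:3],
                         col3 = c(1,4,1,2,2,2,1,2,2),
                         stringsAsFactors = TRUE)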
  
print ("Original DataFrame")
print (data_frame)
  
# calculating rows of dataframe
rows <- nrow(data_frame)
  
# difference in rows of entire dataframe
diff_frame <- data_frame[-1,] - data_frame[-rows,]
  
print ("Modified DataFrame")
print(diff_frame)

Output

[1] "Original DataFrame"
  col1 col2 col3
1    1    a    1
2    1    b    4
3    1    c    1
4    2    a    2
5    2    b    2
6    2    c    2
7    3    a    1
8    3    b    2
9    3    c    2
[1] "Modified DataFrame"
  col1 col2 col3
2    0   NA    3
3    0   NA   -3
4    1   NA    1
5    0   NA    0
6    0   NA    0
7    1   NA   -1
8    0   NA    1
9    0   NA    0
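
If the non-numeric column is not needed in the result, one option is to restrict the subtraction to the numeric columns. The following is a minimal sketch of that idea rather than part of the method above; num_cols is a name introduced here for illustration.

```r
# same dataframe as in the example above
data_frame <- data.frame(col1 = rep(c(1:3), each = 3),
                         col2 = letters[1:3],
                         col3 = c(1, 4, 1, 2, 2, 2, 1, 2, 2))

rows <- nrow(data_frame)

# logical vector marking which columns are numeric (col1 and col3 here)
num_cols <- sapply(data_frame, is.numeric)

# subtract consecutive rows using only the numeric columns;
# drop = FALSE keeps the result a dataframe even with a single column
diff_frame <- data_frame[-1, num_cols, drop = FALSE] -
  data_frame[-rows, num_cols, drop = FALSE]

print(diff_frame)
```

Because col2 is excluded before the subtraction, this runs without warnings or errors regardless of whether col2 is stored as a factor or as character.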
