
How To Calculate Cumulative Sum By Group In R

Last Updated : 21 Feb, 2024

A cumulative sum, or running total, replaces each number in a sequence with the sum of that number and all the numbers before it. For example, the cumulative sum of 3, 4, 4, 2 is 3, 7, 11, 13. In data analysis, it is often necessary to calculate cumulative sums within groups, so that the running total restarts for each group. This is common when working with time series or categorical data. In this article, we will learn how to calculate Cumulative Sum by Group in R Programming Language.
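All of the grouped methods below build on base R's cumsum(), which computes a running total over an entire vector. A minimal illustration using group B's values from the sample data, plus one behavior worth knowing: a missing value makes every later total NA (the second vector is a hypothetical example, not part of the sample data).

```r
# Running total of group B's values from the sample data
cumsum(c(5, 8, 9, 7))
#> [1]  5 13 22 29

# Hypothetical vector with a missing value: NA propagates to all later totals
cumsum(c(1, NA, 2))
#> [1]  1 NA NA
```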

In R, there are several methods to calculate the cumulative sum by group.

  • Method 1: Using Base R
  • Method 2: Using dplyr
  • Method 3: Using data.table

Creating a Data Frame to Calculate Cumulative Sum By Group In R

First, create a sample data frame with the data.frame() function. It has two columns:

  • group_var: This column represents the grouping variable. It contains the letters ‘A’, ‘B’, and ‘C’, each repeated 4 times. This indicates that there are 3 groups in total: A, B, and C, with each group having 4 observations.
  • values_var: This column represents the values associated with each group. It contains numeric values corresponding to each group. The values vary across groups and observations.

R




# Sample data frame
df <- data.frame(
  group_var = rep(c('A', 'B', 'C'), each = 4),
  values_var = c(3, 4, 4, 2, 5, 8, 9, 7, 6, 8, 3, 2)
)
print(df)


Output:

   group_var values_var
1          A          3
2          A          4
3          A          4
4          A          2
5          B          5
6          B          8
7          B          9
8          B          7
9          C          6
10         C          8
11         C          3
12         C          2

Using Base R to Calculate Cumulative Sum By Group In R

Base R provides a straightforward way to calculate cumulative sums by group with the ave() function, without loading any extra packages.

Syntax: ave(x, ..., FUN = cumsum)

x:

The numeric vector for which you want the cumulative sum, for example a column of a data frame such as df$values_var.

...:

One or more grouping variables, each the same length as x, such as a character vector or factor like df$group_var. If several are given, each combination of their values forms a group. Note that ave() has no argument named by; grouping variables are simply passed after x.

FUN:

The function applied within each group. Here it is cumsum, which computes the cumulative sum. Because FUN comes after ..., it must be supplied by name (FUN = cumsum); an unnamed function would be treated as another grouping variable.

You can also use other functions such as mean or median, depending on your requirement. ave() always returns a vector the same length as x, so a single summary value such as a group mean is repeated on every row of that group.

By default, ave() applies mean to each group. When FUN is cumsum, each element of x is replaced by the running total of its group, and the results keep the original row order, so they can be assigned directly as a new column.

R




# Replace each value with the running total of its group (row order is kept)
df$cum_sum_baseR <- ave(df$values_var, df$group_var, FUN = cumsum)
print(df)


Output:

   group_var values_var cum_sum_baseR
1          A          3             3
2          A          4             7
3          A          4            11
4          A          2            13
5          B          5             5
6          B          8            13
7          B          9            22
8          B          7            29
9          C          6             6
10         C          8            14
11         C          3            17
12         C          2            19
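To see how ave() keeps results aligned with the original rows, here is a small sketch on hypothetical interleaved data (the vectors x and g are illustrative only, not part of the sample data frame). It also shows the default FUN = mean, and a split()/lapply()/unsplit() version that is roughly equivalent to what ave() does internally.

```r
# Hypothetical toy data in which the groups are interleaved
x <- c(1, 10, 2, 20, 3)
g <- c("a", "b", "a", "b", "a")

# Each element becomes the running total of its own group,
# and the results stay in the original positions
ave(x, g, FUN = cumsum)
#> [1]  1 10  3 30  6

# Without FUN, ave() uses mean: every element gets its group's mean
ave(x, g)
#> [1]  2 15  2 15  2

# Roughly the same idea by hand: split by group, apply cumsum, put back
unsplit(lapply(split(x, g), cumsum), g)
#> [1]  1 10  3 30  6
```

Because the running total follows the current row order, sort the data first if the rows are not already in the order the total should accumulate, for example with df[order(df$group_var, df$time), ], where time stands for a hypothetical ordering column.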

Using dplyr To Calculate Cumulative Sum By Group In R

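Another way to calculate the cumulative sum by group in R is the dplyr package.

The dplyr package in R Programming Language is a data manipulation package that provides a consistent set of verbs, such as filter(), mutate(), and summarise(), for the most common data manipulation tasks. Here, group_by() defines the groups and mutate() adds a new column computed separately within each group.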

R




# Using Method 2 (dplyr)
library(dplyr)
df <- df %>%
  group_by(group_var) %>%                      # define the groups
  mutate(cum_sum_dplyr = cumsum(values_var))   # running total within each group
print(df)


Output:

# A tibble: 12 × 4
# Groups:   group_var [3]
   group_var values_var cum_sum_baseR cum_sum_dplyr
   <chr>          <dbl>         <dbl>         <dbl>
 1 A                  3             3             3
 2 A                  4             7             7
 3 A                  4            11            11
 4 A                  2            13            13
 5 B                  5             5             5
 6 B                  8            13            13
 7 B                  9            22            22
 8 B                  7            29            29
 9 C                  6             6             6
10 C                  8            14            14
11 C                  3            17            17
12 C                  2            19            19
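The "Groups: group_var [3]" line shows that the result is still a grouped tibble, so later verbs will keep operating per group. Below is a sketch of two ways to avoid that, on a small hypothetical data frame (sales is an illustrative name): calling ungroup() afterwards, or using the .by argument of mutate(), which assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Small hypothetical data frame with interleaved groups
sales <- data.frame(
  group_var  = c("A", "B", "A", "B"),
  values_var = c(3, 5, 4, 8)
)

# Option 1: group, compute the running total, then drop the grouping
out1 <- sales %>%
  group_by(group_var) %>%
  mutate(cum_sum = cumsum(values_var)) %>%
  ungroup()

# Option 2 (dplyr >= 1.1.0): grouping applies to this one call only,
# so the result is never grouped
out2 <- sales %>%
  mutate(cum_sum = cumsum(values_var), .by = group_var)

out2$cum_sum
#> [1]  3  5  7 13
```

Both options keep the original row order; .by simply saves the extra ungroup() step.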

Using data.table To Calculate Cumulative Sum By Group In R

data.table is an R package that provides an enhanced version of the data.frame. It became popular because of its execution speed and concise syntax. A data.table stores tabular data just like a data frame, but its syntax differs: queries take the general form DT[i, j, by].

  • setDT() converts the data frame df to a data.table by reference, and := creates a new column cum_sum_dataTable holding the cumulative sum calculated with cumsum(), grouped by group_var.

R




# Using Method 3 (data.table)
library(data.table)
# Convert df in place, then add the grouped running total as a new column
setDT(df)[, cum_sum_dataTable := cumsum(values_var), by = group_var]
print(df)


Output:

    group_var values_var cum_sum_baseR cum_sum_dplyr cum_sum_dataTable
 1:         A          3             3             3                 3
 2:         A          4             7             7                 7
 3:         A          4            11            11                11
 4:         A          2            13            13                13
 5:         B          5             5             5                 5
 6:         B          8            13            13                13
 7:         B          9            22            22                22
 8:         B          7            29            29                29
 9:         C          6             6             6                 6
10:         C          8            14            14                14
11:         C          3            17            17                17
12:         C          2            19            19                19
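cumsum() accumulates in the current row order, and the sample data happens to be sorted already. When it is not, one option is to sort the table by reference with setorder() before computing the grouped total. A minimal sketch on a hypothetical table (dt and its day column are illustrative, not part of the sample data):

```r
library(data.table)

# Hypothetical table whose rows are not in day order within each group
dt <- data.table(
  group_var  = c("A", "A", "B", "B"),
  day        = c(2, 1, 2, 1),
  values_var = c(4, 3, 8, 5)
)

# Sort by group, then by day; setorder() reorders the rows in place
setorder(dt, group_var, day)

# := adds the column in place; by = restarts the total for each group
dt[, cum_sum := cumsum(values_var), by = group_var]

dt$cum_sum
#> [1]  3  7  5 13
```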

Conclusion

In conclusion, we explored three methods for calculating cumulative sums by group in R: base R's ave(), dplyr's group_by() with mutate(), and data.table's := with by. All three produce identical results on the sample data. ave() needs no extra packages, dplyr fits naturally into tidyverse pipelines, and data.table adds the new column by reference, which is efficient on large tables.


