
Apply a function to each group using dplyr in R


In this article, we are going to learn how to apply a function to each group using dplyr in the R programming language.

The dplyr package provides functions for manipulating and transforming data frames in R. It can be installed from CRAN with the following command:

install.packages("dplyr")

What is tibble?

A tibble is a data frame-like structure in R that arranges data in rows and columns. When printed, a tibble shows its dimensions and the data type of each column. It can be created with the tibble() function by passing name-value pairs, one per column:

tibble(
  col_1_name = values,
  col_2_name = values)
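As a concrete illustration of the syntax above, here is a minimal sketch that builds a small tibble. The name scores and its columns team and points are hypothetical, chosen only for this example.

```r
# Load dplyr, which also makes tibble() available
library(dplyr)

# Hypothetical example data: names and values are illustrative only
scores <- tibble(
  team = c("a", "b", "a", "b"),
  points = c(10L, 20L, 30L, 40L)
)

# Printing a tibble shows its dimensions and the type of each column
print(scores)
```

The header of the printed tibble reports the number of rows and columns, and the line under the column names lists each column's type.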

group_by()

To apply a function to every group in the data, we first need to group the data by the distinct values of one or more columns. The group_by() method in the dplyr package marks each row as belonging to a group; the data itself is unchanged, but subsequent dplyr operations are performed group by group. It has the following syntax:

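Syntax: group_by(.data, col1, col2, ...)

Arguments : 

  • .data – The data frame or tibble to group; supplied automatically when the pipe operator is used
  • col1, col2, ... – The columns to group the data by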
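To see what group_by() actually produces, the grouping can be inspected before any function is applied. The following is a small sketch using the same x and y values as the examples in this article; the variable name grouped is an assumption made for illustration.

```r
library(dplyr)

data <- tibble(
  x = c(2, 4, 5, 6, 2, 5, 6, 6, 2, 6),
  y = 1:10
)

# group_by() only records the grouping; the rows are left as they are
grouped <- data %>% group_by(x)

# Number of groups formed by the distinct values of x
n_groups(grouped)

# One row per group, holding the grouping values
group_keys(grouped)

# Number of rows in each group, in the order of the group keys
group_size(grouped)
```

Checking the groups this way is a quick sanity test before running a longer pipeline on them.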

Method 1: Using mutate method

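The mutate() method adds new columns to the data and is chained after group_by() using the pipe operator. When applied to grouped data, any summary function inside mutate() is evaluated separately for each group, and its result is repeated on every row of that group, so the number of rows is unchanged.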

Syntax: mutate(new_col_name = expression)

Arguments : 

  • new_col_name – The name of the new column to be added to the tibble
  • expression – The function call or expression that computes the values of the new column; for grouped data it is evaluated once per group

The following code snippet divides the data in the tibble into groups based on the values in column x. The sum of the column y values is then computed for each group and stored in a new column, sum_group_y.

In this example, rows 1, 5, and 9 have the value 2 in the x column. The corresponding values in the y column are 1, 5, and 9, and their sum is 15. Every row with x equal to 2 therefore displays 15 in the sum_group_y column.

R

# Importing dplyr package
library(dplyr)

# Creating a tibble in R
data <- tibble(
  x = c(2, 4, 5, 6, 2, 5, 6, 6, 2, 6),
  y = 1:10)
print("Data")
print(data)

# Grouping the data by x and
# then computing the group wise
# sum using y column
data %>%
  group_by(x) %>%
  mutate(sum_group_y = sum(y))


Output:

[1] "Data"
# A tibble: 10 x 2
       x     y
   <dbl> <int>
 1     2     1
 2     4     2
 3     5     3
 4     6     4
 5     2     5
 6     5     6
 7     6     7
 8     6     8
 9     2     9
10     6    10
# A tibble: 10 x 3
# Groups:   x [4]
       x     y sum_group_y
   <dbl> <int>       <int>
 1     2     1          15
 2     4     2           2
 3     5     3           9
 4     6     4          29
 5     2     5          15
 6     5     6           9
 7     6     7          29
 8     6     8          29
 9     2     9          15
10     6    10          29
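The same pattern works with a user-defined function in place of sum(). The sketch below assumes a hypothetical helper, share_of_total, that returns each value's share of its group's total; the helper and the column name share_y are illustrative and not part of dplyr.

```r
library(dplyr)

data <- tibble(
  x = c(2, 4, 5, 6, 2, 5, 6, 6, 2, 6),
  y = 1:10
)

# Hypothetical helper: share of each value within the total of its input
share_of_total <- function(values) {
  values / sum(values)
}

# Inside mutate() on grouped data, the helper receives the y values of
# one group at a time, so the total is the group total
result <- data %>%
  group_by(x) %>%
  mutate(share_y = share_of_total(y)) %>%
  ungroup()

print(result)
```

Because the helper returns one value per input row, no repetition is needed here. The call to ungroup() at the end removes the grouping so that later operations act on the whole tibble again.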

Method 2: Using the group_map method

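The group_map() method can also apply a function to each group in the tibble. It calls the function once per group and returns a list with one element per group; here each element is a tibble. The function receives two arguments: the rows of the current group (without the grouping columns by default) and a one-row tibble holding the group's key. In the formula notation these are available as .x and .y. It has the following syntax:

Syntax: group_map(.data, .f, ..., .keep = FALSE)

Arguments : 

  • .data – A grouped tibble
  • .f – The function or formula to be applied to each group
  • ... – Additional arguments passed on to .f
  • .keep – Whether the grouping columns are kept in the data passed to .f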

The following code snippet illustrates the usage of the group_map() method on a grouped tibble, where the rows are grouped by the values in column x and the sum of the column y values is computed for each group. A user-defined sum_y function takes a vector and returns a one-row tibble containing the sum of its values.

R

# Importing dplyr
library(dplyr)

# Creating a tibble
data <- tibble(
  x = c(2, 4, 5, 6, 2, 5, 6, 6, 2, 6),
  y = 1:10)
print("Data")
print(data)

# Returns a one-row tibble holding
# the sum of the input values
sum_y <- function(values) {
  tibble(sum = sum(values))
}

# Grouping the data by x and
# then computing the group wise
# sum using y column
data %>%
  group_by(x) %>%
  group_map(~ sum_y(.x$y))


Output:

[1] "Data"
# A tibble: 10 × 2
       x     y
   <dbl> <int>
 1     2     1
 2     4     2
 3     5     3
 4     6     4
 5     2     5
 6     5     6
 7     6     7
 8     6     8
 9     2     9
10     6    10

[[1]]
# A tibble: 1 x 1
    sum
  <int>
1    15

[[2]]
# A tibble: 1 x 1
    sum
  <int>
1     2

[[3]]
# A tibble: 1 x 1
    sum
  <int>
1     9

[[4]]
# A tibble: 1 x 1
    sum
  <int>
1    29
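The list returned by group_map() does not say which group each element belongs to. One possible way to keep that information, sketched below, is to read the group key from .y inside the formula and then stack the per-group tibbles with bind_rows(). The variable names per_group and combined are assumptions made for this example.

```r
library(dplyr)

data <- tibble(
  x = c(2, 4, 5, 6, 2, 5, 6, 6, 2, 6),
  y = 1:10
)

# .x holds the rows of the current group, .y is a one-row tibble
# holding the group's key (here, the value of x)
per_group <- data %>%
  group_by(x) %>%
  group_map(~ tibble(x = .y$x, sum = sum(.x$y)))

# Stack the list of one-row tibbles into a single tibble
combined <- bind_rows(per_group)

print(combined)
```

The result has one row per group, with the group key next to its sum, which is often easier to work with than a list of separate tibbles.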


Last Updated : 16 Sep, 2022