Summarise multiple columns using dplyr in R

Last Updated : 24 Oct, 2021

In this article, we will discuss how to summarise multiple columns using the dplyr package in the R programming language.

Method 1: Using summarise_all() method

The summarise_all() method applies a function to every column of a data frame. The output data frame contains every column of the input, with each column reduced to the result of the function.

summarise_all(data, function)

Arguments :

data – The data frame to summarise the columns of
function – The function to apply on all the data frame columns.

R

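library("dplyr")

# creating a data frame
# col1 is a random shuffle of 1..5 (each repeated 3 times),
# so its row order differs between runs
df <- data.frame(col1 = sample(rep(1:5, each = 3)),
                 col2 = 1:15)

print("original dataframe")
print(df)

# summarising the data: mean of every column
print("summarised dataframe")
summarise_all(df, mean)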

Output

[1] "original dataframe"
   col1 col2
1     2    1 
2     3    2 
3     4    3 
4     2    4 
5     2    5 
6     4    6 
7     1    7 
8     1    8 
9     5    9 
10    3   10 
11    5   11 
12    1   12 
13    4   13 
14    5   14 
15    3   15
[1] "summarised dataframe"
  col1 col2
1    3    8

Explanation: The mean is calculated column-wise: the values of col1 are summed and divided by the number of rows, and the same is done for col2. For col1 this is 45 / 15 = 3, and for col2 it is 120 / 15 = 8. Every column of the input appears in the final output.
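To make the arithmetic concrete, the sketch below recomputes the column means by hand and compares them with the summarise_all() result. The names df_check, manual_col1, manual_col2 and result are illustrative and not part of the original example. col1 is written without sample() so that the values are fixed, which does not change the mean.

```r
library(dplyr)

# Illustrative data: same values as above, but without sample(),
# so the rows are in a fixed order (shuffling does not change the mean)
df_check <- data.frame(col1 = rep(1:5, each = 3),
                       col2 = 1:15)

# Mean computed by hand: column sum divided by the number of rows
manual_col1 <- sum(df_check$col1) / nrow(df_check)   # 45 / 15
manual_col2 <- sum(df_check$col2) / nrow(df_check)   # 120 / 15

# Mean computed by dplyr for every column at once
result <- summarise_all(df_check, mean)

print(c(manual_col1, manual_col2))
print(result)
```

Both approaches should agree, since summarise_all() simply calls the supplied function once per column.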

Method 2: Using summarise_at() method

The summarise_at() method affects only the columns selected with a character vector or vars(). It applies the given function to each selected column. The output data frame contains only the columns specified in the summarise_at() call. If every column of the data frame is selected, the result is the same as that of summarise_all().

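data %>%
  summarise_at(columns, function, ...)

Arguments :

data – The data frame to summarise the columns of
columns – The columns to summarise, given as a character vector of names or as vars()
function – The function to apply to each selected column
... – Additional arguments passed on to the function, such as na.rm=TRUE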
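The two ways of naming columns can be compared in a small sketch. The data frame df_demo and the result names are assumptions made for this illustration. The third call relies on vars() accepting a negative selection, which is how a column can be excluded instead of listed.

```r
library(dplyr)

# Illustrative data frame (name and values are assumptions for this sketch)
df_demo <- data.frame(col1 = rep(1:5, each = 3),
                      col2 = 1:15,
                      col3 = letters[1:15])

# Selecting columns with a character vector
by_name <- df_demo %>% summarise_at(c("col1", "col2"), mean)

# Selecting the same columns with vars()
by_vars <- df_demo %>% summarise_at(vars(col1, col2), mean)

# Excluding a column with vars(): everything except col3
by_exclusion <- df_demo %>% summarise_at(vars(-col3), mean)

print(by_name)
print(by_vars)
print(by_exclusion)
```

All three calls select col1 and col2, so they should produce the same summary. The character vector form is convenient when the column names are stored in a variable, while vars() reads more naturally when typing the names directly.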

R

library("dplyr")

# creating a data frame
df <- data.frame(col1 = sample(rep(1:5, each = 3)),
                 col2 = 1:15,
                 col3 = letters[1:15])

print("original dataframe")
print(df)

# summarising the data: mean of col1 and col2 only;
# na.rm = TRUE is passed on to mean()
print("summarised dataframe")
df %>%
  summarise_at(c("col1", "col2"), mean, na.rm = TRUE)

Output

[1] "original dataframe" 
   col1 col2 col3
1     3    1    a 
2     5    2    b 
3     4    3    c 
4     4    4    d 
5     5    5    e 
6     3    6    f 
7     2    7    g 
8     2    8    h 
9     1    9    i 
10    4   10    j 
11    2   11    k 
12    5   12    l 
13    1   13    m 
14    3   14    n 
15    1   15    o 
[1] "summarised dataframe" 
  col1 col2
1    3    8
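In dplyr 1.0.0 and later, summarise_all() and summarise_at() are marked as superseded in favour of across() used inside summarise(). The older functions still work, so the sketch below is only a possible equivalent, assuming a recent dplyr version is installed. The names df_demo, all_cols and some_cols are illustrative.

```r
library(dplyr)

# Illustrative data frame (name and values are assumptions for this sketch)
df_demo <- data.frame(col1 = rep(1:5, each = 3),
                      col2 = 1:15,
                      col3 = letters[1:15])

# Counterpart of summarise_all(): apply mean to every column.
# col3 is dropped first because mean() is not meaningful for characters.
all_cols <- df_demo %>%
  select(-col3) %>%
  summarise(across(everything(), mean))

# Counterpart of summarise_at(c("col1", "col2"), mean, na.rm = TRUE)
some_cols <- df_demo %>%
  summarise(across(c(col1, col2), ~ mean(.x, na.rm = TRUE)))

print(all_cols)
print(some_cols)
```

Extra arguments such as na.rm are written inside the lambda here, which keeps them next to the function they belong to.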
