
Dplyr – Find Mean for multiple columns in R

In this article, we will discuss how to calculate the row-wise mean across multiple columns of a data frame using the dplyr package in the R programming language.

Functions in use

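mutate(): adds new columns to a data frame or modifies existing ones.

Syntax: mutate(.data, name-value)

Parameters:

.data – The data frame or table to be appended

name-value – The new column name and an expression that defines its values

rowMeans(): computes the mean of each row of a numeric data frame or matrix.

Syntax: rowMeans(data-set)

select(): keeps only the specified columns of a data frame.

Syntax: select(data-set, cols-to-select)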

To find the mean across multiple columns of a data frame, we first need a data frame. The required columns are then selected with select() and passed to rowMeans(), which returns one mean per row. Finally, mutate() adds these means to the data frame as a new column.
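The three steps can be sketched separately before they are combined into a single call. This is a minimal illustration, assuming dplyr is installed; the names df, selected, row_means and df_with_mean are hypothetical and chosen only for this sketch.

```r
library(dplyr)

# Illustrative data frame (names are assumptions for this sketch)
df <- data.frame(col1 = c(1, 2, 3, 4),
                 col2 = c(2.3, 5.6, 3.4, 1.2),
                 col3 = c(5, 6, 7, 8))

# Step 1: keep only the columns that should enter the mean
selected <- select(df, c(col2, col3))

# Step 2: compute one mean per row across the selected columns
row_means <- rowMeans(selected, na.rm = TRUE)

# Step 3: attach the result to the original data frame as a new column
df_with_mean <- mutate(df, mean_col = row_means)
print(df_with_mean)
```

Keeping the intermediate objects makes it easy to inspect what select() and rowMeans() return before nesting the calls.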

The columns can be selected in several ways, as the following examples show.

Example: Calculating mean of multiple columns by selecting columns via vector




library("dplyr")
  
# creating a data frame
data_frame <- data.frame(col1 = c(1,2,3,4),
                         col2 = c(2.3,5.6,3.4,1.2),
                         col3 = c(5,6,7,8))
  
print("Original DataFrame")
  
print(data_frame)
  
# add mean_col: the row-wise mean of col2 and col3
data_frame_mod <- mutate(data_frame, mean_col = rowMeans(select(data_frame,
                                              c(col2,col3)), na.rm = TRUE))
print("Modified DataFrame")
print(data_frame_mod)

Output:

[1] "Original DataFrame"
  col1 col2 col3
1    1  2.3    5
2    2  5.6    6
3    3  3.4    7
4    4  1.2    8
[1] "Modified DataFrame"
  col1 col2 col3 mean_col
1    1  2.3    5     3.65
2    2  5.6    6     5.80
3    3  3.4    7     5.20
4    4  1.2    8     4.60
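The example above passes na.rm = TRUE to rowMeans(), although its data contains no missing values. The following sketch, which uses a hypothetical data frame df_na with one NA, illustrates what the argument changes: with na.rm = TRUE the mean is taken over the remaining values in the row, and without it the row's mean is NA.

```r
library(dplyr)

# Hypothetical data with a missing value in col2 (not from the article)
df_na <- data.frame(col1 = c(1, 2, 3),
                    col2 = c(2.3, NA, 3.4),
                    col3 = c(5, 6, 7))

# NA values are dropped before averaging each row
with_na_rm <- mutate(df_na,
                     mean_col = rowMeans(select(df_na, c(col2, col3)),
                                         na.rm = TRUE))

# Default behaviour: any NA in a row makes that row's mean NA
without_na_rm <- mutate(df_na,
                        mean_col = rowMeans(select(df_na, c(col2, col3))))

print(with_na_rm)
print(without_na_rm)
```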

A range of adjacent columns can be selected with the : operator inside select(). For example, col1:col3 selects every column from col1 through col3, and the mean is then calculated across all of them.

Example: Finding mean for multiple columns by selecting columns via : operator 




library("dplyr")
  
# creating a data frame
data_frame <- data.frame(col1 = c(1,2,3,4),
                         col2 = c(2.3,5.6,3.4,1.2),
                         col3 = c(5,6,7,8))
print("Original DataFrame")
  
print(data_frame)
  
# add mean_col: the row-wise mean of every column from col1 through col3
data_frame_mod <- mutate(data_frame, mean_col = rowMeans(select(data_frame,
                                              col1:col3), na.rm = TRUE))
  
print("Modified DataFrame")
print(data_frame_mod)

Output:

[1] "Original DataFrame" 
  col1 col2 col3 
1    1  2.3    5 
2    2  5.6    6 
3    3  3.4    7 
4    4  1.2    8 
[1] "Modified DataFrame" 
  col1 col2 col3 mean_col 
1    1  2.3    5 2.766667 
2    2  5.6    6 4.533333 
3    3  3.4    7 4.466667 
4    4  1.2    8 4.400000
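As a quick cross-check of the range selection, the sketch below compares the result with a manual calculation and with selection by column position. It assumes the same three-column layout as the example; the names df, by_name, by_position and manual are hypothetical.

```r
library(dplyr)

df <- data.frame(col1 = c(1, 2, 3, 4),
                 col2 = c(2.3, 5.6, 3.4, 1.2),
                 col3 = c(5, 6, 7, 8))

# Range given by column names
by_name <- rowMeans(select(df, col1:col3), na.rm = TRUE)

# The same range given by column positions
by_position <- rowMeans(select(df, 1:3), na.rm = TRUE)

# Manual computation for comparison
manual <- (df$col1 + df$col2 + df$col3) / 3

print(all.equal(unname(by_name), manual))
print(all.equal(by_name, by_position))
```

Because the range follows the order of the columns in the data frame, reordering the columns can change which columns fall inside col1:col3.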

A specific set of columns can also be selected with the starts_with() helper, which takes a string prefix. All columns whose names begin with that prefix are selected from the data frame.

Example: Finding mean of multiple columns by selecting columns by starts_with()




library("dplyr")
  
# creating a data frame
data_frame <- data.frame(col1 = c(1,2,3,4),
                         col2 = c(2.3,5.6,3.4,1.2),
                         nextcol2 = c(1,2,3,0),
                         col3 = c(5,6,7,8),
                         nextcol = c(4,5,6,7)
                         )
print("Original DataFrame")
print(data_frame)
  
print("Modified DataFrame")
  
# add mean_col: the row-wise mean of the columns whose names start with "next"
data_frame %>%
  mutate(mean_col = rowMeans(select(data_frame,
                                    starts_with('next')), na.rm = TRUE))

Output:

[1] "Original DataFrame"
  col1 col2 nextcol2 col3 nextcol
1    1  2.3        1    5       4
2    2  5.6        2    6       5
3    3  3.4        3    7       6
4    4  1.2        0    8       7
[1] "Modified DataFrame"
  col1 col2 nextcol2 col3 nextcol mean_col
1    1  2.3        1    5       4      2.5
2    2  5.6        2    6       5      3.5
3    3  3.4        3    7       6      4.5
4    4  1.2        0    8       7      3.5
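When the pipe operator %>% is used, the data frame does not have to be named a second time inside select(). One possible variant, sketched here with the hypothetical names df and result, uses the magrittr placeholder . to refer to whatever data frame arrives through the pipe.

```r
library(dplyr)

df <- data.frame(col1 = c(1, 2, 3, 4),
                 col2 = c(2.3, 5.6, 3.4, 1.2),
                 nextcol2 = c(1, 2, 3, 0),
                 col3 = c(5, 6, 7, 8),
                 nextcol = c(4, 5, 6, 7))

# `.` stands for the data frame passed in by %>%
result <- df %>%
  mutate(mean_col = rowMeans(select(., starts_with("next")), na.rm = TRUE))

print(result)
```

This keeps the pipeline reusable: the same mutate() step works on any piped data frame that has columns with the given prefix.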
