Dplyr – Find Mean for multiple columns in R

Last Updated : 14 Sep, 2021

In this article, we will discuss how to calculate the row-wise mean across multiple columns of a data frame using the dplyr package in the R programming language.

Functions in use

  • The mutate() function adds new columns to a data frame and preserves the existing ones. The order of the existing rows and columns is unchanged.

Syntax: mutate(.data, name = value)

Parameters:

.data – The data frame or table to add columns to

name = value – The new column name and the expression that defines its values

  • The rowMeans() function returns the mean of each row of a numeric matrix or data frame. It accepts an optional logical argument, na.rm, which indicates whether NA values are omitted before the mean is computed; the default is FALSE.

Syntax:

rowMeans(x, na.rm = FALSE)

  • The columns to average are chosen with the select() function, which keeps only the specified columns of a data frame. Columns can be specified by name, by position, or with selection helpers. (Filtering rows by a condition is the job of filter(), not select().)

Syntax:

select(.data, cols-to-select)
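The effect of the na.rm argument is easiest to see on a small data frame that contains a missing value. The following is a minimal sketch using base R only; the data frame scores and its columns a and b are hypothetical and used only for illustration.

```r
# Hypothetical data with one missing value, used only for illustration
scores <- data.frame(a = c(1, 2, 3),
                     b = c(4, NA, 6))

# Default (na.rm = FALSE): a row that contains NA yields NA
means_default <- rowMeans(scores)

# na.rm = TRUE: NA values are dropped before each row is averaged
means_no_na <- rowMeans(scores, na.rm = TRUE)

print(means_default)
print(means_no_na)
```

With the default setting, the second row has no usable mean because one of its values is missing. With na.rm = TRUE, that row is averaged over the remaining value only.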

Thus, to find the mean across multiple columns of a data frame, we first need a data frame. The columns of interest are selected with select(), and the selection is passed to rowMeans(), which computes one mean per row. The result is then added to the data frame as a new column using mutate().
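The three steps can also be written out one at a time, which makes it clearer what each function contributes. This is a minimal sketch, assuming dplyr is installed; the data frame df and the intermediate names selected, row_avg and result are hypothetical.

```r
library(dplyr)

# Hypothetical data frame, used only for illustration
df <- data.frame(x = c(1, 2), y = c(3, 4), z = c(5, 6))

# Step 1: keep only the columns to average
selected <- select(df, y, z)

# Step 2: compute one mean per row of the selected columns
row_avg <- rowMeans(selected, na.rm = TRUE)

# Step 3: attach the result to the original data frame as a new column
result <- mutate(df, mean_col = row_avg)
print(result)
```

The examples in this article nest the same three calls into a single expression, which avoids the intermediate variables.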

There are multiple ways of selecting the columns.

Example: Calculating mean of multiple columns by selecting columns via vector

R




library("dplyr")

# creating a data frame
data_frame <- data.frame(col1 = c(1, 2, 3, 4),
                         col2 = c(2.3, 5.6, 3.4, 1.2),
                         col3 = c(5, 6, 7, 8))

print("Original DataFrame")
print(data_frame)

# select col2 and col3, average them row by row,
# and store the result in a new column mean_col
data_frame_mod <- mutate(data_frame,
                         mean_col = rowMeans(select(data_frame, c(col2, col3)),
                                             na.rm = TRUE))

print("Modified DataFrame")
print(data_frame_mod)


Output:

[1] "Original DataFrame"
  col1 col2 col3
1    1  2.3    5
2    2  5.6    6
3    3  3.4    7
4    4  1.2    8
[1] "Modified DataFrame"
  col1 col2 col3 mean_col
1    1  2.3    5     3.65
2    2  5.6    6     5.80
3    3  3.4    7     5.20
4    4  1.2    8     4.60
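When the column names are held in a character vector, for instance because they are computed elsewhere in a script, the same selection can be written with the all_of() helper. The following is a minimal sketch, assuming dplyr 1.0 or later, where all_of() is available; the variable name cols_to_average is introduced here only for illustration.

```r
library(dplyr)

data_frame <- data.frame(col1 = c(1, 2, 3, 4),
                         col2 = c(2.3, 5.6, 3.4, 1.2),
                         col3 = c(5, 6, 7, 8))

# Hypothetical variable holding the names of the columns to average
cols_to_average <- c("col2", "col3")

# all_of() selects exactly the columns named in the character vector
data_frame_mod <- mutate(data_frame,
                         mean_col = rowMeans(select(data_frame,
                                                    all_of(cols_to_average)),
                                             na.rm = TRUE))
print(data_frame_mod)
```

The mean_col values should be the same as in the example above (3.65, 5.80, 5.20 and 4.60), since the same two columns are averaged.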

Row means can also be calculated over a range of adjacent columns by using the : operator inside select(). Here, col1:col3 selects every column from col1 through col3.

Example: Finding mean for multiple columns by selecting columns via : operator 

R




library("dplyr")

# creating a data frame
data_frame <- data.frame(col1 = c(1, 2, 3, 4),
                         col2 = c(2.3, 5.6, 3.4, 1.2),
                         col3 = c(5, 6, 7, 8))

print("Original DataFrame")
print(data_frame)

# select every column from col1 through col3 and average them row by row
data_frame_mod <- mutate(data_frame,
                         mean_col = rowMeans(select(data_frame, col1:col3),
                                             na.rm = TRUE))

print("Modified DataFrame")
print(data_frame_mod)


Output:

[1] "Original DataFrame" 
  col1 col2 col3 
1    1  2.3    5 
2    2  5.6    6 
3    3  3.4    7 
4    4  1.2    8 
[1] "Modified DataFrame" 
  col1 col2 col3 mean_col 
1    1  2.3    5 2.766667 
2    2  5.6    6 4.533333 
3    3  3.4    7 4.466667 
4    4  1.2    8 4.400000
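The examples so far produce one mean per row. If one mean per column is wanted instead, a possible approach is summarise() combined with across(). This is a minimal sketch, assuming dplyr 1.0 or later; it is offered as a contrast to the row-wise approach, and the name col_means is hypothetical.

```r
library(dplyr)

data_frame <- data.frame(col1 = c(1, 2, 3, 4),
                         col2 = c(2.3, 5.6, 3.4, 1.2),
                         col3 = c(5, 6, 7, 8))

# across() applies the function to each selected column in turn,
# so the result is a one-row data frame with one mean per column
col_means <- summarise(data_frame,
                       across(col1:col3, function(x) mean(x, na.rm = TRUE)))
print(col_means)
```

For this data the column means work out to 2.5 for col1, 3.125 for col2 and 6.5 for col3.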

A specific set of columns can also be selected with the starts_with() helper, which takes a string and matches every column whose name begins with it. Only the matching columns are passed on to rowMeans().

Example: Finding mean of multiple columns by selecting columns by starts_with()

R




library("dplyr")

# creating a data frame
data_frame <- data.frame(col1 = c(1, 2, 3, 4),
                         col2 = c(2.3, 5.6, 3.4, 1.2),
                         nextcol2 = c(1, 2, 3, 0),
                         col3 = c(5, 6, 7, 8),
                         nextcol = c(4, 5, 6, 7))

print("Original DataFrame")
print(data_frame)

print("Modified DataFrame")

# average the columns whose names start with "next";
# the dot refers to the data frame passed in by the pipe
data_frame %>%
  mutate(mean_col = rowMeans(select(., starts_with("next")),
                             na.rm = TRUE))


Output:

[1] "Original DataFrame"
  col1 col2 nextcol2 col3 nextcol
1    1  2.3        1    5       4
2    2  5.6        2    6       5
3    3  3.4        3    7       6
4    4  1.2        0    8       7
[1] "Modified DataFrame"
  col1 col2 nextcol2 col3 nextcol mean_col
1    1  2.3        1    5       4      2.5
2    2  5.6        2    6       5      3.5
3    3  3.4        3    7       6      4.5
4    4  1.2        0    8       7      3.5
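Other selection helpers can be used in the same position as starts_with(). The following is a minimal sketch using ends_with(), which matches columns by suffix instead of prefix. It reuses the data frame from the example above; the name result is hypothetical.

```r
library(dplyr)

data_frame <- data.frame(col1 = c(1, 2, 3, 4),
                         col2 = c(2.3, 5.6, 3.4, 1.2),
                         nextcol2 = c(1, 2, 3, 0),
                         col3 = c(5, 6, 7, 8),
                         nextcol = c(4, 5, 6, 7))

# ends_with("2") matches col2 and nextcol2
result <- mutate(data_frame,
                 mean_col = rowMeans(select(data_frame, ends_with("2")),
                                     na.rm = TRUE))
print(result)
```

Here mean_col holds the row-wise average of col2 and nextcol2. Similarly, contains() matches a string anywhere in the column name.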

