How to Calculate the Mean by Group in an R DataFrame?

  • Last Updated : 01 Apr, 2021

In this article, we will see how to calculate the mean by group in a DataFrame in the R programming language.

It can be done with two approaches: the aggregate() function and the dplyr package.

Dataset creation: First, we create a dataset so that we can later apply both approaches and find the mean by group.

R




# GFG dataset name and creation
GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Prints the dataset
print(GFG)

The code above creates a data frame named “GFG” with two columns, Category and Frequency. Running it in R prints the nine rows of the data frame (Table 1).

After applying either approach, the output should contain the mean Frequency for each category: about 6.67 for A, 4.67 for B, and 2.67 for C.

Before discussing the approaches, let us see how these output values are obtained:

  • In Table 1, we have two columns named Category and Frequency.
  • In Category, the values A, B and C each occur several times.
  • Group A has the values (9, 8, 3), group B has (5, 2, 7) and group C has (0, 7, 1), all taken from the Frequency column.
  • To find the mean, we use the formula

MEAN = Sum of terms / Number of terms

  • Hence, the mean of each group (A, B, C) is computed as follows

Sum:

  • A=9+8+3=20
  • B=5+2+7=14
  • C=0+7+1=8

Number of terms:



  • A appears 3 times
  • B appears 3 times
  • C appears 3 times

Mean by group (A, B, C):

  • A(mean) = Sum/Number of terms = 20/3 ≈ 6.67
  • B(mean) = Sum/Number of terms = 14/3 ≈ 4.67
  • C(mean) = Sum/Number of terms = 8/3 ≈ 2.67
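
The hand calculation above can be checked in base R. The following is a minimal sketch, not one of the article's two methods: it uses tapply() to compute the sum and the count for each category and then divides them. The variable names group_sum, group_n and group_mean are chosen here for illustration only.

```r
GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Sum of Frequency within each category
group_sum <- tapply(GFG$Frequency, GFG$Category, sum)

# Number of terms in each category
group_n <- tapply(GFG$Frequency, GFG$Category, length)

# Mean = sum of terms / number of terms
group_mean <- group_sum / group_n

# Show the group means rounded to two decimals
print(round(group_mean, 2))
```

Splitting the computation into sum and count mirrors the formula step by step; in practice the same result comes from passing mean directly as the function.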

Method 1: Using aggregate function

aggregate() function: Splits the data into subsets, computes a summary statistic for each subset, and returns the result in a convenient form.

Basic syntax: aggregate(x = dataset_Name, by = group_list, FUN = any_function)

Now, let’s compute the mean of our data by group using the aggregate() function:

R




GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

aggregate(
  x   = GFG$Frequency,       # Specify data column
  by  = list(GFG$Category),  # Specify group indicator
  FUN = mean                 # Specify function (i.e. mean)
)

Output: a data frame with one row per category, where the column Group.1 holds the category (A, B, C) and the column x holds its mean (about 6.67, 4.67 and 2.67).

The aggregate() call above takes three arguments:

  • The first, x, is the data to summarise; in our case it is the Frequency column of “GFG”.
  • The second, by, is a list of the values used to form the groups; in our case it is the Category column, which splits the data into three groups (A, B, C).
  • The third, FUN, is the function (e.g. mean, sum) to apply to each group (A, B, C).
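
As a variation, aggregate() also accepts a formula together with a data argument. The sketch below is an alternative to the call used in this article, not a replacement for it; it assumes the same GFG data frame, and group_means is a name chosen here for illustration. One practical difference is that the result keeps the original column names.

```r
GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Formula interface: summarise Frequency, grouped by Category
group_means <- aggregate(Frequency ~ Category, data = GFG, FUN = mean)

# The result has columns named Category and Frequency
print(group_means)
```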

Method 2: Using dplyr Package

dplyr is a package that provides a set of tools for efficiently manipulating datasets in R.

Functions in the dplyr package:

  • mutate() adds new variables that are functions of existing variables.
  • select() picks variables based on their names.
  • filter() picks cases based on their values.
  • summarise() reduces multiple values down to a single summary.
  • arrange() changes the ordering of the rows.
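
To make the verbs listed above concrete, here is a small sketch that chains four of them on the GFG data frame. It assumes dplyr is already installed; the threshold Frequency > 0 and the column name Doubled are made up for illustration and are not part of the mean-by-group task.

```r
library(dplyr)

GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

result <- GFG %>%
  filter(Frequency > 0) %>%            # keep rows with a positive Frequency
  mutate(Doubled = Frequency * 2) %>%  # add a column derived from Frequency
  arrange(desc(Frequency)) %>%         # sort rows from largest to smallest
  select(Category, Doubled)            # keep only these two columns

print(result)
```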

Install this library:

install.packages("dplyr") 

Load this library:

library("dplyr")  

Code:

R




# load dplyr library
library("dplyr")

GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Specify data frame
GFG %>%
  # Specify group indicator, column, function
  group_by(Category) %>%
  summarise_at(vars(Frequency),
               list(name = mean))

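Output: a table with one row per category (A, B, C) and the mean Frequency of each group (about 6.67, 4.67 and 2.67).

In the above code, we first take our dataset named “GFG”. With group_by() we form the groups, in our case A, B and C. summarise_at() takes two arguments here: the first, vars(Frequency), is the column to summarise, and the second, list(name = mean), is the function to apply to that column within each group.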
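
The same result can also be written with plain summarise(), which names the output column directly. This is a sketch of an alternative rather than the article's method; recent dplyr documentation marks the scoped verbs such as summarise_at() as superseded, so this form may be preferable in new code. The names group_means and mean_frequency are chosen here for illustration.

```r
library(dplyr)

GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

group_means <- GFG %>%
  group_by(Category) %>%                       # form the groups A, B, C
  summarise(mean_frequency = mean(Frequency))  # one mean per group

print(group_means)
```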



