How to Calculate the Mean by Group in an R DataFrame?
In this article, we are going to see how to calculate the mean by group in a DataFrame in the R Programming Language.
It can be done with two approaches:
- Using aggregate function
- Using dplyr Package
Dataset creation: First, we create a dataset, a data frame named “GFG”, so that we can later apply the two approaches above and find the mean by group.
It has 2 columns named Category and Frequency. Printing the data frame in R displays it as a table (Table 1).
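The dataset described here can be built as in the following minimal sketch. The values in each group come from this article, but the order of the rows is an assumption, since only the group contents are known.

```r
# Build the example data frame "GFG".
# Assumption: the row order is illustrative; only the values per
# group (A: 9, 8, 3; B: 5, 2, 7; C: 0, 7, 1) come from the article.
GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Print the data frame as a table
print(GFG)
```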
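After applying either of the two approaches, the expected output is the mean of Frequency for each Category: A = 6.67, B = 4.67 and C = 2.67 (rounded to two decimal places).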
Before we discuss those approaches, let us first see how the output values are obtained:
- In Table 1, we have two columns named Category and Frequency.
- In Category, we have three repeated values: A, B and C.
- From the Frequency column, group A has the values (9, 8, 3), group B has (5, 2, 7) and group C has (0, 7, 1).
- To find the mean, we use the formula
MEAN = Sum of terms / Number of terms
- Hence, the mean of each group (A, B, C) is worked out as follows.
Number of terms:
- A is repeated 3 times
- B is repeated 3 times
- C is repeated 3 times
Mean by group (A, B, C):
- A(mean) = Sum/Number of terms = 20/3 = 6.67
- B(mean) = Sum/Number of terms = 14/3 = 4.67
- C(mean) = Sum/Number of terms = 8/3 = 2.67
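The arithmetic above can be checked directly in R. This is a small sketch that applies the formula by hand; the variable names (a, b, c_vals, group_means) are illustrative and not part of the original article.

```r
# Frequency values per group, as listed in the article
a      <- c(9, 8, 3)
b      <- c(5, 2, 7)
c_vals <- c(0, 7, 1)

# MEAN = Sum of terms / Number of terms
mean_a <- sum(a) / length(a)
mean_b <- sum(b) / length(b)
mean_c <- sum(c_vals) / length(c_vals)

# Round to two decimal places, as in the worked example
group_means <- round(c(A = mean_a, B = mean_b, C = mean_c), 2)
print(group_means)
```

The built-in mean() function gives the same values, for example mean(a).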
Method 1: Using aggregate function
Aggregate function: Splits the data into subsets, computes summary statistics for each, and returns the result in a convenient form.
Syntax (basic R syntax of the aggregate function): aggregate(x = dataset_name, by = group_list, FUN = any_function)
Now, let’s calculate the mean of our data by group using the aggregate function.
The aggregate function takes three parameters:
- The first, x, is the data to summarise. In our case it comes from the dataset “GFG” (the Frequency column).
- The second, by, is a list of the column or columns whose values define the groups. In our case it is the Category column, which separates the data into three groups (A, B, C).
- The third, FUN, is the function (e.g. mean, sum, etc.) to apply to each group formed (A, B, C).
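Putting the three parameters together, a minimal sketch of the call could look like this. The row order of GFG is an assumption; the name result is illustrative.

```r
# Dataset from this article (row order is an assumption)
GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# x   : the values to summarise (Frequency column)
# by  : a list holding the grouping column (Category)
# FUN : the function applied to each group
result <- aggregate(x = GFG$Frequency, by = list(GFG$Category), FUN = mean)

# The grouping column is named Group.1 and the means are in column x
print(result)
```

Passing only the Frequency column as x avoids trying to take the mean of the character column Category. The formula interface, aggregate(Frequency ~ Category, data = GFG, FUN = mean), is an alternative that keeps the original column names.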
Method 2: Using dplyr Package
dplyr is a package that provides a set of tools for efficiently manipulating datasets in R.
Commonly used functions in the dplyr package:
- mutate() adds new variables that are functions of existing variables.
- select() picks variables based on their names.
- filter() picks cases based on their values.
- summarise() reduces multiple values down to a single summary.
- arrange() changes the ordering of the rows.
Install this library with install.packages("dplyr").
Load this library with library("dplyr").
To calculate the mean by group, we first take our dataset named “GFG”. With the group_by() method we form the groups, in our case A, B and C from the Category column. summarise_at() then takes two parameters: the first is the column to operate on, and the second is the function to apply to that column, here mean.
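The steps just described can be sketched as follows, assuming dplyr is installed. The row order of GFG is an assumption, and the name result is illustrative.

```r
library(dplyr)

# Dataset from this article (row order is an assumption)
GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

result <- GFG %>%
  group_by(Category) %>%               # form the groups A, B, C
  summarise_at(vars(Frequency), mean)  # mean of Frequency within each group

print(result)
```

In recent dplyr versions summarise_at() is superseded, so summarise(across(Frequency, mean)) is a possible replacement for the last step.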