# How to Calculate the Mean by Group in an R DataFrame?

In this article, we will see how to calculate the mean by group in a DataFrame in the R programming language.

**It can be done with two approaches:**

- Using aggregate function
- Using dplyr Package

**Dataset creation:** First, we create a dataset so that we can later apply the above two approaches and find the mean by group.

```r
# GFG dataset name and creation
GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Prints the dataset
print(GFG)
```

The above code creates a dataset named **“GFG”** with two columns, **Category** and **Frequency**. Running it in R prints the dataset as a table with nine rows.

**After applying either of the two approaches, the output should contain one mean per group: A = 6.67, B = 4.67 and C = 2.67.**

**Before we discuss those approaches, let us first see how the output values are obtained:**

- In Table 1, we have two columns named Category and Frequency.
- In Category, we have the repeating values **A, B and C.** The **A group values (9, 8, 3)**, **B group values (5, 2, 7)** and **C group values (0, 7, 1)** are taken from the **Frequency** column.
- So, to find the **mean**, we use the formula

MEAN = Sum of terms / Number of terms

- Hence, the **mean by group** of each group (A, B, C) is computed as follows.

**Sum:**

- A=9+8+3=20
- B=5+2+7=14
- C=0+7+1=8

**Number of terms:**

- A is repeated 3 times
- B is repeated 3 times
- C is repeated 3 times

**Mean by group (A, B, C):**

- A(mean) = Sum/Number of terms = 20/3 = 6.67
- B(mean) = Sum/Number of terms = 14/3 = 4.67
- C(mean) = Sum/Number of terms = 8/3 = 2.67
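The arithmetic above can be checked step by step in base R. This is only a minimal sketch: the variable names `group_sum`, `group_n` and `group_mean` are illustrative and do not come from the original article.

```r
# Recreate the dataset used in this article
GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Sum of Frequency within each Category (A = 20, B = 14, C = 8)
group_sum <- tapply(GFG$Frequency, GFG$Category, sum)

# Number of terms in each Category (3 for every group)
group_n <- tapply(GFG$Frequency, GFG$Category, length)

# Mean = sum of terms / number of terms
group_mean <- group_sum / group_n

# Rounded to two decimals: A = 6.67, B = 4.67, C = 2.67
print(round(group_mean, 2))
```

Splitting the calculation into a sum and a count mirrors the manual steps; the two methods below do the same work in a single call.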

**Method 1:** Using aggregate function

Aggregate function: Splits the data into subsets, computes summary statistics for each, and returns the result in a convenient form.

Syntax: `aggregate(x = dataset_Name, by = group_list, FUN = any_function)` (basic R syntax of the aggregate function)

Now, let’s compute the mean of our data by group using the aggregate function:

```r
GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Specify data column
aggregate(x = GFG$Frequency,
          # Specify group indicator
          by = list(GFG$Category),
          # Specify function (i.e. mean)
          FUN = mean)
```

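**Output:** a data frame with one row per group, where the `Group.1` column holds the group label and the `x` column holds the mean (A ≈ 6.67, B ≈ 4.67, C ≈ 2.67).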

The aggregate function above takes three parameters:

- The first, `x`, is the data to summarise; in our case it is the **Frequency** column of **“GFG”**.
- The second, `by`, is a list of the column(s) whose values define the groups; in our case it is the **Category column**, which separates the data into three groups **(A, B, C)**.
- The third, `FUN`, is the function (e.g. **mean, sum, etc.**) to apply to each group formed **(A, B, C)**.
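`aggregate()` also accepts a formula, which some readers find easier to read than the `x`/`by` form. The sketch below is an alternative to the call shown above, not the approach used in this article; the variable name `result` is illustrative.

```r
# Recreate the dataset used in this article
GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Formula interface: summarise Frequency, grouped by Category
result <- aggregate(Frequency ~ Category, data = GFG, FUN = mean)

# The result keeps the original column names (Category, Frequency)
print(result)
```

With the formula form, the output columns keep the names `Category` and `Frequency` instead of the generic `Group.1` and `x`.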

**Method 2:** Using dplyr Package

dplyr is a package that provides a set of tools for efficiently manipulating datasets in R.

**Methods in dplyr package:**

- **mutate()** adds new variables that are functions of existing variables.
- **select()** picks variables based on their names.
- **filter()** picks cases based on their values.
- **summarise()** reduces multiple values down to a single summary.
- **arrange()** changes the ordering of the rows.

**Install this library:**

`install.packages("dplyr")`

**Load this library:**

`library("dplyr")`

**Code:**

```r
# load dplyr library
library("dplyr")

GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Specify data frame
GFG %>%
  # Specify group indicator, column, function
  group_by(Category) %>%
  summarise_at(vars(Frequency),
               list(name = mean))
```

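**Output:** a tibble with one row per category, where the `name` column holds the group mean (A ≈ 6.67, B ≈ 4.67, C ≈ 2.67).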

In the above code, we first take our dataset named **“GFG”**. With the **group_by()** method we form the groups, in our case (A, B, C). **summarise_at()** takes two parameters: the first is the column to operate on, and the second is the operation to apply to it. The name given in the list (here `name`) becomes the name of the result column.
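Since dplyr 1.0.0, `summarise_at()` is marked as superseded, so a plain `summarise()` call is a common way to write the same grouping. The following is a minimal sketch that assumes dplyr is installed; the column name `mean_frequency` and the variable `result` are illustrative choices, not names from the original article.

```r
# Assumes the dplyr package is already installed
library(dplyr)

# Recreate the dataset used in this article
GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

result <- GFG %>%
  # Form one group per distinct Category value
  group_by(Category) %>%
  # Collapse each group to a single row holding its mean
  summarise(mean_frequency = mean(Frequency))

print(result)
```

Naming the summary column directly inside `summarise()` avoids the generic `name` column produced by the `list(name = mean)` argument above.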