# How to Calculate the Mean by Group in an R DataFrame?

Last Updated : 25 Sep, 2023

Calculating the mean by group in an R DataFrame involves splitting the data into subsets based on a specific grouping variable and then computing the mean of a numeric variable within each subgroup.

In this article, we will see how to calculate the mean by group for a data frame in the R programming language.

It can be done with three approaches:

• The aggregate() function from base R
• The dplyr package
• The data.table package

Dataset creation: First, we create a dataset so that we can later apply these approaches and find the mean by group.

```r
# GFG dataset name and creation
GFG <- data.frame(
   Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
   Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Prints the dataset
print(GFG)
```

Output:

```
  Category Frequency
1        A         9
2        B         5
3        C         0
4        B         2
5        C         7
6        A         8
7        C         1
8        A         3
9        B         7
```

The code above creates a data frame named “GFG”. It has two columns, Category and Frequency, and running the code prints the nine rows shown in the output.

Before we discuss the approaches, let us first work out by hand the values we expect to get:

• In the dataset above, we have two columns named Category and Frequency.
• In Category, we have the repeating values A, B, and C.
• Group A has the values (9, 8, 3), group B has (5, 2, 7), and group C has (0, 7, 1), all taken from the Frequency column.
• To find the mean, we use the formula

MEAN = Sum of terms / Number of terms

• Hence, the mean of each group (A, B, C) is worked out as follows

Sum:

• A=9+8+3=20
• B=5+2+7=14
• C=0+7+1=8

Number of terms:

• A appears 3 times
• B appears 3 times
• C appears 3 times

Mean by group (A, B, C):

• A(mean) = Sum/Number of terms = 20/3 ≈ 6.67
• B(mean) = Sum/Number of terms = 14/3 ≈ 4.67
• C(mean) = Sum/Number of terms = 8/3 ≈ 2.67
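The hand calculation above can be checked in base R. The following is a minimal sketch, assuming the same GFG data as in the dataset section; the names group_sum, group_n, and manual_mean are illustrative and do not come from the original example.

```r
# Recreate the example data so this block runs on its own
GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Sum of Frequency within each Category (illustrative name)
group_sum <- tapply(GFG$Frequency, GFG$Category, sum)

# Number of terms within each Category (illustrative name)
group_n <- tapply(GFG$Frequency, GFG$Category, length)

# Mean = sum of terms / number of terms, computed per group
manual_mean <- group_sum / group_n

# Rounded to two decimals for comparison with the hand-computed values
print(round(manual_mean, 2))
```

The rounded values should agree with the means worked out by hand for A, B, and C.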

### Method 1: Using aggregate function

Aggregate function: Splits the data into subsets, computes summary statistics for each, and returns the result in a convenient form.

Syntax: aggregate(x = dataset_Name, by = group_list, FUN = any_function)

Now, let’s compute the mean of each group using the aggregate function:

```r
# Specify data column
group_mean <- aggregate(x = GFG$Frequency,
                        # Specify group indicator
                        by = list(GFG$Category),
                        # Specify function (i.e. mean)
                        FUN = mean)
print(group_mean)
```

Output:

```
  Group.1        x
1       A 6.666667
2       B 4.666667
3       C 2.666667
```

The aggregate function above takes three arguments:

• x is the data to summarise; in our case it is the Frequency column of “GFG” (GFG$Frequency).
• by is a list of grouping variables; here it is list(GFG$Category), which splits the data into three groups (A, B, C). Because the list element is unnamed, the grouping column in the output is called Group.1.
• FUN is the function (e.g. mean, sum) applied to each of the groups formed (A, B, C).
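As an alternative sketch, aggregate() also accepts a formula together with a data argument, which keeps the original column names in the result instead of Group.1 and x. The example below assumes the same GFG data; the name group_mean_formula is illustrative.

```r
# Recreate the example data so this block runs on its own
GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Formula interface: summarise Frequency for each level of Category
group_mean_formula <- aggregate(Frequency ~ Category, data = GFG, FUN = mean)

# The result has columns named Category and Frequency
print(group_mean_formula)
```

The formula form reads as "Frequency grouped by Category" and can be easier to scan when several grouping columns are involved.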

### Method 2: Using dplyr Package

dplyr is a package that provides a set of tools for efficiently manipulating datasets in R.

Commonly used functions in the dplyr package:

• mutate() adds new variables that are functions of existing variables.
• select() picks variables based on their names.
• filter() picks cases based on their values.
• summarise() reduces multiple values to a single summary.
• arrange() changes the ordering of the rows.

Install and load this library:

```r
install.packages("dplyr")
library("dplyr")
```

```r
# load dplyr library
library("dplyr")

# Specify data frame
group_mean <- GFG %>%
    # Specify group indicator, column, function
    group_by(Category) %>%
    # Calculate the mean of the "Frequency" column for each group
    summarise_at(vars(Frequency),
                 list(Mean_Frequency = mean))

# Print the resulting summary data frame
print(group_mean)
```

Output:

```
# A tibble: 3 × 2
  Category Mean_Frequency
  <chr>             <dbl>
1 A                  6.67
2 B                  4.67
3 C                  2.67
```

#### Code Steps:

• The %>% operator passes the result of the expression on its left as the first argument of the function on its right, so the operations run one after another.
• group_by(Category) groups the data by the “Category” column. This means that subsequent operations will be performed separately for each unique value in the “Category” column.
• summarise_at() takes two arguments here: the columns to summarise, selected with vars(Frequency), and a named list of functions to apply to them, list(Mean_Frequency = mean).
• The result is a new data frame called group_mean, which contains one row for each unique category and a column “Mean_Frequency” that holds the calculated means.

Finally, group_mean is printed to the console to display the summary statistics for each category.
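In recent dplyr releases, summarise_at() is marked as superseded in favour of summarise() combined with across(). The sketch below shows two equivalent ways to get the same group means under that assumption; the names group_mean_plain and group_mean_across are illustrative.

```r
# Load dplyr (assumed to be installed already)
library(dplyr)

# Recreate the example data so this block runs on its own
GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Option 1: name the summary column directly inside summarise()
group_mean_plain <- GFG %>%
  group_by(Category) %>%
  summarise(Mean_Frequency = mean(Frequency))

# Option 2: across() applies mean to the selected column;
# .names builds the output column name from the input column name
group_mean_across <- GFG %>%
  group_by(Category) %>%
  summarise(across(Frequency, mean, .names = "Mean_{.col}"))

print(group_mean_plain)
print(group_mean_across)
```

Option 1 is usually enough for a single column, while across() becomes useful when the same function has to be applied to several columns at once.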

### Method 3: Use the data.table package

The `data.table` package provides a concise and efficient way to calculate summary statistics by group. In this case, we calculate the mean of the “Frequency” column for each group defined by the “Category” column.

```r
# Load the data.table library
library(data.table)

# Convert data.frame to data.table
gfg <- data.table(GFG)

# Calculate the mean by "Category" group
mean_by_category <- gfg[, .(Mean_Frequency = mean(Frequency)), by = Category]

# Print the result
print(mean_by_category)
```

Output:

```
   Category Mean_Frequency
1:        A       6.666667
2:        B       4.666667
3:        C       2.666667
```

#### Code Steps:

• The first line loads the data.table library in R. The data.table package is used for efficient data manipulation.
• Then we convert the existing data frame GFG into a data.table named gfg.
• The mean by “Category” group is calculated with the expression gfg[, .(Mean_Frequency = mean(Frequency)), by = Category]. Inside the gfg data table, .( ) builds the output columns, and Mean_Frequency stores the group-wise mean of the Frequency column.
• The `by` argument specifies the grouping variable. It tells data.table to group the rows by the “Category” column before applying the calculation.
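The same pattern extends to several summary columns in one call. As a minimal sketch, assuming the same GFG data, the example below adds a per-group row count using data.table's special symbol .N; the names stats_by_category and Count are illustrative.

```r
# Load data.table (assumed to be installed already)
library(data.table)

# Recreate the example data so this block runs on its own
GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Convert to a data.table without modifying the original data frame
gfg <- as.data.table(GFG)

# Compute the mean and the number of rows (.N) for each Category
stats_by_category <- gfg[, .(Mean_Frequency = mean(Frequency), Count = .N),
                         by = Category]

print(stats_by_category)
```

Having the count next to the mean makes it easy to confirm that each group contributed three values, as in the hand calculation.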