
How to Calculate the Mean by Group in an R DataFrame?

Last Updated : 25 Sep, 2023

Calculating the mean by group in an R DataFrame involves splitting the data into subsets based on a specific grouping variable and then computing the mean of a numeric variable within each subgroup.

In this article, we will see how to calculate the mean by group in an R DataFrame using the R programming language.

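It can be done in several ways. This article covers three approaches: the base R aggregate() function, the dplyr package, and the data.table package.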

Dataset creation: First, we create a dataset so that we can later apply each approach to it and find the mean by group.

R




# Create the dataset named GFG
GFG <- data.frame(
   Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
   Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Print the dataset
print(GFG)


Output:

  Category Frequency
1        A         9
2        B         5
3        C         0
4        B         2
5        C         7
6        A         8
7        C         1
8        A         3
9        B         7


The code above creates a dataset named “GFG” with two columns, Category and Frequency, and prints it to the console.

Before we discuss the approaches, let us first work out by hand the values they should return:

  • In Table 1 (the GFG dataset above), we have two columns named Category and Frequency.
  • The Category column contains the repeating values A, B, and C.
  • From the Frequency column, group A has the values (9, 8, 3), group B has (5, 2, 7), and group C has (0, 7, 1).
  • To find the mean, we use the formula:

MEAN = Sum of terms / Number of terms

  • Hence, the mean of each group (A, B, C) is computed as follows.

Sum:

  • A=9+8+3=20
  • B=5+2+7=14
  • C=0+7+1=8

Number of terms:

  • A appears 3 times
  • B appears 3 times
  • C appears 3 times

Mean by group (A, B, C):

  • A(mean) = Sum/Number of terms = 20/3 = 6.67
  • B(mean) = Sum/Number of terms = 14/3 = 4.67
  • C(mean) = Sum/Number of terms = 8/3 = 2.67
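The hand calculation above can be reproduced in base R as a quick sanity check. This is a minimal sketch that rebuilds the GFG data frame so it runs on its own. It uses the standard base R functions tapply() and table(); the variable names sums, counts, and means are only illustrative.

```r
# Rebuild the example dataset so this snippet is self-contained
GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Sum of Frequency within each Category (A = 20, B = 14, C = 8)
sums <- tapply(GFG$Frequency, GFG$Category, sum)

# Number of rows (terms) in each Category (3 each)
counts <- table(GFG$Category)

# Mean = Sum of terms / Number of terms, keeping the group names from sums
means <- sums / as.vector(counts)
print(round(means, 2))
```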

Code Implementations

Method 1: Using the aggregate() Function

Aggregate function: Splits the data into subsets, computes summary statistics for each, and returns the result in a convenient form.

Syntax (basic form of aggregate()): aggregate(x = data, by = list_of_groups, FUN = summary_function)

Now, let’s compute the mean of Frequency for each Category using aggregate():

R




# Compute the mean of Frequency for each Category
group_mean <- aggregate(x = GFG$Frequency,        # values to summarise
                        by = list(GFG$Category),  # grouping variable(s)
                        FUN = mean)               # summary function
print(group_mean)


Output:

  Group.1        x
1       A 6.666667
2       B 4.666667
3       C 2.666667

The aggregate() call above takes three arguments:

  • x: the values to summarise. In our case, this is the Frequency column, GFG$Frequency.
  • by: a list of grouping variables. In our case, this is the Category column, which splits the data into three groups (A, B, C). Because the list element is unnamed, the group column in the output is called Group.1 and the result column is called x.
  • FUN: the function to apply to each group (e.g. mean, sum).
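aggregate() also accepts a formula interface, which names the output columns after the original variables instead of Group.1 and x. The sketch below assumes the same GFG data frame and uses the aggregate(formula, data, FUN) method from R's stats package.

```r
# Same example dataset as above
GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# Formula interface: summarise Frequency, grouped by Category
group_mean <- aggregate(Frequency ~ Category, data = GFG, FUN = mean)

# The result has the columns "Category" and "Frequency" (the group means)
print(group_mean)
```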

Method 2: Using the dplyr Package

dplyr is a package that provides a set of tools for efficiently manipulating datasets in R.

Commonly used functions in the dplyr package:

  • mutate() adds new variables that are functions of existing variables
  • select() picks variables based on their names.
  • filter() picks cases based on their values.
  • summarise() reduces multiple values to a single summary.
  • arrange() changes the ordering of the rows.

Install the package (only needed once):

install.packages("dplyr") 

Load this library:

library("dplyr")  

R




# Load the dplyr library
library("dplyr")

group_mean <- GFG %>%
    # Group the rows by the values of Category
    group_by(Category) %>%
    # Calculate the mean of the "Frequency" column for each group
    summarise_at(vars(Frequency),
                 list(Mean_Frequency = mean))

# Print the resulting summary data frame
print(group_mean)


Output:

# A tibble: 3 × 2
  Category Mean_Frequency
  <chr>             <dbl>
1 A                  6.67
2 B                  4.67
3 C                  2.67

Code Steps:

  • The pipe operator %>% passes the result of each step to the next one, so the operations run one after another.
  • group_by(Category) groups the data by the “Category” column. This means that subsequent operations will be performed separately for each unique value in the “Category” column.
  • summarise_at() takes two arguments: the columns to summarise, selected with vars(Frequency), and the function to apply to them, given as list(Mean_Frequency = mean). Here, the name in the list becomes the name of the output column.
  • The result is a new data frame (a tibble) called group_mean, which contains one row for each unique category and a column “Mean_Frequency” that holds the calculated means.

Finally, group_mean is printed to the console to display the summary statistics for each category.
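summarise_at() still works, but dplyr 1.0 and later mark the scoped _at() variants as superseded in favour of calling summarise() directly (with across() for multiple columns). The following is a minimal equivalent of the pipeline above, assuming dplyr 1.0 or later is installed.

```r
library(dplyr)

# Same example dataset as above
GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# summarise() names the new column directly; with a single grouping
# variable, the default .groups behaviour returns an ungrouped tibble
group_mean <- GFG %>%
  group_by(Category) %>%
  summarise(Mean_Frequency = mean(Frequency))

print(group_mean)
```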

Method 3: Using the data.table Package

The data.table package provides a concise and efficient way to calculate summary statistics by group. In this case, we calculate the mean of the “Frequency” column for each group defined by the “Category” column.

R




# Load the data.table library
library(data.table)
 
# Convert data.frame to data.table
gfg <- data.table(GFG)
 
# Calculate the mean by "Category" group
mean_by_category <- gfg[, .(Mean_Frequency = mean(Frequency)), by = Category]
 
# Print the result
print(mean_by_category)


Output:

   Category Mean_Frequency
1:        A       6.666667
2:        B       4.666667
3:        C       2.666667

Code Steps:

  • The first line loads the data.table library, which provides fast and memory-efficient data manipulation.
  • Then we convert the existing data frame GFG into a data.table named gfg.
  • The mean for each “Category” group is calculated with gfg[, .(Mean_Frequency = mean(Frequency)), by = Category]:
    • The expression .(Mean_Frequency = mean(Frequency)) computes the mean of the Frequency column for each group and stores it in a column named Mean_Frequency.
    • The `by` argument specifies the grouping variable. It tells R to group the data by the “Category” column before applying the calculation.
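data.table can also compute several summaries in one call, which makes it easy to reproduce the Sum / Number of terms arithmetic from earlier. In this sketch, .N is data.table's built-in row count for each group, keyby (used instead of by) sorts the result by Category, and setDT() converts the data frame in place. The column names Sum, N, and Mean are illustrative.

```r
library(data.table)

# Same example dataset as above
GFG <- data.frame(
  Category  = c("A", "B", "C", "B", "C", "A", "C", "A", "B"),
  Frequency = c(9, 5, 0, 2, 7, 8, 1, 3, 7)
)

# setDT() converts the data.frame to a data.table by reference (no copy)
setDT(GFG)

# Sum, count and mean per group; keyby also sorts the result by Category
summary_by_category <- GFG[, .(Sum  = sum(Frequency),
                               N    = .N,
                               Mean = mean(Frequency)),
                           keyby = Category]

print(summary_by_category)
```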

