
Aggregate data using custom functions in R

Last Updated : 12 Apr, 2024

This article explores several ways to aggregate data with custom functions in the R programming language.

What is a custom function?

Custom functions are an essential part of R programming: they let users create reusable blocks of code tailored to specific needs. A function encapsulates a series of operations behind a single name, which makes code more readable and easier to maintain.
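As a minimal sketch of the idea, the snippet below defines a small custom function and calls it. The name value_range and the sample numbers are illustrative assumptions, not part of the examples that follow.

```r
# Hypothetical custom function: spread between the largest and smallest value
value_range <- function(x) {
  max(x) - min(x)
}

# Call it like any built-in function
spread <- value_range(c(10, 15, 20))
print(spread)
```

Once defined, such a function can be reused anywhere a built-in summary function would be accepted.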

How to aggregate data using custom functions

The aggregate() function in R splits a data frame into groups and applies a function to each group. The function passed through its FUN argument can be a built-in such as sum or a custom function you define yourself, so the same call pattern covers many kinds of summaries. The methods covered here are aggregation by sum, mean, median, and standard deviation.
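Before turning to the individual summaries, here is a rough sketch of the general calling pattern, shown with both the formula interface and the data-frame interface of aggregate(). The data frame, its column names, and the helper spread are made up for illustration.

```r
# Illustrative data: two groups with two values each
df <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c(1, 3, 10, 20)
)

# Hypothetical custom summary: range width of a vector
spread <- function(x) max(x) - min(x)

# Formula interface: summarise value within each level of group
by_formula <- aggregate(value ~ group, data = df, FUN = spread)

# Data-frame interface: columns to summarise plus a named list of groupings
by_list <- aggregate(df["value"], by = list(group = df$group), FUN = spread)

print(by_formula)
print(by_list)
```

Both calls should produce one row per group. The formula interface is the one used in the rest of this article.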

Aggregating data by sum using the custom function

This method aggregates data by sum using a custom function. In the example below, we create a data frame of sales by date and compute the total sold in each month using a custom function.

R
# Create a data frame of sales by date
df <- data.frame(
  date = as.Date(c("2024-01-01", "2024-01-15", "2024-02-10",
                   "2024-02-20", "2024-03-20", "2024-03-15")),
  sold = c(100, 150, 200, 250, 300, 350)
)

print("The original dataframe is")
print(df)

# Custom function that returns the sum of its input
result <- function(x) {
  return(sum(x))
}

print("After calculating the sum is")
# Group rows by year-month and apply the custom function to each group
sales_permonth <- aggregate(sold ~ format(date, "%Y-%m"),
                            data = df, FUN = result)

print(sales_permonth)

Output:

[1] "The original dataframe is"
        date sold
1 2024-01-01  100
2 2024-01-15  150
3 2024-02-10  200
4 2024-02-20  250
5 2024-03-20  300
6 2024-03-15  350

[1] "After calculating the sum is"
  format(date, "%Y-%m") sold
1               2024-01  250
2               2024-02  450
3               2024-03  650

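In the example below, we create a data frame of goods and prices and compute the total price per item. Here the built-in sum function is passed directly as FUN; a custom function that wraps sum, as in the previous example, would give the same result.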

R
goods <- c("a", "b", "c", "d", "b", "c", "a")
prices <- c(100, 200, 300, 400, 500, 600, 700)

# Create the data frame
df <- data.frame(goods, prices)
print(df)

print("After calculating the sum is")
res <- aggregate(prices ~ goods, data = df, FUN = sum)
print(res)

Output:

  goods prices
1     a    100
2     b    200
3     c    300
4     d    400
5     b    500
6     c    600
7     a    700

[1] "After calculating the sum is"
  goods prices
1     a    800
2     b    700
3     c    900
4     d    400
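A custom function becomes more useful than the built-in sum when the summary needs extra logic. The sketch below reuses the goods and prices data and sums only the prices above a cutoff. The function sum_above and its threshold parameter are hypothetical names introduced for illustration; the forwarding of extra arguments relies on aggregate() passing them on to FUN.

```r
goods <- c("a", "b", "c", "d", "b", "c", "a")
prices <- c(100, 200, 300, 400, 500, 600, 700)
df <- data.frame(goods, prices)

# Hypothetical custom function: sum only the values above a cutoff
sum_above <- function(x, threshold = 0) {
  sum(x[x > threshold])
}

# Arguments given after FUN are forwarded to the custom function
res_above <- aggregate(prices ~ goods, data = df,
                       FUN = sum_above, threshold = 250)
print(res_above)
```

With a cutoff of 250, the prices 100 and 200 are excluded, so groups a and b should show smaller totals than in the plain sum.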

Aggregating data by mean using the custom function

This method aggregates data by mean using a custom function. In the example below, we create a data frame and compute the mean score for each name using a custom function.

R
names <- c("a", "a", "b", "c", "c", "b")
scores <- c(100, 95, 90, 80, 85, 70)

# Create the data frame
df <- data.frame(names, scores)

print("The original dataframe is")
print(df)

# Custom function that returns the mean of its input
cal_mean <- function(x) {
  return(mean(x))
}

print("After calculating the mean is")
# Apply the custom function to the scores of each name
result <- aggregate(scores ~ names, data = df,
                    FUN = cal_mean)

print(result)

Output:

[1] "The original dataframe is"
  names scores
1     a    100
2     a     95
3     b     90
4     c     80
5     c     85
6     b     70

[1] "After calculating the mean is"
  names scores
1     a   97.5
2     b   80.0
3     c   82.5

In the example below, we create a data frame and compute the mean run rate for each team using a custom function.

R
team <- c("csk", "rcb", "rcb", "srh", "srh", "csk", "csk")
run_rate <- c(80, 85, 70, 85, 85, 86, 95)

# Create the data frame
df <- data.frame(team, run_rate)

print("The original dataframe is")
print(df)

# Custom function that returns the mean of its input
cal_mean <- function(x) {
  return(mean(x))
}

print("After calculating the mean is")
# Aggregate the run rate by team
result <- aggregate(run_rate ~ team, data = df,
                    FUN = cal_mean)

print(result)

Output:

[1] "The original dataframe is"
  team run_rate
1  csk       80
2  rcb       85
3  rcb       70
4  srh       85
5  srh       85
6  csk       86
7  csk       95

[1] "After calculating the mean is"
  team run_rate
1  csk     87.0
2  rcb     77.5
3  srh     85.0
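The wrapper around mean() above behaves exactly like mean itself, so the benefit of a custom function shows up only when the rule differs from the built-in. As a sketch, the hypothetical function mean_drop_lowest below discards the lowest value in each group before averaging. This rule is an assumption chosen for illustration and is applied to the same team data.

```r
team <- c("csk", "rcb", "rcb", "srh", "srh", "csk", "csk")
run_rate <- c(80, 85, 70, 85, 85, 86, 95)
df <- data.frame(team, run_rate)

# Hypothetical custom function: mean after dropping the single lowest value.
# Groups with fewer than two values are averaged as they are.
mean_drop_lowest <- function(x) {
  if (length(x) < 2) {
    return(mean(x))
  }
  mean(sort(x)[-1])
}

res_drop <- aggregate(run_rate ~ team, data = df,
                      FUN = mean_drop_lowest)
print(res_drop)
```

Because sort() orders values ascending, removing the first element drops exactly one copy of the minimum, even when the minimum is tied.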

Aggregating data by median using the custom function

This method aggregates data by median using a custom function. In the example below, we create a data frame and compute the median value for each category using a custom function.

R
# Sample data
prices <- data.frame(
  category = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  values = c(10, 15, 20, 23, 30, 25, 40, 55, 60)
)

print("The original dataframe is")
print(prices)

# Custom function that returns the median of its input
cal_median <- function(x) {
  return(median(x))
}

result <- aggregate(values ~ category,
                    data = prices, FUN = cal_median)
print("After calculating the median is")
print(result)

Output:

[1] "The original dataframe is"
  category values
1        A     10
2        A     15
3        A     20
4        B     23
5        B     30
6        B     25
7        C     40
8        C     55
9        C     60

[1] "After calculating the median is"
  category values
1        A     15
2        B     25
3        C     55

In the example below, we create a data frame and compute the median of r_no for each name using a custom function.

R
name <- c("a", "b", "c", "b", "a", "b")
r_no <- c(350, 355, 355, 360, 365, 370)

# Create the data frame
product_prices <- data.frame(name, r_no)

print("The original dataframe is")
print(product_prices)

# Custom function that returns the median of its input
calculate_median <- function(x) {
  return(median(x))
}

res <- aggregate(r_no ~ name, data = product_prices,
                 FUN = calculate_median)

print(res)

Output:

[1] "The original dataframe is"
  name r_no
1    a  350
2    b  355
3    c  355
4    b  360
5    a  365
6    b  370

  name  r_no
1    a 357.5
2    b 360.0
3    c 355.0
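The same custom function can summarise more than one column in a single call by combining the columns with cbind() on the left-hand side of the formula. The sketch below assumes a made-up data frame with price and qty columns; only the calling pattern is the point.

```r
# Illustrative data with two numeric columns per category
df <- data.frame(
  category = c("A", "A", "A", "B", "B", "B"),
  price = c(10, 15, 20, 23, 30, 25),
  qty = c(1, 2, 6, 4, 4, 9)
)

# Custom function that returns the median of its input
cal_median <- function(x) {
  median(x)
}

# cbind() on the left-hand side applies FUN to each listed column
res_multi <- aggregate(cbind(price, qty) ~ category,
                       data = df, FUN = cal_median)
print(res_multi)
```

The result should have one row per category and one column per summarised variable.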

Aggregating data by standard deviation using the custom function

This method aggregates data by standard deviation using a custom function. In the example below, we create a data frame and compute the standard deviation of number for each batch using a custom function that ignores missing values.

R
batch <- c("x", "y", "x", "y", "x", "x")
number <- c(20, 35, 20, 34, 25, 40)

df <- data.frame(batch, number)
print(df)

# Custom function: standard deviation with missing values removed
cus_sd <- function(x) {
  return(sd(x, na.rm = TRUE))
}

res <- aggregate(number ~ batch, data = df, FUN = cus_sd)
print(res)

Output:

  batch number
1     x     20
2     y     35
3     x     20
4     y     34
5     x     25
6     x     40

  batch    number
1     x 9.4648472
2     y 0.7071068

In the example below, we create a data frame and compute the standard deviation of cgpa for each name using a custom function.

R
names <- c("raju", "ravi", "rakesh", "raju", "rakesh", "ravi")
cgpa <- c(7.5, 8.5, 7.0, 9.5, 8.8, 8.0)

df <- data.frame(names, cgpa)
print(df)

# Custom function: standard deviation with missing values removed
cus_sd <- function(x) {
  return(sd(x, na.rm = TRUE))
}

print("After calculating the standard deviation is")
res <- aggregate(cgpa ~ names, data = df, FUN = cus_sd)
print(res)

Output:

   names cgpa
1   raju  7.5
2   ravi  8.5
3 rakesh  7.0
4   raju  9.5
5 rakesh  8.8
6   ravi  8.0

[1] "After calculating the standard deviation is"
   names      cgpa
1   raju 1.4142136
2 rakesh 1.2727922
3   ravi 0.3535534
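A custom function may also return several statistics at once, for example the mean together with the standard deviation. In that case aggregate() stores the results as a matrix inside a single column, which can be spread into ordinary columns afterwards. The sketch below uses the batch data from the first standard deviation example; the names mean_sd and flat are hypothetical.

```r
batch <- c("x", "y", "x", "y", "x", "x")
number <- c(20, 35, 20, 34, 25, 40)
df <- data.frame(batch, number)

# Hypothetical custom function returning two named statistics
mean_sd <- function(x) {
  c(mean = mean(x), sd = sd(x))
}

res <- aggregate(number ~ batch, data = df, FUN = mean_sd)

# res$number is a matrix with columns "mean" and "sd";
# bind it to the grouping column to get a flat data frame
flat <- cbind(res["batch"], res$number)
print(flat)
```

Note that a group containing a single observation yields NA from sd(), since the sample standard deviation is undefined for one value.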

Conclusion

In conclusion, we learned how to aggregate data using custom functions in R. Any function that takes a vector and returns a summary can be passed to aggregate() through FUN, which makes it a versatile tool for grouped summaries.


