
How to Use aggregate and Not Drop Rows with NA in R

Last Updated : 01 Mar, 2024

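In the R programming language, the aggregate() function computes summary statistics by group. By default, the formula method of aggregate() uses na.action = na.omit, which drops every row that has a missing value (NA) in any variable named in the formula before the summary function runs. Specifying na.action = na.pass keeps those rows, so NA values in the aggregated columns reach the summary function. Rows whose grouping variable is NA are still left out of the result, as the first example below shows.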

Let us look in detail at how to use aggregate() without dropping rows with NA in R.

Syntax:

aggregate(formula, data, FUN, na.action = na.pass)

Where:

  • formula: A formula specifying the variables to be aggregated and the grouping variable(s).
  • data: The data frame containing the variables.
  • FUN: The function to be applied for aggregation (e.g., mean, sum, max, etc.).
  • na.action: Specifies how rows with NA values are handled before aggregation. The default, na.omit, removes them; na.action = na.pass passes them through to FUN unchanged.
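To make the effect of na.action concrete, the following minimal sketch compares the default behaviour with na.pass. The data frame df and the result names are hypothetical and chosen only for illustration.

```r
# Hypothetical toy data: one NA in the value column
df <- data.frame(Group = c("A", "A", "B", "B"),
                 Value = c(1, NA, 3, 4))

# Default (na.action = na.omit): the NA row is removed before summing
default_result <- aggregate(Value ~ Group, data = df, FUN = sum)

# na.pass: the NA row reaches sum(), so group A evaluates to NA
pass_result <- aggregate(Value ~ Group, data = df, FUN = sum,
                         na.action = na.pass)

print(default_result)
print(pass_result)
```

With the default, group A is summed from its single complete row; with na.pass, the NA is handed to sum(), which returns NA for that group.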

Aggregating with Sum

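In this example, the dataset has two columns, “Group” and “Value”. We aggregate the sum of “Value” by “Group” and retain rows with NA values during aggregation. Because sum() returns NA when any input is NA, group A evaluates to NA. The row whose “Group” is NA does not appear in the output, since aggregate() omits rows with a missing grouping value.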

R




# Create  dataframe
df1 <- data.frame(Group = c("A", "B", "A", "B", NA),
                  Value = c(NA, 2, NA, 4, 5))
 
# Aggregate with sum and retain rows with NA values
result1 <- aggregate(Value ~ Group, data = df1, FUN = sum, na.action = na.pass)
 
# Display the result
print(result1)


Output:

  Group Value
1     A    NA
2     B     6
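The result above has no row for the record whose Group is NA. If that group should also appear, one possible workaround (an assumption on our part, not something the original example does) is to give the missing group an explicit label before aggregating. The label "(missing)" and the name df1_labeled are hypothetical.

```r
# Same data as in the example above
df1 <- data.frame(Group = c("A", "B", "A", "B", NA),
                  Value = c(NA, 2, NA, 4, 5))

# Work on a copy and replace the NA group with an explicit label
df1_labeled <- df1
df1_labeled$Group <- as.character(df1_labeled$Group)
df1_labeled$Group[is.na(df1_labeled$Group)] <- "(missing)"

# The relabeled row now forms its own group in the result
result1_labeled <- aggregate(Value ~ Group, data = df1_labeled, FUN = sum,
                             na.action = na.pass)
print(result1_labeled)
```

The relabeled group is aggregated like any other, so its sum is the single value 5.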

Aggregating with Custom Function

In this example, we find the median of “Rating” within each “Group” in a dataset with two columns: “Group” and “Rating”. A custom function computes the median with na.rm = TRUE, so rows with NA ratings are kept by na.action = na.pass and the NA values are then ignored inside the function.

R




# Use aggregate() with a custom function while retaining rows with NA
 
# Create  dataframe
df4 <- data.frame(Group = c("A", "B", "A", "B", NA),
                  Rating = c(3.5, 4.2, NA, 3.8, 4.5))
 
# Custom function to compute median
median_custom <- function(x) {
  median(x, na.rm = TRUE)
}
 
# Aggregate with custom function and retain rows with NA values
result4 <- aggregate(Rating ~ Group, data = df4, FUN = median_custom,
                     na.action = na.pass)
 
# Display the result
print(result4)


Output:

  Group Rating
1     A    3.5
2     B    4.0
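As a shorter alternative to a wrapper function, extra arguments given to aggregate() are forwarded to FUN, so na.rm = TRUE can be supplied directly. The sketch below reuses the same data; the name result_direct is hypothetical.

```r
# Same data as in the example above
df4 <- data.frame(Group = c("A", "B", "A", "B", NA),
                  Rating = c(3.5, 4.2, NA, 3.8, 4.5))

# na.rm = TRUE is passed through to median(); na.pass keeps the NA rows
result_direct <- aggregate(Rating ~ Group, data = df4, FUN = median,
                           na.rm = TRUE, na.action = na.pass)
print(result_direct)
```

This should give the same medians as the custom-function version; a wrapper is still useful when the summary needs more logic than a single extra argument.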

Aggregating with Count

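In this example, each customer has one row, and we count the non-missing entries in “Purchases” and “Returns” for each customer. With na.action = na.pass, rows with NA values are retained, so a missing entry is counted as 0 instead of removing the customer's row.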

R




# Create data frame
customer_data <- data.frame(
  Customer = c('Jayesh', 'Anurag', 'Vipul', 'Shivang', 'Pratham'),
  Purchases = c(5, 8, NA, 12, NA),
  Returns = c(NA, 2, 1, NA, 3)
)
 
# Count non-missing Purchases and Returns entries per customer, retaining NA rows
aggregate(. ~ Customer, data = customer_data, FUN = function(x) sum(!is.na(x)),
          na.action = na.pass)


Output:

  Customer Purchases Returns
1   Anurag         1       1
2   Jayesh         1       0
3  Pratham         0       1
4  Shivang         1       0
5    Vipul         0       1
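Counting becomes more informative when a customer has several rows. The following sketch uses a hypothetical orders data frame with repeated customers and contrasts the number of rows per customer with the number of non-missing entries.

```r
# Hypothetical data: several rows per customer, some purchases missing
orders <- data.frame(
  Customer  = c("Jayesh", "Jayesh", "Anurag", "Anurag", "Anurag"),
  Purchases = c(5, NA, 8, 3, NA)
)

# Rows per customer, including rows where Purchases is NA
rows_per_customer <- aggregate(Purchases ~ Customer, data = orders,
                               FUN = length, na.action = na.pass)

# Non-missing Purchases entries per customer
recorded <- aggregate(Purchases ~ Customer, data = orders,
                      FUN = function(x) sum(!is.na(x)),
                      na.action = na.pass)

print(rows_per_customer)
print(recorded)
```

Without na.action = na.pass, both calls would see only the complete rows, and the two counts could not differ.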

Aggregating with Mean

In this example, we calculate the mean score per subject for each student while retaining rows with NA values. With na.action = na.pass, the NA values are passed to mean(), which returns NA for any student and subject combination containing a missing score. Since each student has a single row here, the non-missing means equal the original scores.

R




# Create data frame
student_scores <- data.frame(
  Student = c('Jayesh', 'Anurag', 'Vipul', 'Shivang', 'Pratham'),
  Math = c(80, NA, 75, 90, 85),
  Science = c(NA, 70, 85, 88, 92),
  English = c(78, 85, 82, NA, 90)
)
 
# Calculate mean score per subject for each student, retaining NA rows
aggregate(. ~ Student, data = student_scores, FUN = mean, na.action = na.pass)


Output:

  Student Math Science English
1  Anurag   NA      70      85
2  Jayesh   80      NA      78
3 Pratham   85      92      90
4 Shivang   90      88      NA
5   Vipul   75      85      82
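When students have several rows, the choice between propagating and ignoring NA values matters more. The sketch below uses a hypothetical scores data frame to compare mean() with and without na.rm = TRUE under na.action = na.pass.

```r
# Hypothetical data: two rows per student, some scores missing
scores <- data.frame(
  Student = c("Jayesh", "Jayesh", "Anurag", "Anurag"),
  Math    = c(80, NA, NA, NA),
  Science = c(70, 90, 60, NA)
)

# NA propagates: any missing score makes that student's mean NA
strict <- aggregate(. ~ Student, data = scores, FUN = mean,
                    na.action = na.pass)

# NA ignored inside mean(): the mean uses the available scores only
lenient <- aggregate(. ~ Student, data = scores, FUN = mean,
                     na.rm = TRUE, na.action = na.pass)

print(strict)
print(lenient)
```

Note that mean() with na.rm = TRUE returns NaN when every score in a group is missing, as for Anurag's Math scores here.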

Conclusion

In this article, we saw that aggregate() is a powerful tool for computing summary statistics by group. By default, its formula method drops any row containing a missing value (NA) in the variables used, which can silently exclude data from the analysis. Specifying na.action = na.pass retains rows with NA values in the aggregated columns, and the summary function then decides how to treat them, for example through na.rm = TRUE.


