Count number of rows within each group in R DataFrame

  • Last Updated : 30 May, 2021

A DataFrame in the R programming language may contain columns whose values are not all unique. Rows that share the same value, or the same combination of values across several columns, can be treated as one group. The number of rows in each group can then be counted with external packages such as dplyr and data.table, or with the base R aggregate() function.

Method 1 : Using dplyr package

The “dplyr” package in R is used for data manipulation. Several functions from this package can be used to count the rows in each group.

  • Using tally() and group_by() method

The group_by() method from dplyr groups data by a single column or by several columns. Each unique combination of values that occurs in the grouping columns forms one group.

Syntax:

group_by(args .. )



where args is a sequence of columns to group the data by.

The tally() method from dplyr summarizes the data by counting the number of rows in each group. Applying group_by() and then tally() returns a table that contains the grouping columns, in the order in which they were passed to group_by(), followed by a column ‘n’ holding the row count of each group.

The result is returned as a tibble, so printing it also shows the dimensions of the table and the class of each column.
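The description above also covers grouping by more than one column. The following is a minimal sketch, assuming a small made-up data frame `df` (not part of the article's example) and dplyr 1.0.0 or later for the `.groups` argument. It also shows that tally() gives the same counts as writing the summary out with summarise() and n().

```r
library(dplyr)

# hypothetical example data with two grouping columns
df <- data.frame(col1 = c(1, 1, 1, 2, 2),
                 col2 = c("a", "a", "b", "a", "a"))

# group by both columns, then count the rows in each group
counts <- df %>%
  group_by(col1, col2) %>%
  tally()

# the same counts spelled out with summarise() and n()
# (.groups = "drop_last" assumes dplyr >= 1.0.0)
counts_alt <- df %>%
  group_by(col1, col2) %>%
  summarise(n = n(), .groups = "drop_last")

print(counts)
```

Only combinations that actually occur in the data appear in the result, so there is no row for col1 = 2, col2 = "b".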

Example:

R




library("dplyr")
  
# creating a dataframe
# (col2 has 3 values and is recycled to fill all 9 rows)
data_frame <- data.frame(col1 = rep(c(1:3), each = 3),
                         col2 = letters[1:3])
  
print("Original DataFrame")
print(data_frame)
  
# group by col1 values and count
# the number of rows in each group
data_frame %>% group_by(col1) %>% tally()

Output:

[1] "Original DataFrame" 
   col1 col2 
1    1    a 
2    1    b 
3    1    c 
4    2    a 
5    2    b 
6    2    c 
7    3    a 
8    3    b 
9    3    c 
# A tibble: 3 x 2
   col1     n
  <int> <int>
1     1     3
2     2     3
3     3     3

  • Using dplyr::count() method

The count() method can be applied to an input dataframe and given one or more columns; it returns the number of rows in each group formed by those columns. The result contains only the columns passed to count(), followed by a column ‘n’ holding the counts. The other columns of the original dataframe are dropped.

Syntax:



count(args .. )

where args is a sequence of columns to group the data by.

Example:

R




library("dplyr")
  
# creating a dataframe
data_frame <- data.frame(col1 = rep(c(1:3), each = 3),
                         col2 = letters[1:3],
                         col3 = c(1,4,1,2,2,3,1,2,2))
  
print ("Original DataFrame")
print (data_frame)
  
print ("Modified DataFrame")
  
# count rows by col1 and col3 group
data_frame %>% dplyr::count(col1, col3)

Output:

[1] "Original DataFrame"  
   col1 col2 col3 
1    1    a    1 
2    1    b    4 
3    1    c    1 
4    2    a    2 
5    2    b    2 
6    2    c    3 
7    3    a    1 
8    3    b    2 
9    3    c    2 
[1] "Modified DataFrame" 
   col1 col3 n 
1    1    1  2 
2    1    4  1 
3    2    2  2 
4    2    3  1 
5    3    1  1 
6    3    2  2
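As a rough check of how count() relates to the first approach, the sketch below rebuilds the same col1 and col3 values and compares count() with a group_by() and tally() pipeline. The `sort` and `name` arguments are part of dplyr's count(); the variable names `df`, `by_count`, `by_tally` and `sorted` are introduced here only for illustration.

```r
library(dplyr)

# same col1 and col3 values as in the example above
df <- data.frame(col1 = rep(1:3, each = 3),
                 col3 = c(1, 4, 1, 2, 2, 3, 1, 2, 2))

# count() groups and counts in one step
by_count <- df %>% count(col1, col3)

# the equivalent two-step pipeline; ungroup() removes the leftover grouping
by_tally <- df %>%
  group_by(col1, col3) %>%
  tally() %>%
  ungroup()

# sort = TRUE lists the largest groups first,
# name = "rows" renames the count column from the default "n"
sorted <- df %>% count(col1, col3, sort = TRUE, name = "rows")

print(by_count)
print(sorted)
```

Both pipelines give the same counts; count() is simply the shorter way to write it when no other grouped operation is needed.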

Method 2 : Using data.table package

The data.table package in R stores data in an organized tabular structure and provides a concise syntax for querying it. Inside the square brackets of a data.table, the special symbol .N holds the number of rows in the current group, so it can be used to count how often each combination of the specified columns occurs. The grouping columns are passed to the “by” argument as a list(), which plays the same role as group_by() in dplyr.

Syntax:

data_table[, .N, by = list(cols..)]

Example:

R






library(data.table)
  
# creating a dataframe
data_frame <- data.frame(col1 = rep(c(1:3), each = 3),
                         col2 = letters[1:3],
                         col3 = c(1,4,1,2,2,3,1,2,2))
  
print ("Original DataFrame")
print (data_frame)
  
print ("Modified DataFrame")
data_table <- data.table(data_frame)
data_table[, .N, by = list(col1, col3)]

Output

[1] "Original DataFrame" 
   col1 col2 col3 
1    1    a    1 
2    1    b    4 
3    1    c    1 
4    2    a    2 
5    2    b    2 
6    2    c    3 
7    3    a    1 
8    3    b    2 
9    3    c    2
[1] "Modified DataFrame" 
   col1 col3 N 
1:    1    1 2 
2:    1    4 1 
3:    2    2 2 
4:    2    3 1 
5:    3    1 1 
6:    3    2 2
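The grouping syntax above can also be written with `.()`, which data.table provides as a shorthand for list(). The sketch below, using the same col1 and col3 values and a hypothetical column name `group_size`, additionally shows how `:=` attaches the group count to every row instead of collapsing the table.

```r
library(data.table)

# same col1 and col3 values as in the example above
dt <- data.table(col1 = rep(1:3, each = 3),
                 col3 = c(1, 4, 1, 2, 2, 3, 1, 2, 2))

# .() is shorthand for list(); one row per group, counts in column N
counts <- dt[, .N, by = .(col1, col3)]

# := adds the group size as a new column by reference,
# so all 9 original rows are kept
dt[, group_size := .N, by = .(col1, col3)]

print(counts)
print(dt)
```

Groups appear in `counts` in the order in which they are first encountered in the data, not in sorted order.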

Method 3 : Using aggregate method

The aggregate() method in base R is a generic function used to compute summary statistics for subsets of data. It works on both time series and dataframes.

Syntax:

aggregate(formula, data, FUN)

Parameters:

  • formula – such as y ~ x, where y is the variable to be summarized and the x variables define the groups.
  • data – the dataframe that contains the variables used in the formula
  • FUN – function to be applied

The function applied here is length, which counts the number of values of the left-hand variable in each group. aggregate() finds every combination of the grouping columns on the right-hand side of the formula that occurs in the data and returns each one with its count. The count is stored in a column named after the left-hand variable (col3 in the example below), and by default the formula interface drops rows that contain missing values before counting.

Example:

R




# col1 is drawn at random, so the values
# and the counts differ from run to run
data_frame <- data.frame(col1 = sample(1:2, 9, replace = TRUE),
                         col2 = letters[1:3],
                         col3 = c(1,4,1,2,2,3,1,2,2))
  
print("Original DataFrame")
print(data_frame)
  
print("keeping a count of all groups")
  
# count the col3 values for each col1, col2 combination
data_mod <- aggregate(col3 ~ col1 + col2,
                      data = data_frame,
                      FUN = length)
print(data_mod)

Output

[1] "Original DataFrame" 
   col1 col2 col3 
1    2    a    1 
2    2    b    4 
3    1    c    1 
4    1    a    2 
5    1    b    2 
6    2    c    3 
7    2    a    1 
8    2    b    2 
9    1    c    2 
[1] "keeping a count of all groups" 
   col1 col2 col3 
1    1    a    1 
2    2    a    2 
3    1    b    1 
4    2    b    2 
5    1    c    2 
6    2    c    1
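Because the formula interface removes rows with missing values before applying FUN, the count can come out smaller than the true group size when the left-hand column contains NA. The sketch below illustrates this with a small made-up data frame `df` (an assumption, not the article's data) and shows one possible workaround: summing a helper column of ones, here named `n`.

```r
# hypothetical data with one missing value in col3
df <- data.frame(col1 = c(1, 1, 2, 2, 2),
                 col2 = c("a", "a", "a", "b", "b"),
                 col3 = c(10, NA, 30, 40, 50))

# the row with NA in col3 is dropped before counting,
# so the group (1, a) is reported with 1 row instead of 2
counts_formula <- aggregate(col3 ~ col1 + col2, data = df, FUN = length)

# helper column of ones: summing it counts every row,
# regardless of missing values in col3
df$n <- 1
counts_all <- aggregate(n ~ col1 + col2, data = df, FUN = sum)

print(counts_formula)
print(counts_all)
```

The second call only involves n, col1 and col2, so the missing value in col3 no longer affects the result.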


