Frequency count of multiple variables in R Dataframe

A data frame may contain repeated or missing values, and any column may hold many instances of the same value. Much statistical analysis starts by computing the frequency, or count, of each distinct value within a column, and the R programming language offers several ways to do this.

Method 1: Using apply() method

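The apply() function in base R returns a vector, array, or list of values obtained by applying a function to the margins of an array or matrix. A data frame passed to apply() is first converted to a matrix. It has the following syntax:

apply(X, MARGIN, FUN)

where X is the array, matrix, or data frame, MARGIN is 1 for rows or 2 for columns, and FUN is the function to apply.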
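As a quick illustration of how the margin argument selects rows or columns, here is a minimal sketch on a small matrix. The matrix m and the variable names are made up for this example and do not come from the data frame used later.

```r
# Illustrative 2 x 3 matrix (filled column by column):
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

# Margin 1 applies the function to each row
row_totals <- apply(m, 1, sum)
print(row_totals)  # 9 12

# Margin 2 applies the function to each column
col_totals <- apply(m, 2, sum)
print(col_totals)  # 3 7 11
```

For frequency counts the margin is 2, because each column is tabulated separately.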

The table() function uses cross-classifying factors to build a contingency table of the counts at each combination of factor levels. A contingency table is a tabulation of the counts and/or percentages for one or more variables. By default, table() excludes missing values (NA) from the counts. The result is an object of class table, which can be used for cross-tabulation and further statistical analysis.
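The handling of missing values can be seen in a short sketch. The vector x below is illustrative data, and useNA is a standard argument of table() that is not otherwise used in this article.

```r
# A small vector with one missing value (illustrative data)
x <- c("a", "b", "a", NA, "c", "a")

# By default, table() leaves NA out of the counts
counts <- table(x)
print(counts)

# useNA = "ifany" adds an NA category when missing values are present
counts_with_na <- table(x, useNA = "ifany")
print(counts_with_na)
```

The first table sums to 5 and the second to 6, so the difference is the single NA.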

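Example 1: Here we apply table() to every column of the data frame to get the frequency of each value in that column. Because col4 contains only two distinct values while the other columns contain three, the per-column tables differ in length and apply() returns them as a list.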

R




set.seed(1)  
# creating a data frame
data_frame <- data.frame(col1 =  sample(letters[1:3], 8,
                                        replace = TRUE),
                         col2 =  sample(letters[1:3], 8, 
                                        replace = TRUE),
                         col3 =  sample(letters[1:3], 8,
                                        replace = TRUE),
                         col4 =  sample(letters[1:3], 8, 
                                        replace = TRUE)
                        )
  
print ("Original DataFrame")
print (data_frame)
  
# calculating frequency of multiple variables
mod_frame <- apply(data_frame, 2 , table)
print ("Frequencies")
print (mod_frame)


Output:

[1] "Original DataFrame" 
  col1 col2 col3 col4
1    a    b    b    a
2    c    c    b    b
3    a    c    c    a
4    b    a    a    a
5    a    a    c    b
6    c    a    a    b
7    c    b    a    b
8    b    b    a    a
[1] "Frequencies"
$col1

a b c 
3 2 3 

$col2

a b c 
3 3 2 

$col3

a b c 
4 2 2 

$col4

a b 
4 4 

Example 2: To count only specific columns, pass their names as a character vector and select them with data frame indexing, df[cols]. Because col1 and col3 contain the same three distinct values here, apply() simplifies the per-column tables into a matrix, where the column headings are the selected column names and the row headings are the distinct values found.

R




set.seed(1)  
  
# creating a data frame
data_frame <- data.frame(col1 =  sample(letters[1:3], 8,
                                        replace = TRUE) ,
                         col2 =  sample(letters[1:3], 8, 
                                        replace = TRUE),
                         col3 =  sample(letters[1:3], 8,
                                        replace = TRUE),
                         col4 =  sample(letters[1:3], 8, 
                                        replace = TRUE)
                        )
  
print ("Original DataFrame")
print (data_frame)
sel_col <- c("col1", "col3")
  
# calculating frequency of multiple variables
mod_frame <- apply(data_frame[sel_col], 2, table)
print ("Frequencies")
print (mod_frame)


Output:

[1] "Original DataFrame"
  col1 col2 col3 col4
1    a    b    b    a
2    c    c    b    b
3    a    c    c    a
4    b    a    a    a
5    a    a    c    b
6    c    a    a    b
7    c    b    a    b
8    b    b    a    a
[1] "Frequencies"
  col1 col3
a    3    4
b    2    2
c    3    2
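The two examples return different shapes (a list in one case, a matrix in the other) because apply() simplifies its result only when every column yields a table of the same length. If a predictable shape matters, one possible approach, not part of the original examples, is to use lapply() for a list or to tabulate against a shared set of levels for a matrix. The data frame df and the helper names below are illustrative.

```r
# Illustrative data frame: col2 never contains the value "c"
df <- data.frame(col1 = c("a", "c", "a", "b"),
                 col2 = c("a", "b", "a", "b"),
                 stringsAsFactors = FALSE)

# lapply() treats the data frame as a list of columns and
# always returns a list with one table per column
freq_list <- lapply(df, table)
print(freq_list)

# Tabulating each column against the same levels gives tables of
# equal length, so sapply() can combine them into a matrix with
# a zero count where a value is absent from a column
all_values <- sort(unique(unlist(df)))
freq_matrix <- sapply(df, function(col) table(factor(col, levels = all_values)))
print(freq_matrix)
```

In the matrix, the row for "c" shows a count of 0 under col2 instead of being dropped.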

Method 2: Using plyr package

The plyr package provides tools for manipulating data frames, such as creating, modifying, and deleting columns under multiple conditions and with user-defined functions. It can be installed with the following command and then loaded into the workspace with library("plyr"):

install.packages("plyr")

The count() function of this package returns the frequency of each unique value in the specified columns. When several columns are given, it counts each unique combination of their values that occurs in the data and returns those combinations along with their respective counts.

count(df, vars), where vars is a character vector of column names

The output contains only the columns specified in count(), plus a freq column holding the counts.

R




library("plyr")
set.seed(1)  
  
# creating a data frame
data_frame <- data.frame(col1 =  sample(letters[1:3], 8,
                                        replace = TRUE) ,
                         col2 =  sample(letters[1:3], 8,
                                        replace = TRUE),
                         col3 =  sample(letters[1:3], 8,
                                        replace = TRUE),
                         col4 =  sample(letters[1:3], 8,
                                        replace = TRUE)
)
  
print ("Original DataFrame")
print (data_frame)
sel_col <- c("col1")
  
# calculating frequency of multiple variables
mod_frame <- count(data_frame, sel_col)
print ("Frequencies")
print (mod_frame)


Output:

[1] "Original DataFrame"
  col1 col2 col3 col4
1    a    b    b    a
2    c    c    b    b
3    a    c    c    a
4    b    a    a    a
5    a    a    c    b
6    c    a    a    b
7    c    b    a    b
8    b    b    a    a
[1] "Frequencies"
  col1 freq
1    a    3
2    b    2
3    c    3
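For comparison, counting combinations of several columns can also be sketched in base R without plyr. This is an alternative shown for illustration, not the plyr implementation. The data frame below copies the col1 and col3 values from the output above, and the name combo_counts is made up for this example.

```r
# col1 and col3 copied from the data frame printed above
df <- data.frame(col1 = c("a", "c", "a", "b", "a", "c", "c", "b"),
                 col3 = c("b", "b", "c", "a", "c", "a", "a", "a"))

# table() on two columns cross-tabulates them; as.data.frame()
# turns the result into one row per combination with a Freq column
combo_counts <- as.data.frame(table(df[c("col1", "col3")]))

# Keep only the combinations that actually occur in the data
combo_counts <- combo_counts[combo_counts$Freq > 0, ]
print(combo_counts)
```

Note that the count column is named Freq here, whereas plyr's count() names it freq.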


Last Updated : 30 May, 2021