
How to count values per level in a factor in R


In this article, we will discuss how to count the values per level of a given factor in the R programming language.

Method 1 : Using summary() method

summary() in base R is a generic function that produces a summary suited to the class of the argument passed to it. When the argument is a factor, such as a factor column of a data frame, summary() returns the number of values at each level as a named vector, printed with the level names above their counts. The column must actually be a factor: since R 4.0.0, data.frame() keeps strings as character vectors unless stringsAsFactors = TRUE is set, and summary() on a character vector reports only its length, class and mode.

Example:

R




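set.seed(1)

# creating data; stringsAsFactors = TRUE makes col1 a factor
# (strings stay character by default since R 4.0.0)
data_frame <- data.frame(col1 = sample(letters, 50, replace = TRUE),
                         stringsAsFactors = TRUE)

# count of values per factor level
# (the counts depend on the random sample, so they may differ across R versions)
summary(data_frame$col1)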


Output:

a b c d e f g h i j k l n o r s t u v w y z
3 2 1 1 3 3 2 1 2 5 1 3 3 3 1 1 3 3 1 2 5 1
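The behaviour described above can be checked on a vector small enough to count by hand. The following is a minimal sketch, not part of the original example; the names sizes_chr and sizes_fct and the extra level "huge" are assumptions made for illustration. It shows that summary() counts per level only for a factor, and that a declared level with no observations is reported with a count of zero.

```r
# Illustrative data, not taken from the example above
sizes_chr <- c("small", "large", "small", "medium", "small")

# On a character vector, summary() reports length, class and mode, not counts
summary(sizes_chr)

# Converting to a factor gives one count per level;
# "huge" is a declared level with no observations, so its count is 0
sizes_fct <- factor(sizes_chr, levels = c("small", "medium", "large", "huge"))
counts <- summary(sizes_fct)
counts

# table() in base R gives the same per-level counts for a factor
table(sizes_fct)
```

The counts come back in the order of the factor levels, so setting the levels explicitly also controls the order of the output.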

Method 2 : Using lapply() method

The plyr package provides tools for splitting data, applying a function to the pieces and combining the results. Its count() function returns the frequency of each unique value. The package can be installed with install.packages("plyr") and loaded with library(plyr).

lapply() in base R returns a list of the same length as its input, where each element is the result of applying the specified function to the corresponding element of the input. It accepts a vector or a list; because a data frame is a list of columns, the function is applied to each column in turn.

Syntax:

lapply(vec, FUN)

Parameters : 

vec – The vector or list to apply the function to; here, the data frame that contains the factor column

FUN – The function to be applied; here count() from plyr, which returns the frequency of each factor level.

The output is a list with one element per column of the data frame. Each element is a data frame in which the column x holds the factor level and the column freq holds the frequency of that level. Row numbers are printed at the beginning of each output row.

R




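# importing required libraries
library(plyr)

set.seed(1)

# creating data; stringsAsFactors = TRUE makes col1 a factor
data_frame <- data.frame(col1 = sample(letters, 50, replace = TRUE),
                         stringsAsFactors = TRUE)

# counting frequencies of factor levels:
# plyr's count() is applied to each column of the data frame
lapply(data_frame, count)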


Output

$col1
   x freq
1  a    3
2  b    2
3  c    1
4  d    1
5  e    3
6  f    3
7  g    2
8  h    1
9  i    2
10 j    5
11 k    1
12 l    3
13 n    3
14 o    3
15 r    1
16 s    1
17 t    3
18 u    3
19 v    1
20 w    2
21 y    5
22 z    1
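Because lapply() visits every column, the same pattern extends to data frames with several factor columns. The following is a minimal sketch under stated assumptions: the data frame df and its columns colour and size are hypothetical, and base R's table() stands in for plyr's count() so that no extra package is needed.

```r
# Hypothetical data frame with two factor columns
df <- data.frame(
  colour = c("red", "blue", "red", "green", "red"),
  size   = c("S", "M", "S", "S", "L"),
  stringsAsFactors = TRUE
)

# lapply() applies table() to each column and returns a named list,
# with one frequency table per column
freqs <- lapply(df, table)

freqs$colour   # counts per level of colour
freqs$size     # counts per level of size

# Convert one table to a data frame of level/frequency pairs
as.data.frame(freqs$colour, responseName = "freq")
```

The list has the same length and names as the data frame, so each frequency table can be picked out by column name.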

Method 3 : Using data.table package

The data.table package provides an extension of the data frame designed for fast access, manipulation and aggregation of data.

The data frame is first converted to a data.table by reference, that is, without making a copy, using setDT(). This makes the method particularly useful for large data sets with many observations.

Syntax:

setDT(df)

The keyby argument groups the rows by the given column and sorts the result by that column. In the second position inside the brackets, where columns are normally selected or computed, the special symbol .N gives the number of rows in each group, that is, the frequency of each factor level. The result is a frequency table returned as a data.table, in which each printed row begins with a row number followed by a colon.

Example:

R




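# importing required libraries
library(data.table)

set.seed(1)

# creating data; stringsAsFactors = TRUE makes col1 a factor
data_frame <- data.frame(col1 = sample(letters, 50, replace = TRUE),
                         stringsAsFactors = TRUE)

# counting frequencies of factor levels:
# .N is the number of rows in each group defined by keyby
setDT(data_frame)[, .N, keyby = col1]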


Output

    col1 N
 1:    a 3
 2:    b 2
 3:    c 1
 4:    d 1
 5:    e 3
 6:    f 3
 7:    g 2
 8:    h 1
 9:    i 2
10:    j 5
11:    k 1
12:    l 3
13:    n 3
14:    o 3
15:    r 1
16:    s 1
17:    t 3
18:    u 3
19:    v 1
20:    w 2
21:    y 5
22:    z 1
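Since the result is itself a data.table, it can be processed further by chaining a second pair of brackets. The following is a minimal sketch, assuming the data.table package is installed; the table dt and its column grade are hypothetical and small enough to check by hand.

```r
library(data.table)

# Hypothetical data, not taken from the example above
dt <- data.table(grade = factor(c("B", "A", "B", "C", "B", "A")))

# .N counts the rows in each group; keyby sorts the result by grade
counts <- dt[, .N, keyby = grade]
counts

# Chain a second [ ] to order the levels from most to least frequent
by_freq <- counts[order(-N)]
by_freq
```

Using by instead of keyby would also give the counts, but in order of first appearance rather than sorted by level.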

Method 4 : Using dplyr package

The dplyr package is the successor to plyr, focused on data frames, and provides functions for selecting, filtering, grouping and summarising data. It can be installed with install.packages("dplyr") and loaded with library(dplyr).

The group_by() function first groups the rows of the data frame by the distinct values of the given column, so that rows sharing a value belong to the same group. tally() then counts the rows in each group; it behaves like summarise(n = n()).

Syntax:

df %>% group_by(column) %>% tally()

The output is a tibble with one row per factor level: the first column holds the level and the column n holds its frequency. The printed header shows the dimensions of the tibble and the type of each column. Only the first ten rows are printed by default; the remaining rows can be displayed with, for example, print(n = Inf).

Example:

R




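# importing required libraries
library(dplyr)

set.seed(1)

# creating data; stringsAsFactors = TRUE makes col1 a factor
data_frame <- data.frame(col1 = sample(letters, 50, replace = TRUE),
                         stringsAsFactors = TRUE)

# counting frequencies of factor levels:
# group the rows by level, then count the rows in each group
data_frame %>%
  group_by(col1) %>%
  tally()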


Output

# A tibble: 22 x 2
   col1      n
   <fct> <int>
 1 a         3
 2 b         2
 3 c         1
 4 d         1
 5 e         3
 6 f         3
 7 g         2
 8 h         1
 9 i         2
10 j         5
# … with 12 more rows
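dplyr also offers count(), which combines the grouping and tallying steps in a single call. The following is a minimal sketch, assuming the dplyr package is installed; the data frame df and its column grade are hypothetical. It compares the two forms and prints every row of the result.

```r
library(dplyr)

# Hypothetical data, not taken from the example above
df <- data.frame(grade = factor(c("B", "A", "B", "C", "B", "A")))

# group_by() followed by tally() ...
long_form <- df %>%
  group_by(grade) %>%
  tally()

# ... and the count() shorthand give the same counts
short_form <- df %>% count(grade)

# Show every row of the tibble, not only the first ten
print(long_form, n = Inf)
```

Both forms store the frequencies in a column named n by default.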



Last Updated : 26 May, 2021