
Extend Contingency Table with Proportions and Percentages in R

Last Updated : 06 Jun, 2021

A data.table (or data frame) in the R programming language stores observations in columns whose values often fall into a set of mutually exclusive groups. The counts of the values within each group can be computed using base methods as well as external packages in R.

Creating contingency or frequency table in R

The table() function in R computes the frequency counts of the values appearing in the specified column of a data frame or data.table. For a single column, the result is printed as a two-row structure, where the first row lists the distinct values of the column and the second row lists their corresponding frequencies. The table() function is also helpful for creating frequency tables with conditions and cross-tabulations. A frequency table built from two or more columns is commonly called a contingency table; this article uses the two terms interchangeably. When table() is applied to several columns, the unique combinations of their values are returned along with their respective frequency counts.

Syntax:

table(x)

where x is the data.table object.

If only a single column of a data frame is of interest, the frequency table can be constructed using the following method.

Syntax:

table(data_frame$col_name)

Example:

R




library("data.table")
  
set.seed(1)  
  
# creating a data.table
data_table <- data.table(col1 =  sample(letters[1:3], 8, replace = TRUE) ,
                         col2 = sample(1:6, 8, replace = TRUE)
                        )
  
print ("Original DataFrame")
print (data_table)
  
# calculating frequency
freq <- table(data_table$col1)
print ("Frequency")
print (freq)


Output

[1] "Original DataFrame" 
   col1 col2
1:    a    2 
2:    c    3 
3:    a    3 
4:    b    1 
5:    a    5 
6:    c    5 
7:    c    2 
8:    b    6 
[1] "Frequency"
a b c  
3 2 3 
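
The frequency tables with conditions mentioned earlier can be sketched as follows. This is a minimal illustration that assumes a plain base R data frame (the hypothetical name df) holding the same values as the example above, and an arbitrary condition col2 > 2 chosen only for demonstration.

```r
# Hypothetical data frame with the same values as the example above
df <- data.frame(
  col1 = c("a", "c", "a", "b", "a", "c", "c", "b"),
  col2 = c(2, 3, 3, 1, 5, 5, 2, 6)
)

# Frequency of col1, counting only the rows where col2 is greater than 2
cond_freq <- table(df$col1[df$col2 > 2])
print(cond_freq)

# Cross-tabulation of col1 against the logical condition itself:
# one column for rows failing the condition, one for rows meeting it
cross_freq <- table(df$col1, df$col2 > 2)
print(cross_freq)
```

The first call filters before counting, while the second keeps every row and shows how each value of col1 splits across the condition.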

Creating proportions of the frequency table

The relative frequency of a value is its frequency divided by the total number of elements; taken together, the relative frequencies form the empirical probability distribution of the column. It can be calculated either with the prop.table() method applied to the frequency table obtained from the previous approach, or by dividing the frequency table by the total number of observations. The results are referred to as proportions, since each one is the share of a value among the total number of elements.

Syntax:

prop.table(freq_table)

freq_table / sum(freq_table)

The complete call to compute the proportion table is as follows:

prop.table(table(df$col_name))

Example:

R




library("data.table")
  
set.seed(1)  
  
# creating a data.table
data_table <- data.table(col1 =  sample(letters[1:3], 8, replace = TRUE) ,
                         col2 = sample(1:6, 8, replace = TRUE)
                        )
  
print ("Original DataFrame")
print (data_table)
  
# calculating frequency
freq <- table(data_table$col1)
  
# creating proportions 
prop <- prop.table(freq)
print ("Proportions of column1")
print (prop)


Output

[1] "Original DataFrame"  
   col1 col2
1:    a    2
2:    c    3
3:    a    3
4:    b    1
5:    a    5
6:    c    5
7:    c    2
8:    b    6
[1] "Proportions of column1"
a     b     c  
0.375 0.250 0.375
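
The two routes to proportions described above can be compared directly. The sketch below assumes a plain character vector (the hypothetical name col1) with the same values as the example, and checks that prop.table() and manual division by the total give the same numbers.

```r
# Same values as col1 in the example above
col1 <- c("a", "c", "a", "b", "a", "c", "c", "b")

freq <- table(col1)

# Route 1: built-in helper
prop_builtin <- prop.table(freq)

# Route 2: divide each count by the total number of observations
prop_manual <- freq / sum(freq)

# Compare the numeric values of both results
print(all.equal(as.vector(prop_builtin), as.vector(prop_manual)))

# Proportions over the whole table always add up to 1
print(sum(prop_builtin))
```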

Creating percentages of the frequency table

The percentages can be calculated by multiplying each cell of the proportions table by 100. The result can be rounded off to any number of decimal places using the round() method for better readability.

Syntax:

round(num, digits)

Here num is the value (or table of values) to round and digits is the number of decimal places to keep. Rounding the proportions table after multiplying by 100 returns a table object with the same shape as the input.

Example:

R




library("data.table")
set.seed(1)  
  
# creating a data.table
data_table <- data.table(col1 =  sample(letters[1:3], 8, replace = TRUE) ,
                         col2 = sample(1:6, 8, replace = TRUE)
                        )
  
print ("Original DataFrame")
print (data_table)
  
# calculating frequency
freq <- table(data_table$col1)
  
# creating proportions 
prop <- prop.table(freq)
  
print ("Percentage of column1")
perc <- round((prop * 100),2)
print (perc)


Output

[1] "Original DataFrame"  
   col1 col2
1:    a    2 
2:    c    3 
3:    a    3 
4:    b    1 
5:    a    5 
6:    c    5 
7:    c    2 
8:    b    6 
[1] "Percentage of column1"  
a    b    c  
37.5 25.0 37.5 
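
The counts, proportions and percentages computed so far live in three separate tables. One possible way to extend the frequency table so that all three sit side by side is sketched below; the data frame name extended and its column names are assumptions made for illustration, and base R is used so the snippet runs without extra packages.

```r
# Same values as col1 in the example above
col1 <- c("a", "c", "a", "b", "a", "c", "c", "b")

freq <- table(col1)            # counts
prop <- prop.table(freq)       # proportions
perc <- round(prop * 100, 2)   # percentages

# Combine the three tables into one data frame, one row per distinct value
extended <- data.frame(
  value      = names(freq),
  count      = as.vector(freq),
  proportion = as.vector(prop),
  percentage = as.vector(perc)
)
print(extended)
```

Using as.vector() drops the table attributes so that each column of the result is a plain numeric vector.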

Computing frequency, proportion and percentages using multiple columns

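The table() method accepts multiple column arguments, in which case the unique combinations of their values are computed along with their respective counts. When prop.table() is then called without a margin argument, each cell is divided by the grand total, so the proportions across the whole table sum to 1.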

Example:

R




library("data.table")
set.seed(1)  
  
# creating a data.table
data_table <- data.table(col1 =  sample(letters[1:3], 8, replace = TRUE) ,
                         col2 =  sample(1:2, 8, replace = TRUE)
                        )
  
print ("Original DataFrame")
print (data_table)
  
# calculating frequency
freq <- table(data_table$col1,data_table$col2)
  
# creating proportions 
prop <- prop.table(freq)
print ("Proportions of column1")
print (prop)
  
print ("Percentage of column1")
perc <- round((prop * 100),2)
print (perc)


Output

[1] "Original DataFrame" 
   col1 col2 
1:    a    2 
2:    c    1 
3:    a    1 
4:    b    1 
5:    a    1 
6:    c    1 
7:    c    2 
8:    b    2 
[1] "Proportions of column1" 
      1     2   
a 0.250 0.125   
b 0.125 0.125   
c 0.250 0.125
[1] "Percentage of column1" 
     1    2   
a 25.0 12.5   
b 12.5 12.5   
c 25.0 12.5
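
The proportions above are relative to the grand total. When row-wise or column-wise shares are wanted instead, the margin argument of prop.table() can be used, and addmargins() can append totals to the counts. The sketch below assumes two plain vectors (hypothetical names col1 and col2) holding the same values as the example.

```r
# Same values as the two columns in the example above
col1 <- c("a", "c", "a", "b", "a", "c", "c", "b")
col2 <- c(2, 1, 1, 1, 1, 1, 2, 2)

freq <- table(col1, col2)

# margin = 1: each cell divided by its row total (rows sum to 1)
row_prop <- prop.table(freq, margin = 1)
print(round(row_prop * 100, 2))

# margin = 2: each cell divided by its column total (columns sum to 1)
col_prop <- prop.table(freq, margin = 2)
print(round(col_prop * 100, 2))

# Counts extended with row and column totals
print(addmargins(freq))
```

Which margin is appropriate depends on the question being asked: row percentages describe how each value of col1 splits across col2, while column percentages describe the reverse.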

