Chi-Square Test in R
  • Last Updated : 30 Jun, 2020

The chi-square test of independence evaluates whether there is an association between two categorical variables. Random variables broadly fall into two types, numerical and categorical, and the chi-square test applies to the categorical kind. The chi-square statistic is used to investigate whether the distributions of categorical variables differ from one another. The test is also useful for comparing the tallies or counts of categorical responses between two (or more) independent groups.
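To make the idea concrete, here is a minimal sketch of how Pearson's chi-square statistic can be computed by hand. The counts, the group names, and the variable names (observed, expected, chi_sq) are made up for illustration and do not come from any real data set.

```r
# Hypothetical 2 x 3 table of counts (made-up numbers, for illustration only)
observed <- matrix(c(20, 30, 50,
                     30, 30, 40),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(group = c("A", "B"),
                                   answer = c("yes", "no", "unsure")))

n <- sum(observed)

# Expected count of each cell if the two variables were independent:
# (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / n

# Pearson's chi-square statistic and its degrees of freedom
chi_sq <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

print(c(statistic = chi_sq, df = df, p.value = p_value))
```

The larger the gap between observed and expected counts, the larger the statistic and the smaller the p-value.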

In R, the function used for performing a chi-square test is chisq.test().

Syntax:
chisq.test(data)

Parameters:
data: a contingency table containing the counts for the variables being tested.
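As a rough sketch of the call itself, the snippet below passes a small matrix of counts to chisq.test() and reads individual pieces of the result. The counts and labels are hypothetical. Note that for a 2 x 2 table, chisq.test() applies Yates' continuity correction by default (correct = TRUE).

```r
# Hypothetical 2 x 2 table of counts (made-up numbers, for illustration only)
counts <- matrix(c(12, 18,
                   25, 15),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(treatment = c("drug", "placebo"),
                                 outcome = c("improved", "not improved")))

result <- chisq.test(counts)

# The result is a list of class "htest"; its parts can be read directly
print(result$statistic)  # the X-squared value
print(result$parameter)  # degrees of freedom
print(result$p.value)    # p-value of the test
print(result$expected)   # expected counts under independence
```

Reading the components directly is handy when the p-value needs to be compared against a significance level in code rather than read from the printed summary.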

Example
We will use the survey data set from the MASS package, which contains responses from a survey conducted on students.






# load the MASS package
library(MASS)
# str() prints the structure itself, so no print() wrapper is needed
str(survey)

Output:

'data.frame':    237 obs. of  12 variables:
 $ Sex   : Factor w/ 2 levels "Female","Male": 1 2 2 2 2 1 2 1 2 2 ...
 $ Wr.Hnd: num  18.5 19.5 18 18.8 20 18 17.7 17 20 18.5 ...
 $ NW.Hnd: num  18 20.5 13.3 18.9 20 17.7 17.7 17.3 19.5 18.5 ...
 $ W.Hnd : Factor w/ 2 levels "Left","Right": 2 1 2 2 2 2 2 2 2 2 ...
 $ Fold  : Factor w/ 3 levels "L on R","Neither",..: 3 3 1 3 2 1 1 3 3 3 ...
 $ Pulse : int  92 104 87 NA 35 64 83 74 72 90 ...
 $ Clap  : Factor w/ 3 levels "Left","Neither",..: 1 1 2 2 3 3 3 3 3 3 ...
 $ Exer  : Factor w/ 3 levels "Freq","None",..: 3 2 2 2 3 3 1 1 3 3 ...
 $ Smoke : Factor w/ 4 levels "Heavy","Never",..: 2 4 3 2 2 2 2 2 2 2 ...
 $ Height: num  173 178 NA 160 165 ...
 $ M.I   : Factor w/ 2 levels "Imperial","Metric": 2 1 NA 2 2 1 1 2 2 2 ...
 $ Age   : num  18.2 17.6 16.9 20.3 23.7 ...

The above result shows that the dataset has many Factor variables, which can be treated as categorical variables. For our test, we will consider the variables “Exer” and “Smoke”. The Smoke column records the students' smoking habits, while the Exer column records their exercise level. Our aim is to test the null hypothesis that the students' smoking habit is independent of their exercise level, at the .05 significance level.




# Create a contingency table of smoking habit (rows)
# against exercise level (columns).
stu_data <- table(survey$Smoke, survey$Exer)

print(stu_data)

Output:

         Freq None Some
  Heavy    7    1    3
  Never   87   18   84
  Occas   12    3    4
  Regul    9    1    7
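Before running the test, it can help to check the row and column totals of the table. The sketch below uses addmargins() for that; the name smoke_exer is just an illustrative choice. Because table() leaves out missing values by default, the table counts 236 students rather than all 237 rows of the data set.

```r
library(MASS)

# Same contingency table as above, under an illustrative name
smoke_exer <- table(survey$Smoke, survey$Exer)

# Add row and column totals ("Sum") to the table
print(addmargins(smoke_exer))

# Students counted in the table versus rows in the data set
print(sum(smoke_exer))
print(nrow(survey))

# useNA = "ifany" adds a row or column for missing values, if there are any
print(table(survey$Smoke, survey$Exer, useNA = "ifany"))
```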

Finally, we apply the chisq.test() function to the contingency table stu_data.




# applying chisq.test() function
print(chisq.test(stu_data))

Output:

       Pearson's Chi-squared test

data:  stu_data
X-squared = 5.4885, df = 6, p-value = 0.4828

Since the p-value of 0.4828 is greater than the .05 significance level, we fail to reject the null hypothesis that smoking habit is independent of exercise level. In other words, the data provide no significant evidence of an association between the two variables.
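One caveat: some cells of this table are very small, and chisq.test() warns that the chi-squared approximation may be incorrect when expected counts fall below 5. The sketch below inspects the expected counts and, as one possible remedy, reruns the test with a Monte Carlo p-value through the simulate.p.value argument. The names smoke_exer, expected and sim, the seed, and the number of replicates B are illustrative choices rather than part of the original example.

```r
library(MASS)

# Same contingency table as above, under an illustrative name
smoke_exer <- table(survey$Smoke, survey$Exer)

# Expected counts under independence; suppressWarnings() only hides
# the approximation warning while we inspect the table
expected <- suppressWarnings(chisq.test(smoke_exer))$expected
print(round(expected, 2))

# Number of cells whose expected count is below 5
print(sum(expected < 5))

# Monte Carlo p-value, which does not rely on the chi-squared approximation
set.seed(1)  # arbitrary seed, only to make the simulation repeatable
sim <- chisq.test(smoke_exer, simulate.p.value = TRUE, B = 10000)
print(sim)
```

If the simulated p-value is also well above .05, the conclusion of the original test stands.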

The complete R code is given below.




# R program to illustrate
# Chi-Square Test in R

library(MASS)

# Inspect the structure of the survey data set
str(survey)

# Contingency table of smoking habit against exercise level
stu_data <- table(survey$Smoke, survey$Exer)
print(stu_data)

# Chi-square test of independence
print(chisq.test(stu_data))

In summary, a chi-square test of independence in R takes only two steps: build a contingency table with table() and pass it to the chisq.test() function.
