How to Perform a Chi-Square Goodness of Fit Test in Python

Last Updated : 20 Feb, 2022

This article shows how to perform a Chi-Square goodness of fit test in Python.

The Chi-Square goodness of fit test is a non-parametric statistical hypothesis test used to determine how much the observed frequencies of an event differ from the expected frequencies. It helps us check whether a variable follows a certain distribution or whether a sample represents a population. The observed frequency distribution is compared with the expected frequency distribution.

Null hypothesis: The variable follows the specified distribution.

Alternative hypothesis: The variable does not follow the specified distribution.

Example 1: Using stats.chisquare() function

In this approach we use the chisquare() function from the scipy.stats module, which returns the chi-square goodness of fit test statistic and the p-value.

Syntax: stats.chisquare(f_obs, f_exp)

Parameters:

f_obs : an array of observed frequencies, one per category.

f_exp : an array of expected frequencies, one per category.
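As a small illustration of the two parameters, the sketch below uses hypothetical die-roll counts (not data from this article). It relies on the documented SciPy behaviour that, when f_exp is omitted, all categories are assumed equally likely, and it shows how expected proportions can be scaled to expected counts so that both arrays have the same total.

```python
import numpy as np
from scipy import stats

# Hypothetical data: how often each face appeared in 88 rolls of a die.
observed = np.array([16, 18, 16, 14, 12, 12])

# With f_exp omitted, SciPy assumes every category is equally likely.
result_default = stats.chisquare(observed)

# Equivalent explicit form: turn expected proportions into expected counts
# by scaling them to the observed total.
proportions = np.full(6, 1 / 6)
expected = proportions * observed.sum()
result_explicit = stats.chisquare(observed, f_exp=expected)

print(result_default.statistic, result_default.pvalue)
print(result_explicit.statistic, result_explicit.pvalue)
```

Both calls should report the same statistic and p-value, since the explicit expected counts are just the uniform default written out.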

In the example below we also use the stats.chi2.ppf() method, which takes a probability (here 1 minus the level of significance) and the degrees of freedom and returns the chi-square critical value. If the chi-square test statistic is greater than the critical value, the null hypothesis is rejected. If the test statistic is less than or equal to the critical value, we fail to reject the null hypothesis. In the example below the test statistic is 5.0127344877344875 and the critical value at the 0.05 level with 6 degrees of freedom is 12.591587243743977. Since the test statistic is smaller than the critical value, we fail to reject the null hypothesis: the data do not give sufficient evidence that the variable deviates from the expected distribution.

Python3

# importing packages
import scipy.stats as stats

# no of hours a student studies
# in a week vs expected no of hours
observed_data = [8, 6, 10, 7, 8, 11, 9]
expected_data = [9, 8, 11, 8, 10, 7, 6]

# Chi-Square Goodness of Fit Test
chi_square_test_statistic, p_value = stats.chisquare(
    observed_data, expected_data)

# chi square test statistic and p value
print('chi_square_test_statistic is : ' +
      str(chi_square_test_statistic))
print('p_value : ' + str(p_value))

# find Chi-Square critical value at the 0.05 significance level;
# degrees of freedom = number of categories - 1 = 6
print(stats.chi2.ppf(1-0.05, df=6))

Output:

chi_square_test_statistic is : 5.0127344877344875
p_value : 0.542180861413329
12.591587243743977
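The decision rule from Example 1 can also be written out in code. The following is a minimal sketch, not part of the original example: the names alpha, df, reject_by_critical and reject_by_p are introduced here for illustration, and the degrees of freedom are taken as the number of categories minus one, which assumes no distribution parameters were estimated from the data.

```python
from scipy import stats

observed_data = [8, 6, 10, 7, 8, 11, 9]
expected_data = [9, 8, 11, 8, 10, 7, 6]

alpha = 0.05                 # level of significance
df = len(observed_data) - 1  # categories minus one (assumption stated above)

statistic, p_value = stats.chisquare(observed_data, expected_data)
critical_value = stats.chi2.ppf(1 - alpha, df=df)

# Two equivalent ways to state the decision.
reject_by_critical = bool(statistic > critical_value)
reject_by_p = bool(p_value < alpha)

if reject_by_critical:
    print('Reject the null hypothesis')
else:
    print('Fail to reject the null hypothesis')
```

Comparing the p-value with alpha leads to the same decision as comparing the statistic with the critical value, so either check can be used on its own.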

Example 2: Determining chi-square test statistic by implementing formula

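In this approach we implement the formula directly: for each category, square the difference between the observed and expected values, divide by the expected value, and sum the results. We get the same chi-square value as in Example 1.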
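For reference, the statistic is usually written as follows. The symbols are notation introduced here, not taken from the article: O_i is the observed value and E_i the expected value for category i, and k is the number of categories.

$$
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
$$

For the data in this example, the first term is (8 - 9)^2 / 9 = 1/9, and k = 7 gives the 6 degrees of freedom used for the critical value.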

Python3

# importing packages 
import scipy.stats as stats 
import numpy as np 
  
# no of hours a student studies 
# in a week vs expected no of hours 
observed_data = [8, 6, 10, 7, 8, 11, 9] 
expected_data = [9, 8, 11, 8, 10, 7, 6] 
  
  
# determining chi square goodness of fit using formula
chi_square_test_statistic1 = 0
for i in range(len(observed_data)):
    chi_square_test_statistic1 += (
        np.square(observed_data[i] - expected_data[i]) / expected_data[i])
  
  
print('chi square value determined by formula : ' +
      str(chi_square_test_statistic1)) 
  
# find Chi-Square critical value 
print(stats.chi2.ppf(1-0.05, df=6)) 

Output:

chi square value determined by formula : 5.0127344877344875
12.591587243743977
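The loop in Example 2 can also be expressed with NumPy array operations, and the p-value can be recovered from the statistic with the chi-square survival function stats.chi2.sf(). This is a sketch under the same assumption as before (degrees of freedom equal to the number of categories minus one). The variable names are illustrative.

```python
import numpy as np
from scipy import stats

observed = np.array([8, 6, 10, 7, 8, 11, 9], dtype=float)
expected = np.array([9, 8, 11, 8, 10, 7, 6], dtype=float)

# Vectorised form of the formula: sum of (observed - expected)^2 / expected.
statistic = np.sum((observed - expected) ** 2 / expected)

# p-value: probability of a chi-square value at least this large.
df = observed.size - 1
p_value = stats.chi2.sf(statistic, df)

# Cross-check against SciPy's built-in test.
reference = stats.chisquare(observed, expected)
print(statistic, p_value)
print(np.isclose(statistic, reference.statistic),
      np.isclose(p_value, reference.pvalue))
```

The cross-check at the end is there to confirm that the manual computation agrees with stats.chisquare() on both the statistic and the p-value.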
