How to Perform a Chi-Square Goodness of Fit Test in Python

Last Updated : 20 Feb, 2022

In this article, we are going to see how to perform a Chi-Square Goodness of Fit test in Python.

The Chi-Square Goodness of Fit test is a non-parametric statistical hypothesis test used to determine how much the observed frequencies of a categorical variable differ from the expected frequencies. It helps us check whether a variable follows a certain distribution or whether a sample represents a population. The observed frequency distribution is compared with the expected frequency distribution.

Null hypothesis: The variable follows the predetermined (expected) distribution.

Alternative hypothesis: The variable deviates from the expected distribution.

Example 1: Using stats.chisquare() function

In this approach we use the chisquare() function from the scipy.stats module, which returns the chi-square goodness-of-fit statistic and its p-value.

Syntax: stats.chisquare(f_obs, f_exp)

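Parameters:

  • f_obs : an array of observed frequencies, one per category.
  • f_exp : an array of expected frequencies, one per category. It is optional; if omitted, all categories are assumed to be equally likely. The observed and expected frequencies should add up to the same total.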
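To illustrate the optional f_exp parameter, here is a minimal sketch. The die-roll counts are hypothetical and are not part of the article's data; they only show that omitting f_exp gives the same result as passing equal expected frequencies explicitly.

```python
from scipy import stats

# Hypothetical counts: outcomes of 60 rolls of a six-sided die
observed_rolls = [8, 12, 9, 11, 10, 10]

# f_exp omitted: every category is expected to have the mean count (10 here)
result_default = stats.chisquare(observed_rolls)

# f_exp given explicitly as a uniform expectation
result_explicit = stats.chisquare(observed_rolls, f_exp=[10] * 6)

print(result_default.statistic, result_default.pvalue)
print(result_explicit.statistic, result_explicit.pvalue)
```

Both calls should print the same statistic and p-value, since the default expectation is the uniform one.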

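In the example below we also use the stats.chi2.ppf() method, which takes a probability (here 1 minus the significance level, i.e. 1 - 0.05) and the degrees of freedom (number of categories minus 1, i.e. 7 - 1 = 6) and returns the chi-square critical value. If the chi-square statistic is greater than the critical value, the null hypothesis is rejected. If the chi-square statistic is less than or equal to the critical value, we fail to reject the null hypothesis. In the example below the chi-square statistic is 5.0127344877344875 and the critical value is 12.591587243743977. As the statistic does not exceed the critical value, we fail to reject the null hypothesis: the data show no significant deviation from the expected distribution. The p-value of about 0.54, which is greater than 0.05, leads to the same conclusion.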

Python3




# importing packages
import scipy.stats as stats

# number of hours a student studies
# in a week vs. expected number of hours
observed_data = [8, 6, 10, 7, 8, 11, 9]
expected_data = [9, 8, 11, 8, 10, 7, 6]

# Chi-Square Goodness of Fit Test
chi_square_test_statistic, p_value = stats.chisquare(
    observed_data, expected_data)

# chi-square test statistic and p-value
print('chi_square_test_statistic is : ' +
      str(chi_square_test_statistic))
print('p_value : ' + str(p_value))

# Chi-Square critical value at the 5% significance level,
# with df = number of categories - 1 = 6
print(stats.chi2.ppf(1-0.05, df=6))


Output:

chi_square_test_statistic is : 5.0127344877344875
p_value : 0.542180861413329
12.591587243743977
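The decision rule for Example 1 can also be written out in code. The sketch below reuses the article's data; the variable names (alpha, df, decision and the two reject_by_* flags) are introduced here for illustration only. It shows that comparing the statistic with the critical value and comparing the p-value with the significance level give the same answer.

```python
from scipy import stats

observed_data = [8, 6, 10, 7, 8, 11, 9]
expected_data = [9, 8, 11, 8, 10, 7, 6]

alpha = 0.05                 # significance level, matching 1-0.05 in Example 1
df = len(observed_data) - 1  # degrees of freedom: categories minus one

statistic, p_value = stats.chisquare(observed_data, expected_data)
critical_value = stats.chi2.ppf(1 - alpha, df=df)

# Two equivalent ways to reach the decision
reject_by_critical_value = statistic > critical_value
reject_by_p_value = p_value < alpha

if reject_by_p_value:
    decision = "reject the null hypothesis"
else:
    decision = "fail to reject the null hypothesis"
print(decision)
```

Computing df from the number of categories avoids hard-coding 6, which would silently become wrong if the data changed.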

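Example 2: Determining the chi-square test statistic by implementing the formula

In this approach, we implement the formula directly: the chi-square statistic is the sum over all categories of (observed - expected)^2 / expected. We can see that we get the same chi-square value as in Example 1.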

Python3




# importing packages
import scipy.stats as stats
import numpy as np

# number of hours a student studies
# in a week vs. expected number of hours
observed_data = [8, 6, 10, 7, 8, 11, 9]
expected_data = [9, 8, 11, 8, 10, 7, 6]

# chi-square statistic from the formula:
# sum of (observed - expected)^2 / expected
chi_square_test_statistic1 = 0
for observed, expected in zip(observed_data, expected_data):
    chi_square_test_statistic1 += np.square(observed - expected) / expected

print('chi square value determined by formula : ' +
      str(chi_square_test_statistic1))

# Chi-Square critical value at the 5% significance level, df = 6
print(stats.chi2.ppf(1-0.05, df=6))


Output:

chi square value determined by formula : 5.0127344877344875
12.591587243743977
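As a variation on Example 2, the loop can be replaced by a vectorized NumPy expression, and the p-value can be recovered from the statistic with the chi-square survival function stats.chi2.sf(). This is a sketch using the same data; the names observed, expected, chi_square and df are chosen here for illustration.

```python
import numpy as np
from scipy import stats

observed = np.array([8, 6, 10, 7, 8, 11, 9], dtype=float)
expected = np.array([9, 8, 11, 8, 10, 7, 6], dtype=float)

# Element-wise (O - E)^2 / E, summed over all categories
chi_square = np.sum((observed - expected) ** 2 / expected)

# p-value: probability of a statistic at least this large under the null
df = observed.size - 1
p_value = stats.chi2.sf(chi_square, df)

print(chi_square, p_value)
```

The printed values should agree, up to floating-point rounding, with the statistic and p-value reported in Example 1.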

