Welch’s t-Test in Python
Last Updated: 21 Feb, 2022
Welch’s t-Test: The two-sample t-test is used to compare the means of two independent datasets. However, the standard (Student’s) two-sample t-test assumes that both data groups share the same variance. To compare two data groups that have different variances, we use Welch’s t-test. It is a parametric test: an adaptation of the two-sample t-test that does not assume equal population variances.
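For reference, the standard form of the Welch statistic and its approximate degrees of freedom (the Welch-Satterthwaite equation) can be written as below. The notation is an assumption introduced here, not taken from the article: \bar{x}_i, s_i^2 and n_i denote the sample mean, sample variance and size of group i.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}
         {\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx
\frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}
     {\dfrac{\left(s_1^2 / n_1\right)^{2}}{n_1 - 1}
      + \dfrac{\left(s_2^2 / n_2\right)^{2}}{n_2 - 1}}
```

Unlike Student's t-test, no pooled variance appears: each group contributes its own variance to the standard error.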
The user needs to install and import the following libraries to perform Welch’s t-Test in Python:
scipy
numpy
Command to install the above packages:
pip3 install scipy numpy
Welch’s t-Test is conducted step by step, as described below.
Step 1: Import the libraries.
The first step is to import the libraries installed above.
Python3
import scipy.stats as stats
import numpy as np
Step 2: Creating data groups.
Let us consider an example: we are given two samples, each containing the heights of 15 students from a different class. We need to check whether the students of the two classes have the same mean height. We can create the data groups using the numpy.array() method.
Python3
data_group1 = np.array([14, 15, 15, 16, 13, 8, 14,
                        17, 16, 14, 19, 20, 21, 15,
                        15])
data_group2 = np.array([36, 37, 44, 27, 24, 28, 27,
                        39, 29, 24, 37, 32, 24, 26,
                        33])
Step 3: Check the variance.
Before conducting Welch’s t-Test, we need to check whether the given data groups have the same variance. As a rule of thumb, if the ratio of the larger variance to the smaller variance is greater than 4:1, we can consider the data groups to have unequal variances. Note that np.var() computes the population variance by default (ddof=0); pass ddof=1 to get the sample variance. To find the variance of a data group, we can use the syntax below,
Syntax:
print(np.var(data_group))
Here,
data_group: The given data group
Python3
import scipy.stats as stats
import numpy as np

data_group1 = np.array([14, 15, 15, 16, 13, 8, 14,
                        17, 16, 14, 19, 20, 21, 15,
                        15])
data_group2 = np.array([36, 37, 44, 27, 24, 28, 27,
                        39, 29, 24, 37, 32, 24, 26,
                        33])

# Print the variance of each group (population variance, ddof=0 by default)
print(np.var(data_group1), np.var(data_group2))
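Output:
[Image: the two variances, approximately 9.05 and 36.78]
Here, the ratio of the larger variance to the smaller one is about 4.06, which is greater than 4:1, so the variances are considered unequal. Hence, we can apply Welch’s t-test.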
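The rule-of-thumb check can also be done in code rather than by eye. The following is a minimal sketch, assuming the same two data groups; the names var1, var2 and variance_ratio are introduced here for illustration, and ddof=1 is used to get the sample variance.

```python
import numpy as np

data_group1 = np.array([14, 15, 15, 16, 13, 8, 14,
                        17, 16, 14, 19, 20, 21, 15,
                        15])
data_group2 = np.array([36, 37, 44, 27, 24, 28, 27,
                        39, 29, 24, 37, 32, 24, 26,
                        33])

# ddof=1 divides by n - 1 (sample variance);
# np.var defaults to ddof=0, which divides by n.
var1 = np.var(data_group1, ddof=1)
var2 = np.var(data_group2, ddof=1)

# Ratio of the larger variance to the smaller one.
variance_ratio = max(var1, var2) / min(var1, var2)

print(f"sample variances: {var1:.3f}, {var2:.3f}")
print(f"variance ratio: {variance_ratio:.3f}")
print("unequal variances by the 4:1 rule of thumb:", variance_ratio > 4)
```

Because both groups have the same size, the ratio is the same whether ddof=0 or ddof=1 is used.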
Step 4: Conducting Welch’s t-Test.
Syntax:
stats.ttest_ind(data_group1, data_group2, equal_var=False)
Here,
data_group1: First data group
data_group2: Second data group
equal_var = False: A boolean (not a string). When it is False, Welch’s t-test is conducted, which does not assume equal population variances. The default, True, performs the standard two-sample t-test.
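To see what the equal_var parameter changes, both variants can be run side by side. This is a small sketch on the same data; the variable names student and welch are chosen here for illustration.

```python
import numpy as np
from scipy import stats

data_group1 = np.array([14, 15, 15, 16, 13, 8, 14,
                        17, 16, 14, 19, 20, 21, 15,
                        15])
data_group2 = np.array([36, 37, 44, 27, 24, 28, 27,
                        39, 29, 24, 37, 32, 24, 26,
                        33])

# equal_var=True (the default): Student's t-test with a pooled variance.
student = stats.ttest_ind(data_group1, data_group2, equal_var=True)

# equal_var=False: Welch's t-test, no equal-variance assumption.
welch = stats.ttest_ind(data_group1, data_group2, equal_var=False)

print("Student:", student.statistic, student.pvalue)
print("Welch:  ", welch.statistic, welch.pvalue)
```

Since both groups contain 15 observations, the two test statistics coincide here; the tests differ only in their degrees of freedom, and therefore in their p-values. With unequal group sizes the statistics themselves would generally differ as well.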
Example:
Python3
import scipy.stats as stats
import numpy as np

data_group1 = np.array([14, 15, 15, 16, 13, 8, 14,
                        17, 16, 14, 19, 20, 21, 15,
                        15])
data_group2 = np.array([36, 37, 44, 27, 24, 28, 27,
                        39, 29, 24, 37, 32, 24, 26,
                        33])

# equal_var=False selects Welch's t-test
print(stats.ttest_ind(data_group1, data_group2, equal_var=False))
Output:
[Image: output of ttest_ind showing the test statistic and the p-value]
Interpretation of the Output:
The test statistic turns out to be -8.658 and the corresponding p-value is 2.757e-08. Since the p-value is less than 0.05, we reject the null hypothesis that the two classes have the same mean height, and conclude that the difference between the mean heights of the two classes is statistically significant.
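As a cross-check, the statistic and p-value can be recomputed by hand from the standard Welch formulas and compared with SciPy's result. This is a sketch under stated assumptions, not how SciPy implements the test internally: it uses a two-sided test, and the names t_stat, df and p_value are introduced here for illustration.

```python
import numpy as np
from scipy import stats

data_group1 = np.array([14, 15, 15, 16, 13, 8, 14,
                        17, 16, 14, 19, 20, 21, 15,
                        15])
data_group2 = np.array([36, 37, 44, 27, 24, 28, 27,
                        39, 29, 24, 37, 32, 24, 26,
                        33])

n1, n2 = len(data_group1), len(data_group2)
mean1, mean2 = data_group1.mean(), data_group2.mean()

# Sample variances (ddof=1), as used in the t-test.
var1, var2 = data_group1.var(ddof=1), data_group2.var(ddof=1)

# Each group's contribution to the squared standard error.
se1_sq, se2_sq = var1 / n1, var2 / n2

# Welch's t statistic: difference in means over the unpooled standard error.
t_stat = (mean1 - mean2) / np.sqrt(se1_sq + se2_sq)

# Welch-Satterthwaite approximation to the degrees of freedom.
df = (se1_sq + se2_sq) ** 2 / (
    se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
)

# Two-sided p-value from the t distribution's survival function.
p_value = 2 * stats.t.sf(abs(t_stat), df)

result = stats.ttest_ind(data_group1, data_group2, equal_var=False)
print("manual:", t_stat, df, p_value)
print("scipy: ", result.statistic, result.pvalue)
```

The degrees of freedom come out fractional and smaller than n1 + n2 - 2 = 28, which is why Welch's test is slightly more conservative than Student's t-test on the same data.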