
T-test

In statistics, various tests are used to compare different samples or groups and draw conclusions about populations. These tests, known as statistical tests, focus on analyzing the likelihood or probability of obtaining the observed data under specific assumptions or hypotheses. They provide a framework for assessing evidence in support of or against a particular hypothesis.

A statistical test begins by formulating a null hypothesis (H0) and an alternative hypothesis (Ha). The null hypothesis represents the default assumption, typically stating no effect or no difference, while the alternative hypothesis suggests a specific relationship or effect.

Different statistical test methods are available to calculate the probability, typically measured as a p-value, of obtaining the observed data. The p-value indicates the likelihood of observing the data or more extreme results assuming the null hypothesis is true. Researchers compare the calculated p-value to a predetermined significance level, often denoted as α, to make a decision regarding the null hypothesis. If the p-value is smaller than α, the results are considered statistically significant, leading to the rejection of the null hypothesis in favor of the alternative hypothesis.

Many statistical tests are used to compute the p-value, such as the Z-test, t-tests, the Chi-squared test, ANOVA, and the F-test. In this article, we will learn about the t-test.

T-Test

The t-test is based on Student’s t-distribution, which William Sealy Gosset published under the pen name “Student.” The t-distribution resembles the normal distribution but has thicker tails. It is employed in statistical inference, especially when the sample size is small or when the population standard deviation is unknown.

A t-test is a type of inferential statistic test used to determine if there is a significant difference between the means of two groups. It is often used when data is normally distributed and population variance is unknown. The t-test is used in hypothesis testing to assess whether the observed difference between the means of the two groups is statistically significant or just due to random variation.

Key terms in t-Test

The most used key terms in T-test are as follows:

• T-statistic: The t-statistic is a measure of the difference between the means of two groups relative to the variability within each group. It is calculated as the difference between the sample means divided by the standard error of the difference. It is also known as the t-value or t-score.
• If the absolute t-value is large => the difference between the means is large relative to the variability, which is evidence that the two groups come from populations with different means.
• If the absolute t-value is small => the observed difference is consistent with the two groups coming from populations with the same mean.
• T-Distribution: The t-distribution, commonly known as the Student’s t-distribution, is a probability distribution with tails that are thicker than those of the normal distribution. It is employed in statistical inference when working with small sample sizes and population standard deviations are unknown. The t-distribution gets closer to the normal distribution as the sample size rises.  It plays a crucial role in hypothesis testing and estimating population parameters with limited data.

T-Table

• Degree of freedom (df): The degrees of freedom represent the number of values in a calculation that are free to vary. In a t-test, df tells us how many independent pieces of information are available for estimating the variability of the data.
For a single sample, the degree of freedom is calculated as the sample size minus 1, i.e. df = ns - 1, where “ns” is the number of observations in the sample. It reflects the number of values in the sample that are free to vary after estimating the sample mean.
Suppose we have 2 samples A and B. The df would be calculated as df = (nA - 1) + (nB - 1).
• Significance level (α): It is the probability of rejecting the null hypothesis when it is true. In simpler terms, it tells us about the percentage of risk involved in saying that a difference exists between two groups when in reality it does not.
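To make the "thicker tails" remark concrete, here is a minimal sketch that evaluates the t-distribution density far from the centre for a few degrees of freedom and compares it with the standard normal density. The helper names `t_pdf` and `normal_pdf` and the evaluation point `x = 3.0` are illustrative choices, not part of the article.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

x = 3.0  # a point far out in the tail (illustrative choice)
for df in (2, 9, 18, 100):
    print(f"df = {df:>3}: t density at {x} = {t_pdf(x, df):.5f}")
print(f"normal density at {x} = {normal_pdf(x):.5f}")
```

The tail density shrinks toward the normal value as df grows, which is why critical t-values are larger for small samples.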

Types of t-tests

There are three types of t-tests, and they are categorized as dependent and independent t-tests.

1. Independent samples t-test: compares the means for two groups.
2. Paired sample t-test: compares means from the same group at different times (say, one year apart).
3. One sample t-test: tests the mean of a single group against a known mean.

1. Independent sample t-test

An independent sample t-test, commonly known as an unpaired sample t-test, is used to find out whether the difference found between the means of two groups is actually significant or just a random occurrence.

We can use this when:

• the population mean or standard deviation is unknown (information about the population is unknown).
• the two samples are separate/independent, e.g. boys and girls (the two are independent of each other).

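Formula used:

$$t = \frac{\mu_A - \mu_B}{\sqrt{\dfrac{\left(\sum A^2 - \frac{(\sum A)^2}{n_A}\right) + \left(\sum B^2 - \frac{(\sum B)^2}{n_B}\right)}{n_A + n_B - 2}\left(\dfrac{1}{n_A} + \dfrac{1}{n_B}\right)}}$$

where,
t = t-value
A = values of sample A
B = values of sample B
μA = mean of sample A
μB = mean of sample B
nA = sample size of A
nB = sample size of B
df = degree of freedom = nA + nB - 2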

Steps involved

Step 1 - Find the sum of all values in each sample.
Step 2 - Square the sums found in Step 1.
Step 3 - Find the sum of squares of the individual values in each sample.
Step 4 - Calculate the mean of each sample.
Step 5 - Find the degree of freedom (df) using df = (nA-1) + (nB-1).
Step 6 - Insert all the values found in Steps 1-5 into the above independent sample t-test formula to find the calculated t-value.
Step 7 - Use the values of df and α (take α = 0.05 if not given) in the above t-table, two-tailed, to find the table value of t.
Step 8 - Compare the values of t found in Step 6 and Step 7.
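The steps above can be sketched in plain Python. This is a minimal illustration that assumes the pooled-variance (equal-variance) form of the test, which is what the sums and sums of squares in Steps 1-5 feed into. The function name `independent_t` and the two small samples are hypothetical.

```python
import math

def independent_t(a, b):
    """Pooled-variance t-statistic and df for two independent samples."""
    n_a, n_b = len(a), len(b)
    sum_a, sum_b = sum(a), sum(b)                  # Step 1: sums
    sq_sum_a, sq_sum_b = sum_a ** 2, sum_b ** 2    # Step 2: squares of the sums
    sum_sq_a = sum(x * x for x in a)               # Step 3: sums of squares
    sum_sq_b = sum(x * x for x in b)
    mean_a, mean_b = sum_a / n_a, sum_b / n_b      # Step 4: means
    df = (n_a - 1) + (n_b - 1)                     # Step 5: degrees of freedom
    # Step 6: within-group sum of squares -> pooled variance -> standard error
    ss_within = (sum_sq_a - sq_sum_a / n_a) + (sum_sq_b - sq_sum_b / n_b)
    std_err = math.sqrt(ss_within / df * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df

# Hypothetical data, only to show the call
t_value, df = independent_t([2, 4, 6], [1, 2, 3])
print("t =", round(t_value, 3), "df =", df)
```

The returned t-value is then compared with the table value for the same df, as described in Steps 7 and 8.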

Interpreting the results

If |tcal| > ttable => p < α (0.05) => significant difference between the two groups.
If |tcal| < ttable => p > α (0.05) => no significant difference between the two groups.
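The link between the table comparison and the p-value can be illustrated without a statistics library. The sketch below approximates the two-tailed p-value by integrating the t-distribution density with Simpson's rule. The helper names, the step count, and the t-value of 2.5 are assumptions made for illustration; in practice a library routine would normally be used.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t_value, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on the density over [0, |t|]."""
    upper = abs(t_value)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3   # P(0 < T < |t|)
    return 1 - 2 * central_area    # probability mass in both tails

alpha = 0.05
p = two_tailed_p(2.5, df=18)  # hypothetical calculated t-value
print("p-value:", round(p, 4))
print("reject H0" if p < alpha else "fail to reject H0")
```

Comparing |tcal| with the two-tailed table value and comparing p with α lead to the same decision, because the table value is the t at which the two-tailed p-value equals α.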

Example Problem (Step by Step)

Suppose, two independent sample data A and B are given, with the following values. We have to perform the Independent samples t-test for this data.

| Sample A | Sample B |
|----------|----------|
| 1 | 1 |
| 2 | 2 |
| 4 | 2 |
| 4 | 3 |
| 5 | 3 |
| 5 | 4 |
| 6 | 5 |
| 7 | 6 |
| 8 | 7 |
| 8 | 7 |

Step 1 - ∑A = 1 + 2 + 4 + 4 + 5 + 5 + 6 + 7 + 8 + 8 = 50
         ∑B = 1 + 2 + 2 + 3 + 3 + 4 + 5 + 6 + 7 + 7 = 40
Step 2 - (∑A)² = (50)² = 2500
         (∑B)² = (40)² = 1600
Step 3 - ∑A² = 1² + 2² + 4² + 4² + 5² + 5² + 6² + 7² + 8² + 8² = 300
         ∑B² = 1² + 2² + 2² + 3² + 3² + 4² + 5² + 6² + 7² + 7² = 202
Step 4 - n = 10
         μA = (∑A / n) = 50/10 = 5
         μB = (∑B / n) = 40/10 = 4
Step 5 - df = (nA - 1) + (nB - 1) = (10-1) + (10-1) = 18
Step 6 - Putting these values into the independent sample t-test formula, we get tcal = 0.99
Step 7 - Let α = 0.05 and df = 18. Looking up the two-tailed t-table, we get ttable = 2.10
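Two-tailed t-table (excerpt):

| df \ α | 0.20 | 0.10 | 0.05 | . . |
|--------|-------|-------|--------|-----|
| ∞ | 1.282 | 1.645 | 1.960 | . . |
| 1 | 3.078 | 6.314 | 12.706 | . . |
| 2 | 1.886 | 2.920 | 4.303 | . . |
| : | : | : | : | . . |
| 8 | 1.397 | 1.860 | 2.306 | . . |
| 9 | 1.383 | 1.833 | 2.262 | . . |
| : | : | : | : | . . |
| 18 | 1.330 | 1.734 | 2.101 | . . |
| 19 | 1.328 | 1.729 | 2.093 | . . |
| 20 | 1.325 | 1.725 | 2.086 | . . |
| : | : | : | : | . . |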
Step 8 - 0.99 < 2.10 (tcal < ttable by 1.11)=> no significant difference found between two groups.
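The arithmetic behind Step 6 can be written out explicitly. This assumes the pooled-variance form of the independent sample t-test, in which the sums of squares from Steps 1-3 are divided by df and scaled by (1/nA + 1/nB):

```latex
t_{cal}
  = \frac{5 - 4}{\sqrt{\dfrac{\left(300 - \frac{2500}{10}\right) + \left(202 - \frac{1600}{10}\right)}{18}
      \left(\dfrac{1}{10} + \dfrac{1}{10}\right)}}
  = \frac{1}{\sqrt{\dfrac{50 + 42}{18} \times 0.2}}
  = \frac{1}{\sqrt{1.0222}}
  \approx 0.99
```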

Python3

# import the necessary libraries
from scipy import stats
import numpy as np

# Samples
sample_A = np.array([1, 2, 4, 4, 5, 5, 6, 7, 8, 8])
sample_B = np.array([1, 2, 2, 3, 3, 4, 5, 6, 7, 7])

# Perform the independent sample t-test
t_statistic, p_value = stats.ttest_ind(sample_A, sample_B)

# Set the significance level (alpha)
alpha = 0.05

# Compute the degrees of freedom: df = (n_A - 1) + (n_B - 1)
df = len(sample_A) + len(sample_B) - 2

# Calculate the critical t-value
# ppf at 1 - alpha/2 gives the critical t-value for a two-tailed test
critical_t = stats.t.ppf(1 - alpha/2, df)

# Print the results
print("T-value:", t_statistic)
print("P-Value:", p_value)
print("Critical t-value:", critical_t)

# Decision
print('With T-value')
if np.abs(t_statistic) > critical_t:
    print('There is significant difference between two groups')
else:
    print('No significant difference found between two groups')

print('With P-value')
if p_value > alpha:
    print('No evidence to reject the null hypothesis of no difference between the two groups')
else:
    print('Evidence found to reject the null hypothesis of no difference between the two groups')

Output:

T-value: 0.9890707100936805
P-Value: 0.33573862223613105
Critical t-value: 2.10092204024096
With T-value
No significant difference found between two groups
With P-value
No evidence to reject the null hypothesis of no difference between the two groups

2. Paired sample t-test

The paired sample t-test, commonly known as the dependent sample t-test, is used to find out whether the mean of the differences between two paired samples is 0. The test is done on dependent samples, usually focusing on a particular group of people or things. Each entity is measured twice, resulting in a pair of observations.

We can use this when:

• Two related (paired) samples are given, with each entity measured twice. [E.g., scores obtained by the same students in English and Math]
• The dependent variable (data) is continuous.
• The pairs of observations are independent of one another.
• The differences between the paired values are approximately normally distributed.

Formula Used

$$t = \frac{\sum D}{\sqrt{\dfrac{N \sum D^2 - (\sum D)^2}{N - 1}}}$$

where,
t = t-value
D = difference between the paired values (A - B)
N = number of pairs (same as n)

Steps Involved

Step 1 - Find the difference D = A - B for each pair and sum the differences. [∑D = ∑(A-B)]
Step 2 - Find the sum of the squares of each D found in Step 1. [∑D²]
Step 3 - Find the square of the sum of D. [(∑D)²]
Step 4 - Put the values found in Steps 1-3 into the above paired sample t-test formula to find the t-value.
Step 5 - Find the degree of freedom (df) using df = n-1.

NOTE: Here, df is calculated once for the set of paired differences, not for each sample separately, because the two samples A and B are paired. So, df = N - 1.

Step 6 - Use the values of df and α (take α = 0.05 if not given) in the above t-table, two-tailed, to find the table value of t.
Step 7 - Compare the values of t found in Step 4 and Step 6.
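The paired steps can be sketched in plain Python as well. This is a minimal illustration that assumes the sum-based form of the paired statistic, t = ∑D / sqrt((N·∑D² - (∑D)²) / (N - 1)). The function name `paired_t` and the before/after values are hypothetical.

```python
import math

def paired_t(a, b):
    """t-statistic and df for paired samples, computed from the differences."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    n = len(a)
    diffs = [x - y for x, y in zip(a, b)]
    sum_d = sum(diffs)                       # Step 1: sum of differences
    sum_d_sq = sum(d * d for d in diffs)     # Step 2: sum of squared differences
    sq_sum_d = sum_d ** 2                    # Step 3: square of the sum
    # Step 4: plug the three sums into the formula
    t_value = sum_d / math.sqrt((n * sum_d_sq - sq_sum_d) / (n - 1))
    return t_value, n - 1                    # Step 5: df = n - 1

# Hypothetical before/after measurements on the same four subjects
t_value, df = paired_t([10, 12, 9, 14], [12, 15, 9, 17])
print("t =", round(t_value, 3), "df =", df)
```

The sign of t only reflects the order of subtraction (A - B), so the absolute value is what gets compared with the two-tailed table value.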

Interpretation of Results

Same as that of the Independent samples t-test.

Example Problem (Step by Step)

Consider the following example. Scores (out of 25) of the subjects Math1 and Math2 are taken for a sample of 10 students. We have to perform the paired sample t-test for this data.

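| Student no. | Math1 | Math2 | Step 1 (D) | Step 2 (D²) |
|-------------|-------|-------|------------|-------------|
| 1 | 4 | 15 | -11 | 121 |
| 2 | 4 | 16 | -12 | 144 |
| 3 | 7 | 14 | -7 | 49 |
| 4 | 16 | 14 | 2 | 4 |
| 5 | 20 | 22 | -2 | 4 |
| 6 | 11 | 22 | -11 | 121 |
| 7 | 13 | 23 | -10 | 100 |
| 8 | 9 | 18 | -9 | 81 |
| 9 | 11 | 18 | -7 | 49 |
| 10 | 15 | 19 | -4 | 16 |
| Sum | | | ∑D = -71 | ∑D² = 689 |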
Step 1 and Step 2 - as shown in the table above.
Step 3 - (∑D)² = (-71)² = 5041
Step 4 - Putting the values into the paired sample t-test formula, we get tcal = -4.95. Here, we will consider the absolute value, so tcal = 4.95
Step 5 - df = n - 1 = 10 - 1 = 9
Step 6 - Using df = 9 and α = 0.05 in the two-tailed table, we get ttable = 2.26
Step 7 - 4.95 > 2.26 (tcal > ttable by 2.69) => There is a significant difference between Math1 and Math2.
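The substitution in Step 4 can be written out explicitly, assuming the sum-based form of the paired statistic with ∑D = -71, ∑D² = 689 and N = 10 from the table:

```latex
t_{cal}
  = \frac{\sum D}{\sqrt{\dfrac{N \sum D^2 - (\sum D)^2}{N - 1}}}
  = \frac{-71}{\sqrt{\dfrac{10 \times 689 - 5041}{9}}}
  = \frac{-71}{\sqrt{\dfrac{1849}{9}}}
  = \frac{-71}{43/3}
  \approx -4.95
```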

Python3

# import the necessary libraries
from scipy import stats
import numpy as np

# Create the paired samples
math1 = np.array([4, 4, 7, 16, 20, 11, 13, 9, 11, 15])
math2 = np.array([15, 16, 14, 14, 22, 22, 23, 18, 18, 19])

# Perform the paired sample t-test
t_statistic, p_value = stats.ttest_rel(math1, math2)

# Set the significance level (alpha)
alpha = 0.05

# Compute the degrees of freedom (df = n - 1)
df = len(math2) - 1

# Calculate the critical t-value
# ppf at 1 - alpha/2 gives the critical t-value for a two-tailed test
critical_t = stats.t.ppf(1 - alpha/2, df)

# Print the results
print("T-value:", t_statistic)
print("P-Value:", p_value)
print("Critical t-value:", critical_t)

# Decision
print('With T-value')
if np.abs(t_statistic) > critical_t:
    print('There is significant difference between math1 and math2')
else:
    print('No significant difference found between math1 and math2')

print('With P-value')
if p_value > alpha:
    print('No evidence to reject the null hypothesis of no difference between math1 and math2')
else:
    print('Evidence found to reject the null hypothesis of no difference between math1 and math2')

Output:

T-value: -4.953488372093023
P-Value: 0.0007875235561560145
Critical t-value: 2.2621571627409915
With T-value
There is significant difference between math1 and math2
With P-value
Evidence found to reject the null hypothesis of no difference between math1 and math2

3. One sample t-test

The one sample t-test is one of the most widely used t-tests. It compares the mean of a sample to a given value, typically the true/population mean.

We can use this when:

• the sample size is small (under 30).
• data is collected randomly.
• data is approximately normally distributed.

Formula used:

$$t = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

where,
t = t-value
x_bar = sample mean
μ = true/population mean
σ = standard deviation (estimated from the sample)
n = sample size

Steps involved

Step 1 - Define the null (h0) and alternative (h1) hypotheses.
Step 2 - Calculate the sample mean, if not given. [population mean, standard deviation, and n are given]
Step 3 - Put the values from Step 2 into the above one sample t-test formula and calculate the t-value (tcal).
Step 4 - Calculate the degree of freedom (df = n - 1, as in the paired sample t-test).
Step 5 - Take α = 0.05 if not given. Use the values of df and α to find ttable from the above t-table, in one tail.
Step 6 - Compare the values of t found in Step 3 and Step 5.
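When raw observations are available instead of summary statistics, Steps 2-4 can be sketched with the standard library. This is a minimal illustration that uses the sample standard deviation (n - 1 in the denominator) for σ. The function name `one_sample_t` and the data values are hypothetical.

```python
import math
import statistics

def one_sample_t(sample, population_mean):
    """t-statistic and df for testing a sample mean against a known mean."""
    n = len(sample)
    sample_mean = statistics.mean(sample)    # Step 2: sample mean
    sample_std = statistics.stdev(sample)    # sample standard deviation (n - 1)
    std_err = sample_std / math.sqrt(n)      # standard error of the mean
    t_value = (sample_mean - population_mean) / std_err   # Step 3
    return t_value, n - 1                    # Step 4: df = n - 1

# Hypothetical weights tested against a hypothetical known mean of 45
t_value, df = one_sample_t([48, 52, 50, 47, 53], 45)
print("t =", round(t_value, 3), "df =", df)
```

The resulting t-value is then compared with the table value for the chosen α and df, as in Steps 5 and 6.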

Interpretation of Results

Same as that of the Independent samples t-test.

Example Problem (Step by Step)

Consider the following example. The weights of 25 obese people were taken before enrolling them into a nutrition camp. The population mean weight was found to be 45 kg before starting the camp. After finishing the camp, for the same 25 people, the sample mean was found to be 75 kg with a standard deviation of 25 kg. Did the nutrition camp work?

Step 1 - h0 -> μ = 45 (the mean weight after the camp equals the true mean)
         h1 -> μ > 45 (the mean weight after the camp is greater than the true mean)
Step 2 - Given,
         x_bar = 75
         μ = 45
         σ = 25
         n = 25
Step 3 - Putting the values from Step 2 into the above one sample t-test formula, we get tcal = 6
Step 4 - df = n - 1 = 24
Step 5 - Using df = 24 and α = 0.05 in the one-tailed table, we get ttable = 1.711
Step 6 - 6 > 1.711 (tcal > ttable)
         => significant difference found between the sample mean and the population mean.
         => the nutrition camp significantly impacted the weights and it was a success.
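The substitution in Step 3 can be written out explicitly, using the values given in Step 2:

```latex
t_{cal}
  = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}
  = \frac{75 - 45}{25 / \sqrt{25}}
  = \frac{30}{5}
  = 6
```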

Python3

import scipy.stats as stats
import numpy as np

# Define the population mean weight
population_mean = 45

# Define the sample mean weight and standard deviation
sample_mean = 75
sample_std = 25

# Define the sample size
sample_size = 25

# Calculate the t-statistic
t_statistic = (sample_mean - population_mean) / (sample_std / np.sqrt(sample_size))

# Define the degrees of freedom
df = sample_size - 1

# Set the significance level (alpha)
alpha = 0.05

# Calculate the critical t-value (one-tailed)
critical_t = stats.t.ppf(1 - alpha, df)

# Calculate the p-value (one-tailed, upper tail)
p_value = 1 - stats.t.cdf(t_statistic, df)

# Print the results
print("T-Statistic:", t_statistic)
print("Critical t-value:", critical_t)
print("P-Value:", p_value)

# Decision
print('With T-value :')
if t_statistic > critical_t:
    print("""There is a significant difference in weight before and after the camp.
    The nutrition camp had an effect.""")
else:
    print("""There is no significant difference in weight before and after the camp.
    The nutrition camp did not have a significant effect.""")

# A p-value below alpha means the result is significant
print('With P-value :')
if p_value < alpha:
    print("""There is a significant difference in weight before and after the camp.
    The nutrition camp had an effect.""")
else:
    print("""There is no significant difference in weight before and after the camp.
    The nutrition camp did not have a significant effect.""")

Output:

T-Statistic: 6.0
Critical t-value: 1.7108820799094275
P-Value: 1.703654035845048e-06
With T-value :
There is a significant difference in weight before and after the camp.
    The nutrition camp had an effect.
With P-value :
There is a significant difference in weight before and after the camp.
    The nutrition camp had an effect.

The t-tests discussed above are widely used in medical research to draw inferences from data about the effects of medicines and drugs on a population. However, it is the analyst's responsibility to choose the t-test that suits the data and to check that all the assumptions of that t-test are met.