# T-test

In statistics, various tests are used to compare different samples or groups and draw conclusions about populations. These tests, known as statistical tests, focus on analyzing the likelihood or probability of obtaining the observed data under specific assumptions or hypotheses. They provide a framework for assessing evidence in support of or against a particular hypothesis.

A statistical test begins by formulating a null hypothesis (H_{0}) and an alternative hypothesis (H_{a}). The null hypothesis represents the default assumption, typically stating no effect or no difference, while the alternative hypothesis suggests a specific relationship or effect.

Different statistical test methods are available to calculate the probability, typically measured as a p-value, of obtaining the observed data. The p-value indicates the likelihood of observing the data or more extreme results assuming the null hypothesis is true. Researchers compare the calculated p-value to a predetermined significance level, often denoted as α, to make a decision regarding the null hypothesis. If the p-value is smaller than α, the results are considered statistically significant, leading to the rejection of the null hypothesis in favor of the alternative hypothesis.

There are many statistical tests, such as the Z-test, t-test, chi-squared test, ANOVA and F-test, which are used to compute the p-value. In this article, we will learn about the t-test.

## T-Test

The t-test is named after Student’s t-distribution, which was developed by William Sealy Gosset while he was writing under the pen name “Student.” The t-distribution is a probability distribution that resembles the normal distribution but has thicker tails. It is used in statistical inference, especially when the sample size is small or the population standard deviation is unknown.

A t-test is an inferential statistical test used to determine whether there is a significant difference between the means of two groups (or, in the one-sample case, between a sample mean and a known value). It is typically used when the data are approximately normally distributed and the population variance is unknown. In hypothesis testing, the t-test assesses whether the observed difference between the means is statistically significant or just due to random variation.

### Key terms in t-Test

The most commonly used key terms in a t-test are as follows:

- **T-statistic:** The t-statistic is a measure of the difference between the means of two groups relative to the variability within each group. It is calculated as the difference between the sample means divided by the standard error of the difference. It is also known as the t-value or t-score.
  - If the absolute t-value is large, the difference between the means is large relative to the variability, which is evidence that the two groups differ.
  - If the absolute t-value is small, the observed difference could plausibly be due to random variation, so there is no evidence that the groups differ.
- **T-distribution:** The t-distribution, commonly known as Student’s t-distribution, is a probability distribution whose tails are thicker than those of the normal distribution. It is used in statistical inference when sample sizes are small and the population standard deviation is unknown. As the sample size grows, the t-distribution gets closer to the normal distribution. It plays a crucial role in hypothesis testing and in estimating population parameters from limited data.
- **Degree of freedom (df):** The degree of freedom is the number of values in a calculation that are free to vary, i.e. the number of independent pieces of information used to calculate an estimate. For a single sample, the degree of freedom is the sample size minus 1, i.e. $df = n_s - 1$, where $n_s$ is the number of observations in the sample. It reflects the number of values in the sample that are free to vary after the sample mean has been estimated. For two samples A and B, the df is calculated as $df = (n_A - 1) + (n_B - 1)$.
- **Significance level (α):** The probability of rejecting the null hypothesis when it is true. In simpler terms, it is the risk we accept of concluding that a difference exists between two groups when in reality it does not.
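The effect of the thicker tails can be checked numerically. The sketch below assumes SciPy is available and uses `scipy.stats.norm.sf` and `scipy.stats.t.sf` (survival functions, i.e. 1 − CDF) to compare how much probability lies beyond ±1.96, the familiar 5% cutoff of the normal distribution. The degrees of freedom (2, 9, 30 and 1000) are illustrative values chosen here.

```python
from scipy import stats

# Two-tailed probability of a value beyond +/-1.96 under the standard normal
cutoff = 1.96
normal_tail = 2 * stats.norm.sf(cutoff)
print(f"normal:      {normal_tail:.4f}")

# The same cutoff under Student's t-distribution for a few illustrative df
t_tails = {}
for df in (2, 9, 30, 1000):
    t_tails[df] = 2 * stats.t.sf(cutoff, df)
    print(f"t, df={df:<5d} {t_tails[df]:.4f}")
```

With few degrees of freedom, noticeably more than 5% of the t-distribution lies beyond ±1.96, which is why t-table critical values exceed 1.96 for small samples; as df grows, the tail probability approaches the normal value.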

## Types of t-tests

There are three types of t-tests, and they are categorized as dependent and independent t-tests:

- **Independent samples t-test:** compares the means of two groups.
- **Paired sample t-test:** compares means from the same group at different times (say, one year apart).
- **One sample t-test:** compares the mean of a single group against a known mean.
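For reference, SciPy provides one function per type of t-test. The sketch below is illustrative rather than part of the method described here. It reuses the data from the worked examples later in this article. The reference mean `popmean=4` passed to `ttest_1samp` is a hypothetical value chosen only to show the call.

```python
import numpy as np
from scipy import stats

# Data reused from the worked examples in this article
sample_A = np.array([1, 2, 4, 4, 5, 5, 6, 7, 8, 8])
sample_B = np.array([1, 2, 2, 3, 3, 4, 5, 6, 7, 7])
math1 = np.array([4, 4, 7, 16, 20, 11, 13, 9, 11, 15])
math2 = np.array([15, 16, 14, 14, 22, 22, 23, 18, 18, 19])

# Independent samples: two unrelated groups
ind = stats.ttest_ind(sample_A, sample_B)

# Paired samples: the same students measured twice
rel = stats.ttest_rel(math1, math2)

# One sample: compare sample_A against a reference mean
# (popmean=4 is a hypothetical value used only for illustration)
one = stats.ttest_1samp(sample_A, popmean=4)

for name, res in [("independent", ind), ("paired", rel), ("one-sample", one)]:
    print(f"{name:12s} t = {res.statistic:.4f}, p = {res.pvalue:.4f}")
```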

**1. Independent sample t-test**

An independent sample t-test, commonly known as an unpaired sample t-test, is used to find out whether the difference found between two groups is actually significant or just a random occurrence.

**We can use this when:**

- the population mean or standard deviation is unknown (information about the population is unknown).
- the two samples are separate/independent, e.g. boys and girls (the two groups are independent of each other).

**Formula used:**

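$$t = \frac{\mu_A - \mu_B}{\sqrt{\dfrac{\left(\sum A^2 - \dfrac{(\sum A)^2}{n_A}\right) + \left(\sum B^2 - \dfrac{(\sum B)^2}{n_B}\right)}{n_A + n_B - 2}\left(\dfrac{1}{n_A} + \dfrac{1}{n_B}\right)}}$$

where,

- $t$ = t-value
- $A$ = sample A
- $B$ = sample B
- $\mu_A$ = mean of sample A
- $\mu_B$ = mean of sample B
- $n_A$ = sample size of A
- $n_B$ = sample size of B
- $df = n_A + n_B - 2$ = degree of freedom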

**Steps involved**

- Step 1 - Find the sum of all values in each sample ($\sum A$, $\sum B$).
- Step 2 - Square the sums found in Step 1.
- Step 3 - Find the sum of the squares of the individual values in each sample ($\sum A^2$, $\sum B^2$).
- Step 4 - Calculate the mean of each sample.
- Step 5 - Find the degree of freedom (df) using $df = (n_A - 1) + (n_B - 1)$.
- Step 6 - Insert all the values found in Steps 1-4 and Step 5 into the independent sample t-test formula above to find the calculated t-value ($t_{cal}$).
- Step 7 - Use the values of df and α (take α = 0.05 if not given) in the two-tailed t-table (an excerpt is shown in the example below) to find the table value of t ($t_{table}$).
- Step 8 - Compare the t-values found in Step 6 and Step 7.
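The steps above can be sketched in plain Python. This is a minimal illustration of the pooled-variance formula given above. The function name `independent_t_test` is hypothetical, and it only computes the t-value and df (Steps 1-6), leaving the table lookup of Step 7 to the reader.

```python
import math

def independent_t_test(a, b):
    """Pooled-variance independent t-statistic computed from sums (Steps 1-6)."""
    n_a, n_b = len(a), len(b)
    sum_a, sum_b = sum(a), sum(b)                # Step 1: sum of each sample
    sq_sum_a, sq_sum_b = sum_a ** 2, sum_b ** 2  # Step 2: square of each sum
    sum_sq_a = sum(x * x for x in a)             # Step 3: sum of squares
    sum_sq_b = sum(x * x for x in b)
    mean_a, mean_b = sum_a / n_a, sum_b / n_b    # Step 4: sample means
    df = (n_a - 1) + (n_b - 1)                   # Step 5: degrees of freedom
    # Step 6: pooled sum of squared deviations, then the standard error
    ss = (sum_sq_a - sq_sum_a / n_a) + (sum_sq_b - sq_sum_b / n_b)
    se = math.sqrt(ss / df * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se, df

t_cal, df = independent_t_test([1, 2, 4, 4, 5, 5, 6, 7, 8, 8],
                               [1, 2, 2, 3, 3, 4, 5, 6, 7, 7])
print(round(t_cal, 4), df)  # 0.9891 18
```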

**Interpreting the results**

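- If $|t_{cal}| > t_{table}$ (equivalently, p < α = 0.05) => a significant difference between the two groups is found, and the null hypothesis is rejected.
- If $|t_{cal}| < t_{table}$ (equivalently, p > α = 0.05) => no significant difference between the two groups is found, and the null hypothesis is not rejected.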
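The two decision rules above agree for a two-tailed test: $|t_{cal}|$ exceeds $t_{table}$ exactly when p < α. A minimal sketch, assuming SciPy is available, computes both from a t-value and df. The helper name `two_tailed_decision` is hypothetical, and the inputs (t = 0.9891, df = 18) are taken from the example below.

```python
from scipy import stats

def two_tailed_decision(t_cal, df, alpha=0.05):
    """Return (p_value, t_table, reject_h0) for a two-tailed t-test."""
    p_value = 2 * stats.t.sf(abs(t_cal), df)  # probability in both tails beyond |t_cal|
    t_table = stats.t.ppf(1 - alpha / 2, df)  # two-tailed critical value
    reject_h0 = abs(t_cal) > t_table          # same outcome as p_value < alpha
    return p_value, t_table, reject_h0

p, t_table, reject = two_tailed_decision(0.9891, 18)
print(f"p = {p:.3f}, t_table = {t_table:.3f}, reject H0: {reject}")
```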

**Example Problem (Step by Step)**

Suppose two independent samples A and B are given, with the following values. We have to perform the independent samples t-test on this data.

Sample A | Sample B |
---|---|
1 | 1 |
2 | 2 |
4 | 2 |
4 | 3 |
5 | 3 |
5 | 4 |
6 | 5 |
7 | 6 |
8 | 7 |
8 | 7 |

Step 1 -

$$\sum A = 1 + 2 + 4 + 4 + 5 + 5 + 6 + 7 + 8 + 8 = 50$$

$$\sum B = 1 + 2 + 2 + 3 + 3 + 4 + 5 + 6 + 7 + 7 = 40$$

Step 2 -

$$\left(\sum A\right)^2 = 50^2 = 2500, \qquad \left(\sum B\right)^2 = 40^2 = 1600$$

Step 3 -

$$\sum A^2 = 1^2 + 2^2 + 4^2 + 4^2 + 5^2 + 5^2 + 6^2 + 7^2 + 8^2 + 8^2 = 300$$

$$\sum B^2 = 1^2 + 2^2 + 2^2 + 3^2 + 3^2 + 4^2 + 5^2 + 6^2 + 7^2 + 7^2 = 202$$

Step 4 - With $n = 10$:

$$\mu_A = \frac{\sum A}{n} = \frac{50}{10} = 5, \qquad \mu_B = \frac{\sum B}{n} = \frac{40}{10} = 4$$

Step 5 -

$$df = (n_A - 1) + (n_B - 1) = (10 - 1) + (10 - 1) = 18$$

Step 6 - Putting the values found above into the independent sample t-test formula, we get the calculated value of t:

$$t_{cal} = \frac{5 - 4}{\sqrt{\dfrac{(300 - 2500/10) + (202 - 1600/10)}{18}\left(\dfrac{1}{10} + \dfrac{1}{10}\right)}} = \frac{1}{\sqrt{\dfrac{92}{18} \times 0.2}} \approx 0.99$$

Step 7 - Let α = 0.05 and df = 18. Looking up the two-tailed t-table, we get $t_{table} = 2.10$.

df \ α (two-tailed) | 0.20 | 0.10 | 0.05 | . . |
---|---|---|---|---|
1 | 3.078 | 6.314 | 12.706 | . . |
2 | 1.886 | 2.920 | 4.303 | . . |
: | : | : | : | . . |
8 | 1.397 | 1.860 | 2.306 | . . |
9 | 1.383 | 1.833 | 2.262 | . . |
: | : | : | : | . . |
18 | 1.330 | 1.734 | 2.101 | . . |
19 | 1.328 | 1.729 | 2.093 | . . |
20 | 1.325 | 1.725 | 2.086 | . . |
: | : | : | : | . . |
∞ | 1.282 | 1.645 | 1.960 | . . |
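The rows of this t-table excerpt can be regenerated with SciPy’s inverse CDF, `scipy.stats.t.ppf`. This sketch assumes the column headers are two-tailed significance levels, so each critical value is taken at 1 − α/2; the ∞ row corresponds to the standard normal distribution (`scipy.stats.norm.ppf`).

```python
from scipy import stats

# Two-tailed significance levels, as in the column headers of the table
alphas = (0.20, 0.10, 0.05)

table = {}
for df in (1, 2, 8, 9, 18, 19, 20):
    # A two-tailed alpha is split evenly between both tails, so use 1 - alpha/2
    table[df] = [round(float(stats.t.ppf(1 - a / 2, df)), 3) for a in alphas]
    print(df, table[df])

# The infinity row is the standard normal distribution
inf_row = [round(float(stats.norm.ppf(1 - a / 2)), 3) for a in alphas]
print("inf", inf_row)
```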

Step 8 -

0.99 < 2.10 ($t_{cal} < t_{table}$) => no significant difference found between the two groups.

**Code Implementations**


```python
# import the necessary libraries
from scipy import stats
import numpy as np

# Samples
sample_A = np.array([1, 2, 4, 4, 5, 5, 6, 7, 8, 8])
sample_B = np.array([1, 2, 2, 3, 3, 4, 5, 6, 7, 7])

# Perform the independent sample t-test (pooled variance by default)
t_statistic, p_value = stats.ttest_ind(sample_A, sample_B)

# Set the significance level (alpha)
alpha = 0.05

# Compute the degrees of freedom (df) = (n_A - 1) + (n_B - 1)
df = len(sample_A) + len(sample_B) - 2

# Calculate the critical t-value for a two-tailed test
# (ppf is the inverse of the CDF)
critical_t = stats.t.ppf(1 - alpha / 2, df)

# Print the results
print("T-value:", t_statistic)
print("P-Value:", p_value)
print("Critical t-value:", critical_t)

# Decision using the t-value
print('With T-value')
if np.abs(t_statistic) > critical_t:
    print('There is a significant difference between two groups')
else:
    print('No significant difference found between two groups')

# Decision using the p-value
print('With P-value')
if p_value > alpha:
    print('Fail to reject the null hypothesis: no significant difference between the two groups')
else:
    print('Reject the null hypothesis: there is a significant difference between the two groups')
```

**Output:**

```
T-value: 0.9890707100936805
P-Value: 0.33573862223613105
Critical t-value: 2.10092204024096
With T-value
No significant difference found between two groups
With P-value
Fail to reject the null hypothesis: no significant difference between the two groups
```

**2. Paired sample t-test**

A paired sample t-test, commonly known as a dependent sample t-test, is used to find out whether the mean difference between two paired samples is 0. The test is done on dependent samples, usually focusing on a particular group of people or things. Each entity is measured twice, resulting in a pair of observations.

**We can use this when:**

- Two paired (matched) samples are given, e.g. the scores obtained by the same students in English and Math.
- The dependent variable (data) is continuous.
- The pairs of observations are independent of one another.
- The differences between the paired values are approximately normally distributed.

**Formula Used**

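$$t = \frac{\sum D}{\sqrt{\dfrac{N\sum D^2 - \left(\sum D\right)^2}{N - 1}}}$$

where,

- $t$ = t-value
- $D$ = difference between the two samples (A − B)
- $N$ = sample size (number of pairs, same as n)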

**Steps Involved**

- Step 1 - Find the difference D = A − B for each pair and sum them: $\sum D = \sum (A - B)$.
- Step 2 - Square each D found in Step 1 and sum the squares: $\sum D^2$.
- Step 3 - Square the sum of D: $\left(\sum D\right)^2$.
- Step 4 - Put the values found in Steps 1-3 into the paired sample t-test formula above to find the t-value.
- Step 5 - Find the degree of freedom (df) using df = N − 1.

  NOTE: Here, df is calculated for the data as a whole, not for each individual sample, because the two samples A and B are paired. So df = N − 1.
- Step 6 - Use the values of df and α (take α = 0.05 if not given) in the two-tailed t-table above to find the table value of t.
- Step 7 - Compare the t-values found in Step 4 and Step 6.
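As with the independent test, the paired steps can be sketched in plain Python. The function name `paired_t_test` is hypothetical; the sketch computes the t-value and df from the sum-based formula above, using the Math1 and Math2 scores from the example that follows.

```python
import math

def paired_t_test(a, b):
    """Paired t-statistic from the sum-based formula (Steps 1-5)."""
    d = [x - y for x, y in zip(a, b)]   # differences D = A - B
    n = len(d)
    sum_d = sum(d)                      # Step 1: sum of D
    sum_d_sq = sum(x * x for x in d)    # Step 2: sum of D squared
    sq_sum_d = sum_d ** 2               # Step 3: (sum of D) squared
    # Step 4: t = sum(D) / sqrt((N * sum(D^2) - (sum D)^2) / (N - 1))
    t = sum_d / math.sqrt((n * sum_d_sq - sq_sum_d) / (n - 1))
    return t, n - 1                     # Step 5: df = N - 1

t_cal, df = paired_t_test([4, 4, 7, 16, 20, 11, 13, 9, 11, 15],
                          [15, 16, 14, 14, 22, 22, 23, 18, 18, 19])
print(round(t_cal, 2), df)  # -4.95 9
```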

**Interpretation of Results**

Same as that of the Independent samples t-test.

**Example Problem (Step by Step)**

Consider the following example. Scores (out of 25) of the subjects Math1 and Math2 are taken for a sample of 10 students. We have to perform the paired sample t-test for this data.

Student no. | Math1 | Math2 | D = Math1 − Math2 (Step 1) | D² (Step 2) |
---|---|---|---|---|
1 | 4 | 15 | -11 | 121 |
2 | 4 | 16 | -12 | 144 |
3 | 7 | 14 | -7 | 49 |
4 | 16 | 14 | 2 | 4 |
5 | 20 | 22 | -2 | 4 |
6 | 11 | 22 | -11 | 121 |
7 | 13 | 23 | -10 | 100 |
8 | 9 | 18 | -9 | 81 |
9 | 11 | 18 | -7 | 49 |
10 | 15 | 19 | -4 | 16 |
Sum | | | ∑D = -71 | ∑D² = 689 |

Step 1 and Step 2 - As shown in the table above, $\sum D = -71$ and $\sum D^2 = 689$.

Step 3 -

$$\left(\sum D\right)^2 = (-71)^2 = 5041$$

Step 4 - Putting the values into the paired sample t-test formula, we get

$$t_{cal} = \frac{-71}{\sqrt{\dfrac{10 \times 689 - 5041}{10 - 1}}} = \frac{-71}{\sqrt{205.44}} \approx -4.95$$

Here, we consider the absolute value, so $t_{cal} = 4.95$.

Step 5 - df = N − 1 = 10 − 1 = 9

Step 6 - Using df = 9 and α = 0.05 in the two-tailed t-table, we get $t_{table} = 2.26$.

Step 7 - 4.95 > 2.26 ($t_{cal} > t_{table}$) => there is a significant difference between Math1 and Math2.
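A paired t-test is equivalent to a one sample t-test applied to the differences D with a reference mean of 0. The sketch below, assuming SciPy is available, checks this on the Math1 and Math2 scores: `stats.ttest_rel(math1, math2)` and `stats.ttest_1samp(math1 - math2, popmean=0)` should return the same t-value and p-value.

```python
import numpy as np
from scipy import stats

math1 = np.array([4, 4, 7, 16, 20, 11, 13, 9, 11, 15])
math2 = np.array([15, 16, 14, 14, 22, 22, 23, 18, 18, 19])

# Paired t-test on the two related samples
paired = stats.ttest_rel(math1, math2)

# One sample t-test of the differences D = math1 - math2 against a mean of 0
on_diff = stats.ttest_1samp(math1 - math2, popmean=0)

print(paired.statistic, on_diff.statistic)
print(paired.pvalue, on_diff.pvalue)
```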

**Code Implementations**


```python
# import the necessary libraries
from scipy import stats
import numpy as np

# Create the paired samples (the same 10 students)
math1 = np.array([4, 4, 7, 16, 20, 11, 13, 9, 11, 15])
math2 = np.array([15, 16, 14, 14, 22, 22, 23, 18, 18, 19])

# Perform the paired sample t-test
t_statistic, p_value = stats.ttest_rel(math1, math2)

# Set the significance level (alpha)
alpha = 0.05

# Compute the degrees of freedom (df = n - 1)
df = len(math2) - 1

# Calculate the critical t-value for a two-tailed test
# (ppf is the inverse of the CDF)
critical_t = stats.t.ppf(1 - alpha / 2, df)

# Print the results
print("T-value:", t_statistic)
print("P-Value:", p_value)
print("Critical t-value:", critical_t)

# Decision using the t-value
print('With T-value')
if np.abs(t_statistic) > critical_t:
    print('There is a significant difference between math1 and math2')
else:
    print('No significant difference found between math1 and math2')

# Decision using the p-value
print('With P-value')
if p_value > alpha:
    print('Fail to reject the null hypothesis: no significant difference between math1 and math2')
else:
    print('Reject the null hypothesis: there is a significant difference between math1 and math2')
```

**Output:**

```
T-value: -4.953488372093023
P-Value: 0.0007875235561560145
Critical t-value: 2.2621571627409915
With T-value
There is a significant difference between math1 and math2
With P-value
Reject the null hypothesis: there is a significant difference between math1 and math2
```

**3. One sample t-test**

The one sample t-test is one of the most widely used t-tests. It compares the sample mean of the data to a given value, typically the true/population mean.

**We can use this when:**

- the sample size is small (under 30).
- the data is collected randomly.
- the data is approximately normally distributed.

**Formula used:**

$$t = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

where,

- $t$ = t-value
- $\bar{x}$ = sample mean
- $\mu$ = true/population mean
- $\sigma$ = (sample) standard deviation
- $n$ = sample size

**Steps involved**

- Step 1 - Define the null ($H_0$) and alternative ($H_1$) hypotheses.
- Step 2 - Calculate the sample mean (if not given). The population mean, standard deviation and n are given.
- Step 3 - Put the values found in Step 2 into the one sample t-test formula above and calculate the t-value ($t_{cal}$).
- Step 4 - Calculate the degree of freedom (df) in the same way as in the paired sample t-test: df = n − 1.
- Step 5 - Take α = 0.05 if not given. Use the values of df and α to find $t_{table}$ from the t-table above, one-tailed (for a one-tailed test at α = 0.05, use the column for two-tailed α = 0.10).
- Step 6 - Compare the t-values found in Step 3 and Step 5.
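These steps can be sketched as a small function that works from summary statistics, as in the example below. This is a minimal sketch assuming an upper one-tailed test ($H_1: \mu > \mu_0$) and SciPy for the table lookup; the function name `one_sample_t_from_summary` is hypothetical.

```python
import math
from scipy import stats

def one_sample_t_from_summary(x_bar, mu, s, n, alpha=0.05):
    """Upper one-tailed one sample t-test from summary statistics."""
    t_cal = (x_bar - mu) / (s / math.sqrt(n))  # Step 3: t-value
    df = n - 1                                 # Step 4: degrees of freedom
    t_table = stats.t.ppf(1 - alpha, df)       # Step 5: one-tailed critical value
    return t_cal, df, t_table, t_cal > t_table # Step 6: compare

t_cal, df, t_table, significant = one_sample_t_from_summary(75, 45, 25, 25)
print(t_cal, df, round(t_table, 3), significant)
```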

**Interpretation of Results**

Same as that of the Independent samples t-test.

**Example Problem (Step by Step)**

Consider the following example. The weights of 25 obese people were taken before enrolling them into a nutrition camp. The population mean weight before starting the camp is 45 kg. After finishing the camp, the sample mean weight of the same 25 people was found to be 75 kg, with a standard deviation of 25 kg. Did the nutrition camp work?

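Step 1 - Define the hypotheses:

- $H_0: \mu = 45$ (the mean weight after the camp equals the population mean)
- $H_1: \mu > 45$ (the mean weight after the camp is greater than the population mean; a one-tailed test, matching the one-tailed table value used in Step 5)

Step 2 - Given:

- $\bar{x} = 75$
- $\mu = 45$
- $\sigma = 25$
- $n = 25$

Step 3 - Putting the values from Step 2 into the one sample t-test formula, we get

$$t_{cal} = \frac{75 - 45}{25 / \sqrt{25}} = \frac{30}{5} = 6$$

Step 4 - df = n − 1 = 25 − 1 = 24

Step 5 - Using df = 24 and α = 0.05 in a one-tailed t-table, we get $t_{table} = 1.711$.

Step 6 - 6 > 1.711 ($t_{cal} > t_{table}$) => a significant difference is found, so $H_0$ is rejected => the nutrition camp significantly impacted the weights and it was a success.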
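Whether the alternative hypothesis is one-tailed (μ > 45) or two-tailed (μ ≠ 45) changes the critical value and the p-value, so it is worth checking that the conclusion does not hinge on that choice. A short sketch, assuming SciPy is available:

```python
from scipy import stats

df, alpha, t_cal = 24, 0.05, 6.0

one_tailed = stats.t.ppf(1 - alpha, df)      # critical value for H1: mu > 45
two_tailed = stats.t.ppf(1 - alpha / 2, df)  # critical value for H1: mu != 45

# p-values for the same t statistic under each alternative
p_one = stats.t.sf(t_cal, df)
p_two = 2 * stats.t.sf(abs(t_cal), df)

print(f"one-tailed: t_table = {one_tailed:.3f}, p = {p_one:.2e}")
print(f"two-tailed: t_table = {two_tailed:.3f}, p = {p_two:.2e}")
```

Since $t_{cal} = 6$ exceeds both critical values, the null hypothesis is rejected under either alternative.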

**Code Implementations**

```python
import scipy.stats as stats
import numpy as np

# Define the population mean weight
population_mean = 45

# Define the sample mean weight and standard deviation
sample_mean = 75
sample_std = 25

# Define the sample size
sample_size = 25

# Calculate the t-statistic
t_statistic = (sample_mean - population_mean) / (sample_std / np.sqrt(sample_size))

# Define the degrees of freedom
df = sample_size - 1

# Set the significance level (alpha)
alpha = 0.05

# Calculate the critical t-value for a one-tailed (upper) test
critical_t = stats.t.ppf(1 - alpha, df)

# One-tailed p-value: probability of a t-value at least this large under H0
p_value = 1 - stats.t.cdf(t_statistic, df)

# Print the results
print("T-Statistic:", t_statistic)
print("Critical t-value:", critical_t)
print("P-Value:", p_value)

# Decision using the t-value
print('With T-value :')
if t_statistic > critical_t:
    print("There is a significant difference in weight before and after the camp.\n"
          "The nutrition camp had an effect.")
else:
    print("There is no significant difference in weight before and after the camp.\n"
          "The nutrition camp did not have a significant effect.")

# Decision using the p-value (reject H0 when p < alpha)
print('With P-value :')
if p_value < alpha:
    print("There is a significant difference in weight before and after the camp.\n"
          "The nutrition camp had an effect.")
else:
    print("There is no significant difference in weight before and after the camp.\n"
          "The nutrition camp did not have a significant effect.")
```

**Output:**

```
T-Statistic: 6.0
Critical t-value: 1.7108820799094275
P-Value: 1.703654035845048e-06
With T-value :
There is a significant difference in weight before and after the camp.
The nutrition camp had an effect.
With P-value :
There is a significant difference in weight before and after the camp.
The nutrition camp had an effect.
```

The types of t-tests discussed above are widely used in research, for example by experts in hospitals who analyze medical data on the effects of various medicines and drugs on a population and draw important inferences from it. However, it is the researcher’s responsibility to choose the t-test that best fits the data and to make sure that all the assumptions of that t-test are met.
