Independent Sample t Test in R

Statistical hypothesis testing is a fundamental tool in data analysis for drawing meaningful conclusions from data. The independent sample t-test is one of the most widely used statistical tests. It compares the means (averages) of two independent groups and determines whether the difference between them is statistically significant.

Table of Contents

T Test
Understanding the Independent Sample t-Test
Independent Sample T-test on Scores of Student Groups
Independent T-test on mtcars Dataset

T Test

The t-test is a statistical hypothesis test that compares the means of two groups and determines whether the difference between them is statistically significant. It assesses whether an observed difference in means is likely to reflect a real difference between the study populations or simply random sampling variation. T-tests are widely used in research and data analysis. There are several types of t-test for comparing two groups, including the Student t-test, the paired sample t-test, and Welch’s t-test.

Student t-test:

The Student t-test compares the means of two independent groups, assuming the groups have equal variances, to determine whether there is a significant difference between them. For example, a teacher may want to see whether first-year students scored differently on an exam than final-year students.

Paired t-test:

The paired t-test compares the means of two related or paired groups, such as before-and-after measurements on the same subjects. For example, it can compare the mean scores of the same student group before and after a syllabus change.

Welch t-test:

Welch’s t-test compares the means of two independent groups but does not assume equal variances. It is more appropriate when the variances of the two groups being compared are clearly different.
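As a rough illustration of how the three variants map onto R, the sketch below calls t.test() with different arguments. The score vectors are hypothetical values made up for this example, and the paired call simply assumes that the two vectors hold measurements for the same six students in the same order.

```r
# Hypothetical exam scores, invented purely for illustration
first_year <- c(72, 85, 78, 90, 66, 81)
final_year <- c(80, 88, 75, 94, 79, 86)

# Student's t-test: independent groups, equal variances assumed
student <- t.test(first_year, final_year, var.equal = TRUE)

# Welch's t-test: independent groups, no equal-variance assumption
# (this is what t.test() does when var.equal is left at its default)
welch <- t.test(first_year, final_year)

# Paired t-test: treats the i-th value of each vector as the same subject
paired <- t.test(first_year, final_year, paired = TRUE)

# The 'method' element records which variant was actually run
print(c(student$method, welch$method, paired$method))
```

The only difference between the Student and Welch calls is the var.equal argument, so it is worth checking the method line of the output to confirm which test was performed.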

Some real world examples are:

Business: A company can use an independent sample t-test to evaluate the effectiveness of two different business strategies by comparing the mean (average) returns of both strategies.
Medical Research: To determine whether a new medicine is more effective than an existing medicine, researchers can use an independent t-test to compare the mean recovery times of two groups of patients, one treated with the new medicine and one treated with the existing one.
Education: A teacher can use an independent sample t-test to compare the mean test scores of students who studied the new syllabus with those of students taught using the old syllabus.

Note: T-tests are recommended when comparing the means of two groups to check whether the difference between them is significant. They are particularly useful when the sample size is small (less than 30), when the data in each group follow an approximately normal distribution, and when the hypothesis of interest is specifically about a difference in means.

In this article, we will explore the theoretical foundations of independent sample t-test and its practical implementation using R.

Understanding the Independent Sample t-Test

The independent sample t-test is applicable when you have two distinct and independent groups and you want to determine whether there is evidence to suggest that the means of these two groups are significantly different. It’s a parametric test that assumes the data in each group follows a normal distribution and that the variances in the two groups are approximately equal.

Assumptions:

Independence: The observations in one group are independent of the observations in the other group.
Normality: The data for each group is approximately normally distributed.
Homogeneity of variance: The variances in both groups are approximately equal.
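The assumptions above can be checked informally before running the test. The sketch below is one possible approach, not a required procedure: it uses the built-in mtcars data, the Shapiro-Wilk test for normality and the F test for equal variances, all from the ‘stats’ package. Independence depends on how the data were collected and cannot be verified from the numbers alone.

```r
# Two independent groups from the built-in mtcars data:
# miles per gallon for automatic (am == 0) and manual (am == 1) cars
automatic <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# Normality: Shapiro-Wilk test per group
# (null hypothesis: the data come from a normal distribution)
shapiro_auto <- shapiro.test(automatic)
shapiro_manual <- shapiro.test(manual)

# Homogeneity of variance: F test on the ratio of the two variances
# (null hypothesis: the ratio of variances is 1)
variance_check <- var.test(automatic, manual)

# Small p-values here suggest the corresponding assumption is doubtful
print(shapiro_auto$p.value)
print(shapiro_manual$p.value)
print(variance_check$p.value)
```

If the variance check suggests unequal variances, Welch’s t-test is the safer choice. With small samples these checks have little power, so plots such as histograms or Q-Q plots are a useful complement.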

Hypothesis:

Null Hypothesis (H0): The population means of the two groups are equal, so there is no real difference between them.
Alternative hypothesis (H1): The population means of the two groups are not equal.

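Test statistic:

The t-statistic measures the difference between the two group means relative to the standard error of that difference. Under the null hypothesis, and when equal variances are assumed, it follows a t-distribution with n1 + n2 - 2 degrees of freedom, which is the sum of the degrees of freedom of the two groups, (n1 - 1) + (n2 - 1). A larger absolute t-statistic indicates stronger evidence against the null hypothesis.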
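To make the calculation concrete, here is a minimal sketch that computes the equal-variance (pooled) t-statistic by hand and compares it with t.test(). It uses the built-in mtcars data; the variable names are chosen only for this example.

```r
# Two independent groups from the built-in mtcars data
x <- mtcars$mpg[mtcars$am == 0]  # automatic transmission
y <- mtcars$mpg[mtcars$am == 1]  # manual transmission

n1 <- length(x)
n2 <- length(y)

# Pooled variance: weighted average of the two sample variances
pooled_var <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)

# Standard error of the difference in means
std_error <- sqrt(pooled_var * (1 / n1 + 1 / n2))

# t-statistic and its degrees of freedom
t_manual <- (mean(x) - mean(y)) / std_error
df_manual <- n1 + n2 - 2

# Same test via t.test(), with equal variances assumed
builtin <- t.test(x, y, var.equal = TRUE)

print(c(manual = t_manual, builtin = unname(builtin$statistic)))
print(c(manual = df_manual, builtin = unname(builtin$parameter)))
```

The two rows should agree, which confirms that t.test() with var.equal = TRUE implements the pooled formula described above.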

P-value:

The p-value is the probability of observing a t-statistic at least as extreme as the one calculated from the sample data, assuming that the null hypothesis is true. A small p-value (conventionally less than 0.05) indicates that the observed difference between the two group means would be unlikely if the null hypothesis were true, so the null hypothesis is rejected.
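The link between the t-statistic and the p-value can be sketched with pt(), the cumulative distribution function of the t-distribution in the ‘stats’ package. The example below takes the statistic and degrees of freedom from a t.test() result on the built-in mtcars data and recomputes the two-sided p-value from them.

```r
# Two independent groups from the built-in mtcars data
automatic <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

result <- t.test(automatic, manual)

t_stat <- unname(result$statistic)   # observed t-statistic
deg_free <- unname(result$parameter) # degrees of freedom

# Two-sided p-value: probability of a t-statistic at least this far
# from zero, in either tail, if the null hypothesis is true
p_manual <- 2 * pt(-abs(t_stat), df = deg_free)

print(c(manual = p_manual, reported = result$p.value))
```

Doubling the lower-tail probability covers both tails because the t-distribution is symmetric around zero.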

Pre-Requisites

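The t.test() function belongs to the ‘stats’ package, which ships with every R installation and is attached automatically at startup. No installation step is needed, and install.packages('stats') should not be run, because ‘stats’ is a base package and is not installed from CRAN.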

Independent Sample T-test on Scores of Student Groups

Let’s go through the steps to perform an independent sample t-test in R using a simple data set that contains the scores of two independent student groups.

library(stats)
 
# Create sample data for Group A and Group B - scores of two student groups

group_a <- c(95, 91, 88, 82, 93, 94, 89, 79, 87, 70)

group_b <- c(87, 84, 99, 95, 91, 87, 82, 80, 92, 76)
 
# Perform the independent sample t-test

t_test_result <- t.test(group_a, group_b)
 
# Print the t-test result

print(t_test_result)

Output:

Welch Two Sample t-test
data:  group_a and group_b
t = -0.15002, df = 17.837, p-value = 0.8824
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -7.506606  6.506606
sample estimates:
mean of x mean of y 
     86.8      87.3

Interpretations:

R reports a Welch Two Sample t-test because t.test() does not assume equal variances by default (var.equal = FALSE).
The t-statistic is approximately -0.150 and the degrees of freedom are approximately 17.84.
The 95% confidence interval for the difference in means is approximately (-7.51, 6.51). It contains 0, which is consistent with no difference between the groups.
The mean of Group A is 86.8 and the mean of Group B is 87.3.
The p-value is approximately 0.882, which is greater than 0.05. Hence we fail to reject the null hypothesis: the data provide no evidence of a significant difference between the means of Group A and Group B.
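For comparison, the same data can be analysed with the classic Student t-test by passing var.equal = TRUE. The sketch below runs both versions on the two score vectors from this example and collects the key numbers in a small table; the data frame layout is just one convenient way to show them side by side.

```r
# Scores of the two student groups used in this example
group_a <- c(95, 91, 88, 82, 93, 94, 89, 79, 87, 70)
group_b <- c(87, 84, 99, 95, 91, 87, 82, 80, 92, 76)

# Welch's t-test (R's default) and Student's t-test (equal variances assumed)
welch <- t.test(group_a, group_b)
student <- t.test(group_a, group_b, var.equal = TRUE)

# Collect the statistic, degrees of freedom and p-value of both tests
comparison <- data.frame(
  test = c("Welch", "Student"),
  t = unname(c(welch$statistic, student$statistic)),
  df = unname(c(welch$parameter, student$parameter)),
  p_value = c(welch$p.value, student$p.value)
)

print(comparison)
```

Because both groups have 10 observations, the two tests give the same t-statistic here and differ only in their degrees of freedom, so the conclusion is unchanged.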

Independent T-test on mtcars Dataset

Let us perform another independent sample t-test in R using the built-in “mtcars” dataset. In this example, we’ll compare the miles per gallon (mpg) of automatic and manual transmission cars to determine if there is a significant difference in fuel efficiency.

# Load the mtcars dataset 

data(mtcars)
 
# Subset the data into two groups: automatic and manual transmission cars

automatic <- mtcars[mtcars$am == 0, "mpg"]

manual <- mtcars[mtcars$am == 1, "mpg"]
 
# Perform the independent sample t-test

t_test_result <- t.test(automatic, manual)
 
# Print the t-test result

print(t_test_result)

Output:

Welch Two Sample t-test
data:  automatic and manual
t = -3.7671, df = 18.332, p-value = 0.001374
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -11.280194  -3.209684
sample estimates:
mean of x mean of y 
 17.14737  24.39231

Interpretations:

The t-statistic is approximately -3.767 and the degrees of freedom are approximately 18.332.
The 95% confidence interval for the difference in means is approximately (-11.280, -3.210). It does not contain 0.
The average mpg for automatic transmission cars (Group 1) is approximately 17.147 and the average mpg for manual transmission cars (Group 2) is approximately 24.392.
The p-value is approximately 0.00137, which is less than 0.05. Hence we reject the null hypothesis and conclude that there is a significant difference in fuel economy (mpg) between automatic and manual cars in the ‘mtcars’ dataset.
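The same comparison can also be written with the formula interface of t.test(), which avoids subsetting by hand, and the individual numbers can be pulled out of the result object instead of being read from the printed output. This is a small sketch of that alternative; it should reproduce the Welch result for am == 0 versus am == 1 reported in this example.

```r
# Formula interface: response ~ grouping variable (am has two levels, 0 and 1)
result <- t.test(mpg ~ am, data = mtcars)

# Extract individual components of the result
t_stat <- unname(result$statistic)     # t-statistic
deg_free <- unname(result$parameter)   # degrees of freedom
p_value <- result$p.value              # two-sided p-value
conf_int <- as.numeric(result$conf.int) # 95% confidence interval
group_means <- unname(result$estimate) # means for am == 0 and am == 1

# Observed difference in means (automatic minus manual)
mean_diff <- group_means[1] - group_means[2]

print(round(c(t = t_stat, df = deg_free, p = p_value, diff = mean_diff), 4))
print(round(conf_int, 3))
```

Working with the extracted values is convenient when the results need to be reported in a table or reused in later calculations.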

Conclusion

The independent sample t-test is a powerful tool for comparing two groups and determining whether their means differ significantly. When testing a hypothesis in R, be sure to check the assumptions, run the test, and interpret the results.
