
Two-Sample t-test in R

Last Updated : 16 Apr, 2024

In statistics, the two-sample t-test is like a measuring stick we use to see if the averages of two groups are different from each other. It helps us figure out if the difference we see is larger than random sampling variation alone would explain. In this article, we will perform a two-sample t-test in the R Programming Language.

What is a Two-Sample t-test?

The two-sample t-test is a statistical method used to determine if there’s a significant difference between the means of two independent groups. It assesses whether the means of these groups are statistically different from each other or if any observed difference is due to random variation. For example, if we’re comparing test scores of two classes, we use this test to know if one class did better than the other by a meaningful amount, or if it’s just luck.

Before using a two-sample t-test, it’s important to make sure of the following:

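  • The observations are independent, both within each group and between the two groups.
  • The populations from which the samples are taken are approximately normally distributed (a bell-shaped curve).
  • The variances of the two populations are similar (equal variances). This is required for the classic pooled-variance test; Welch's version, which R uses by default, relaxes it.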
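These assumptions can be checked informally before running the test. The following is a minimal sketch, assuming two hypothetical numeric vectors `group1` and `group2` with made-up values; it uses `shapiro.test()` for normality and `var.test()` for equality of variances, both from base R's stats package. With samples this small the checks have little power, so treat the results as a rough guide rather than proof.

```r
# Hypothetical sample data, for illustration only
group1 <- c(5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5)
group2 <- c(4.8, 5.2, 5.0, 5.9, 5.4, 5.1, 4.7)

# Shapiro-Wilk test: a small p-value suggests a departure from normality
normality_group1 <- shapiro.test(group1)
normality_group2 <- shapiro.test(group2)

# F test: a small p-value suggests the two variances differ
variance_check <- var.test(group1, group2)

print(normality_group1$p.value)
print(normality_group2$p.value)
print(variance_check$p.value)
```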

Syntax:

t.test(x, y, alternative = "two.sided", mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95)

  • x and y: These are the numeric vectors containing the two samples you want to compare.
  • alternative: This specifies the alternative hypothesis. It can take the values "two.sided", "less", or "greater", indicating whether you’re testing for a two-tailed, left-tailed, or right-tailed test, respectively.
  • mu: This is the hypothesized difference in means under the null hypothesis. The default is zero.
  • paired: If set to TRUE, it indicates that the two samples are paired (e.g., before and after measurements). The default is FALSE for unpaired samples.
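  • var.equal: If set to TRUE, the test assumes equal variances in the two groups and uses the pooled variance. The default is FALSE, in which case R runs Welch’s t-test with adjusted degrees of freedom.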
  • conf.level: This specifies the confidence level for the confidence interval. The default is 0.95.
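To see how these arguments change the test, here is a short sketch using two hypothetical vectors `x` and `y` (the values are made up for illustration). It runs the default Welch test, the pooled-variance test, a one-sided test, and a test with a 99% confidence interval.

```r
# Hypothetical samples, for illustration only
x <- c(12.1, 11.8, 12.6, 12.9, 12.3, 12.0)
y <- c(11.2, 11.9, 11.5, 11.1, 11.7, 11.4)

# Default: two-sided Welch test, variances not assumed equal
welch_result <- t.test(x, y)

# Pooled-variance (Student's) test: degrees of freedom are n1 + n2 - 2
pooled_result <- t.test(x, y, var.equal = TRUE)

# One-sided test: is the mean of x greater than the mean of y?
greater_result <- t.test(x, y, alternative = "greater")

# 99% confidence interval instead of the default 95%
ci99_result <- t.test(x, y, conf.level = 0.99)

print(welch_result$method)
print(pooled_result$parameter)
print(greater_result$p.value)
print(ci99_result$conf.int)
```

The fields `method`, `parameter`, `p.value`, and `conf.int` are components of the object that `t.test()` returns, so each result can be inspected without printing the full report.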

How to Perform Two-Sample t-test

Suppose we want to compare the heights of two groups of students, male and female, to see if there’s a significant difference in their average heights.

Step 1: Input Data

Let’s create two vectors representing the heights of male and female students:

R
# Heights of male students
heights_male <- c(170, 175, 180, 165, 172)

# Heights of female students
heights_female <- c(160, 165, 170, 155, 162)

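Because t.test() defaults to var.equal = FALSE, the call in the next step runs Welch’s t-test. It assumes that the data within each group are independent and approximately normally distributed, but it does not require equal variances. For simplicity, let’s assume these assumptions hold true.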

Step 2: Conduct the t-test

Now, let’s perform the two-sample t-test using the t.test() function:

R
# Perform two-sample t-test
t_test_result <- t.test(heights_male, heights_female)

# View the t-test results
print(t_test_result)

Output:

    Welch Two Sample t-test

data: heights_male and heights_female
t = 2.8262, df = 8, p-value = 0.02228
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
1.840524 18.159476
sample estimates:
mean of x mean of y
172.4 162.4

The output includes the test statistic (t), the degrees of freedom (df), the p-value, the 95% confidence interval for the difference in means, and the mean of each sample.
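To make the reported numbers less of a black box, the sketch below recomputes the Welch t statistic, the degrees of freedom (Welch-Satterthwaite approximation), and the two-sided p-value by hand from the same height data. The variable names (`se`, `t_stat`, `df_welch`, and so on) are chosen here for illustration and are not part of `t.test()`.

```r
# Same data as in Step 1
heights_male <- c(170, 175, 180, 165, 172)
heights_female <- c(160, 165, 170, 155, 162)

n1 <- length(heights_male)
n2 <- length(heights_female)

# Variance of each sample mean
v1 <- var(heights_male) / n1
v2 <- var(heights_female) / n2

# Standard error of the difference in means
se <- sqrt(v1 + v2)

# Welch t statistic
t_stat <- (mean(heights_male) - mean(heights_female)) / se

# Welch-Satterthwaite degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df_welch)

print(c(t = t_stat, df = df_welch, p = p_value))
```

The two samples here happen to have identical variances, which is why the Welch degrees of freedom come out as a whole number (8) in the output above.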

Step 3: Interpretation

We primarily focus on the p-value, which is the probability of observing a difference at least as extreme as the one in our data if the null hypothesis (no difference in means) is true. If the p-value is less than a chosen significance level (e.g., 0.05), we reject the null hypothesis and conclude that there is a significant difference in the average heights of male and female students.

R
# Check if p-value is less than 0.05
# (each message is kept on one line so no line break ends up inside the string)
if (t_test_result$p.value < 0.05) {
  print("There is a significant difference in the average heights of male and female students.")
} else {
  print("There is no significant difference in the average heights of male and female students.")
}

Output:

[1] "There is a significant difference in the average heights of male and female students."

The p-value (0.02228) is less than the typical significance level of 0.05, so we reject the null hypothesis and conclude that there is a significant difference in the average heights of male and female students. The 95% confidence interval places the difference in means between roughly 1.8 and 18.2; it is wide because each group has only five observations.
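Beyond the p-value, the object returned by `t.test()` holds the estimates and the confidence interval, which are often more informative to report. The following sketch pulls them out and prints a one-line summary; the names `result`, `mean_diff`, `ci`, and `summary_line` are illustrative.

```r
# Same data as in Step 1
heights_male <- c(170, 175, 180, 165, 172)
heights_female <- c(160, 165, 170, 155, 162)

result <- t.test(heights_male, heights_female)

# Difference between the two sample means
mean_diff <- unname(result$estimate[1] - result$estimate[2])

# Lower and upper bounds of the 95% confidence interval
ci <- result$conf.int

# Build and print a compact summary line
summary_line <- sprintf(
  "Difference in means: %.1f (95%% CI %.2f to %.2f), p = %.4f",
  mean_diff, ci[1], ci[2], result$p.value
)
print(summary_line)
```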

Let’s perform a two-sample t-test to compare the test scores of two groups of students, Group A and Group B.

R
# Test scores of Group A students
scores_groupA <- c(85, 90, 88, 82, 87)

# Test scores of Group B students
scores_groupB <- c(78, 85, 80, 92, 79)

# Perform two-sample t-test
t_test_result <- t.test(scores_groupA, scores_groupB)

# View the t-test results
print(t_test_result)

Output:

    Welch Two Sample t-test

data: scores_groupA and scores_groupB
t = 1.2276, df = 6.0515, p-value = 0.2652
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.560982 10.760982
sample estimates:
mean of x mean of y
86.4 82.8

The output shows the following:

  • The t-value is 1.2276.
  • The degrees of freedom, after Welch’s adjustment, are approximately 6.0515.
  • The p-value associated with the test is 0.2652.
  • The 95% confidence interval for the difference in means ranges from -3.560982 to 10.760982.
  • The mean test score for Group A is 86.4, and the mean test score for Group B is 82.8.

Since the p-value (0.2652) is greater than the typical significance level of 0.05, we fail to reject the null hypothesis. There is insufficient evidence to conclude that the mean test scores of Group A and Group B differ at the 0.05 significance level. The 95% confidence interval for the difference in means (-3.560982 to 10.760982) includes zero, which is consistent with this result. Note that failing to reject the null hypothesis does not show that the means are equal, only that these data cannot tell them apart.
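For a two-sided test, the p-value and the confidence interval always agree: the 95% interval contains zero exactly when the p-value is at least 0.05. A small sketch that checks both conditions on the test score data (the names `ci_contains_zero` and `not_significant` are illustrative):

```r
# Same data as in the test score example
scores_groupA <- c(85, 90, 88, 82, 87)
scores_groupB <- c(78, 85, 80, 92, 79)

result <- t.test(scores_groupA, scores_groupB)

# Does the 95% confidence interval include zero?
ci_contains_zero <- result$conf.int[1] <= 0 && result$conf.int[2] >= 0

# Is the p-value at or above the 0.05 significance level?
not_significant <- result$p.value >= 0.05

print(ci_contains_zero)
print(not_significant)
```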

Conclusion

The two-sample t-test is a handy way to compare averages between two groups. By checking the assumptions, running t.test(), and reading the p-value and confidence interval together, we can tell whether the differences we see are larger than random chance would explain. This test helps us make sense of data and draw meaningful conclusions.


