
Hypothesis Testing in R Programming

A hypothesis is an assumption that researchers make about the data collected for an experiment or data set, and it is not necessarily true. Hypothesis testing in R programming is the process of checking whether the data support that assumption. To perform a hypothesis test, a random sample is drawn from the population and a statistical test is applied to it. Based on the result, the null hypothesis is either rejected or not rejected. Drawing such conclusions about a population from a sample is known as statistical inference. In this article, we'll discuss the four-step process of hypothesis testing, the one-sample t-test, the two-sample t-test, directional hypotheses, the one-sample and two-sample Wilcoxon tests, and the correlation test in R programming.

Four Step Process of Hypothesis Testing

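There are 4 major steps in hypothesis testing:

1. State the hypotheses: formulate the null hypothesis (H0), which is presumed true until the data show otherwise, and the alternative hypothesis (H1).
2. Formulate an analysis plan: choose the test and set the significance level (alpha), the probability of rejecting H0 when it is actually true (a Type I error).
3. Analyze the sample data: compute the test statistic (for example, a t statistic) and its p-value from the sample.
4. Interpret the decision: if the p-value is below alpha, reject H0; otherwise, fail to reject it.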
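The four-step process can be sketched in R as follows. This is a minimal illustration, assuming a one-sample t-test of H0: mean = 0 at a 5% significance level; the seeded sample and the names mu0, alpha, and decision are hypothetical choices made for the example.

```r
# Hypothetical example: one-sample t-test of H0: mean = 0 at alpha = 0.05
set.seed(42)                  # make the random sample reproducible
x <- rnorm(30, mean = 0.5)    # hypothetical sample data

# Step 1: state the hypotheses
mu0 <- 0                      # H0: population mean equals mu0; H1: it differs

# Step 2: formulate the plan and set the significance level
alpha <- 0.05

# Step 3: analyze the sample data (test statistic and p-value)
result  <- t.test(x, mu = mu0)
p_value <- result$p.value

# Step 4: interpret the decision
decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"
cat("t =", round(unname(result$statistic), 3),
    " p-value =", signif(p_value, 3), " ->", decision, "\n")
```

Note that "fail to reject H0" is not the same as proving H0 true; it only means the sample does not provide enough evidence against it at the chosen alpha.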

One Sample T-Testing

The one-sample t-test compares the mean of a single sample with a known or hypothesized population mean. It assumes that the data are approximately normally distributed (or that the sample is large enough for the sample mean to be approximately normal). For example, it can test whether the average height of people living in one area differs from a known reference value.

Syntax: t.test(x, mu)
Parameters:
x: numeric vector of data
mu: hypothesized value of the population mean under the null hypothesis (default 0)



To see more optional parameters of t.test(), run the following command:

help("t.test")

Example: 

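# Defining sample vector: 100 draws from a standard normal distribution
# (results differ on every run unless set.seed() is called first)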
x <- rnorm(100)
 
# One Sample T-Test
t.test(x, mu = 5)

                    

Output:

    One Sample t-test

data:  x
t = -49.504, df = 99, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 5
95 percent confidence interval:
 -0.1910645  0.2090349
sample estimates:
  mean of x 
0.008985172 
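To make the reported numbers less of a black box, the sketch below recomputes the one-sample t statistic, its two-sided p-value, and the 95% confidence interval from the textbook formulas and checks them against t.test(). It assumes a seeded sample (set.seed(1)), so its values will not match the output above; the variable names are illustrative.

```r
# Reproduce t.test(x, mu) from its formula (hypothetical seeded data):
# t = (mean(x) - mu) / (sd(x) / sqrt(n)), with n - 1 degrees of freedom
set.seed(1)
x  <- rnorm(100)
mu <- 5

n      <- length(x)
se     <- sd(x) / sqrt(n)            # standard error of the mean
t_stat <- (mean(x) - mu) / se
df     <- n - 1
p_two  <- 2 * pt(-abs(t_stat), df)   # two-sided p-value

# 95% confidence interval for the population mean
ci <- mean(x) + c(-1, 1) * qt(0.975, df) * se

builtin <- t.test(x, mu = mu)
all.equal(t_stat, unname(builtin$statistic))  # TRUE
all.equal(p_two, builtin$p.value)             # TRUE
all.equal(ci, as.vector(builtin$conf.int))    # TRUE
```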

Two Sample T-Testing

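In two-sample t-testing, the means of two independent samples are compared. By default, t.test() runs Welch's t-test, which does not assume that the two variances are equal. Setting var.equal = TRUE assumes equal variances and uses the pooled-variance (Student's) t-test instead.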

Syntax: t.test(x, y)
Parameters:
x, y: numeric vectors of data

Example: 

# Defining two independent sample vectors
x <- rnorm(100)
y <- rnorm(100)
 
# Two Sample T-Test
t.test(x, y)

                    

Output:

        Welch Two Sample t-test

data:  x and y
t = -1.0601, df = 197.86, p-value = 0.2904
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.4362140  0.1311918
sample estimates:
  mean of x   mean of y 
-0.05075633  0.10175478 
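The difference between the default Welch test and var.equal = TRUE shows up in how the standard error and degrees of freedom are computed. This sketch uses hypothetical seeded data (y is drawn with a larger standard deviation on purpose) and compares a hand-computed pooled t statistic and Welch-Satterthwaite degrees of freedom with what t.test() reports.

```r
# Hypothetical data: y is deliberately more spread out than x
set.seed(2)
x <- rnorm(100)
y <- rnorm(100, sd = 2)

welch  <- t.test(x, y)                    # default: var.equal = FALSE (Welch)
pooled <- t.test(x, y, var.equal = TRUE)  # Student's t-test, pooled variance

# Pooled-variance t statistic by hand
nx  <- length(x); ny <- length(y)
sp2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)  # pooled variance
t_pooled <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))

# Welch-Satterthwaite degrees of freedom by hand
vx <- var(x) / nx; vy <- var(y) / ny
df_welch <- (vx + vy)^2 / (vx^2 / (nx - 1) + vy^2 / (ny - 1))

all.equal(t_pooled, unname(pooled$statistic))  # TRUE
all.equal(df_welch, unname(welch$parameter))   # TRUE
unname(pooled$parameter)                       # 198, i.e. nx + ny - 2
```

When the variances really differ, Welch's version is the safer default, which is why t.test() uses it unless told otherwise.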

Directional Hypothesis

A directional (one-sided) hypothesis specifies the direction of the difference: for example, whether the sample mean is greater than, or less than, a given value. In t.test(), the direction is set with the alternative argument.

Syntax: t.test(x, mu, alternative)
Parameters:
x: numeric vector of data
mu: mean against which the sample data is tested
alternative: the alternative hypothesis, one of "two.sided" (default), "greater", or "less"

Example: 

# Defining sample vector
x <- rnorm(100)
 
# Directional hypothesis testing
t.test(x, mu = 2, alternative = 'greater')

                    

Output:

        One Sample t-test

data:  x
t = -20.708, df = 99, p-value = 1
alternative hypothesis: true mean is greater than 2
95 percent confidence interval:
 -0.2307534        Inf
sample estimates:
 mean of x 
-0.0651628 
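The alternative argument does not change the t statistic; it only changes which tail of the t distribution the p-value is taken from. A small sketch, assuming a seeded sample, checks this with pt():

```r
# Hypothetical seeded sample, tested against mu = 2 in three ways
set.seed(3)
x <- rnorm(100)
res_greater <- t.test(x, mu = 2, alternative = "greater")  # H1: mean > 2
res_less    <- t.test(x, mu = 2, alternative = "less")     # H1: mean < 2
res_two     <- t.test(x, mu = 2)                           # H1: mean != 2

t_stat <- unname(res_two$statistic)   # same t statistic in all three calls
df     <- unname(res_two$parameter)

# Upper-tail probability for "greater", lower-tail probability for "less"
all.equal(res_greater$p.value, pt(t_stat, df, lower.tail = FALSE))  # TRUE
all.equal(res_less$p.value, pt(t_stat, df))                         # TRUE
# The two-sided p-value is twice the smaller one-sided p-value
all.equal(res_two$p.value, 2 * min(res_greater$p.value, res_less$p.value))  # TRUE
```

This explains the p-value of 1 in the output above: the sample mean lies far below 2, so there is no evidence at all for the "greater" alternative.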

One Sample Wilcoxon Test

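This non-parametric test is used to compare a single sample with a hypothesized location when the normality assumption of the t-test is doubtful. R performs it with the wilcox.test() function, which, given one vector, runs the Wilcoxon signed-rank test of whether the data are distributed symmetrically around mu (0 by default).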

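Syntax: wilcox.test(x, y = NULL, exact = NULL)
Parameters:
x: numeric vector of data
y: optional second numeric vector; omit it for a one-sample test
exact: logical value indicating whether an exact p-value should be computed; the default NULL computes one only for samples with fewer than 50 values and no ties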

To see more optional parameters of wilcox.test(), run the following command:

help("wilcox.test")

Example: 

# Define vector
x <- rnorm(100)
 
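# One-sample Wilcoxon signed-rank test (H0: location = 0),
# using the normal approximation instead of an exact p-value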
wilcox.test(x, exact = FALSE)

                    

Output:

        Wilcoxon signed rank test with continuity correction

data:  x
V = 2555, p-value = 0.9192
alternative hypothesis: true location is not equal to 0
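As a sketch of what the V statistic means, assuming a seeded sample and the default mu = 0: drop zero differences, rank the absolute differences from mu, and add up the ranks of the positive ones.

```r
# Hypothetical seeded sample; mu = 0 is wilcox.test()'s default location
set.seed(4)
x  <- rnorm(100)
mu <- 0

d <- x - mu          # differences from the hypothesized location
d <- d[d != 0]       # zero differences are dropped
r <- rank(abs(d))    # rank the absolute differences
V <- sum(r[d > 0])   # sum of the ranks of the positive differences

builtin <- wilcox.test(x, mu = mu, exact = FALSE)
V == unname(builtin$statistic)   # TRUE
```

If the data are symmetric around mu, V should be close to n(n + 1)/4, which is 2525 for 100 values; the V = 2555 above is close to that, hence the large p-value.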

Two Sample Wilcoxon Test

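This test compares two independent samples without assuming normality. Called with two vectors, wilcox.test() performs the Wilcoxon rank-sum test (equivalent to the Mann-Whitney U test), which checks whether the values in one sample tend to be shifted relative to the other.

Example: 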

# Define vectors
x <- rnorm(100)
y <- rnorm(100)
 
# Two-sample Wilcoxon rank-sum test
wilcox.test(x, y)

                    

Output:

        Wilcoxon rank sum test with continuity correction

data:  x and y
W = 5300, p-value = 0.4643
alternative hypothesis: true location shift is not equal to 0
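Similarly, the W statistic of the rank-sum test can be recovered from the ranks of the pooled sample. This sketch assumes seeded continuous data, so tied values do not occur:

```r
# Hypothetical seeded samples from a continuous distribution (no ties)
set.seed(5)
x <- rnorm(100)
y <- rnorm(100)

r  <- rank(c(x, y))                            # rank the pooled sample
nx <- length(x)
W  <- sum(r[seq_len(nx)]) - nx * (nx + 1) / 2  # rank sum of x minus its minimum

builtin <- wilcox.test(x, y)
W == unname(builtin$statistic)   # TRUE
# Without ties, W also counts the pairs (x_i, y_j) with x_i > y_j
W == sum(outer(x, y, ">"))       # TRUE
```

Under H0, W is expected to be about nx * ny / 2, which is 5000 for two samples of 100; the W = 5300 above is not far enough from that to be significant.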

Correlation Test

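This test checks whether two paired numeric vectors are associated, that is, whether their correlation differs from 0. By default, cor.test() uses Pearson's product-moment correlation; method = "spearman" or method = "kendall" selects a rank-based measure instead.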

Syntax: cor.test(x, y)
Parameters:
x, y: numeric vectors of data of the same length

To see more optional parameters of the cor.test() function, run the following command:

help("cor.test")

Example: 

# Correlation between fuel economy (mpg) and horsepower (hp)
# in R's built-in mtcars dataset
cor.test(mtcars$mpg, mtcars$hp)

                    

Output:

        Pearson's product-moment correlation

data:  mtcars$mpg and mtcars$hp
t = -6.7424, df = 30, p-value = 1.788e-07
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.8852686 -0.5860994
sample estimates:
       cor 
-0.7761684
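The Pearson test statistic is a simple function of the sample correlation r: t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The sketch below recomputes it for the same mtcars columns and then shows the rank-based Spearman variant; exact = FALSE is passed as an assumption-free choice because mpg and hp contain tied values, for which R cannot compute an exact Spearman p-value.

```r
# Recompute the Pearson correlation test for mtcars$mpg vs mtcars$hp
x <- mtcars$mpg
y <- mtcars$hp
n <- length(x)

r      <- cor(x, y)                          # Pearson correlation
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)    # test statistic, df = n - 2
p_val  <- 2 * pt(-abs(t_stat), df = n - 2)   # two-sided p-value

builtin <- cor.test(x, y)
all.equal(t_stat, unname(builtin$statistic))  # TRUE
all.equal(p_val, builtin$p.value)             # TRUE

# Rank-based alternative that does not assume a linear relationship
cor.test(x, y, method = "spearman", exact = FALSE)
```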
