# Permutation Hypothesis Test in R Programming

In simple words, the permutation hypothesis test in R is a way of comparing a numerical value of 2 groups. The permutation Hypothesis test is an alternative to:

Let’s implement this test in R programming.

### Why use the Permutation Hypothesis Test?

• Small Sample Size.
• Assumptions(for parametric approach) not met.
• Test something other than classic approaches comparing Means and Medians.
• Difficult to estimate the SE for test-statistic.

### Permutation Hypothesis Test Steps

1. Specify a hypothesis
2. Choose test-stat(Eg: Mean, Median, etc. )
3. Determine Distribution of test-stat
4. Convert test-stat to P-value

Note:
P-value = No. of permutations having a test-stat value greater than observed test-stat value/ No. of permutations.

### Implementation in R

• Dataset: Chicken Diet Data. This dataset is a subset of the “chickwts” data in the “R dataset package”. Download the data set here.
• Hypothesis: The weight of the chicken is independent of the type of diet.

Test-Statistics

• Test-Statistics #1: The absolute value of the difference in mean weights for the two diets | Y1 – Y2 |. This is the same test statistics as the independent two-sided two-sample t-test.
• Test-Statistics #2: The absolute value of the difference in median weights for the two diets | Median1 – Median2 |

## R

 `# R program to illustrate ` `# Permutation Hypothesis Test ` ` `  `# load the data set ` `d <- ``read.table``(file = ``"ChickData.csv"``,  ` `                ``header = T, sep = ``","``) ` ` `  `# print the dataset ` `print``(d) ` ` `  `# check the names ` `names``(d) ` `levels``(d\$feed) ` ` `  `# how many observations in each diet? ` `table``(d\$feed) ` ` `  `# let's look at a boxplot of weight gain by those 2 diets ` `boxplot``(d\$weight~d\$feed, las = 1,  ` `        ``ylab = ``"weight (g)"``,  ` `        ``xlab = ``"feed"``, ` `        ``main = ``"Weight by Feed"``) ` ` `  `# calculate the difference in sample MEANS ` `mean``(d\$weight[d\$feed == ``"casein"``]) ``# mean for casein ` `mean``(d\$weight[d\$feed == ``"meatmeal"``]) ``# mean for meatmeal ` ` `  `# lets calculate the absolute diff in means ` `test.stat1 <- ``abs``(``mean``(d\$weight[d\$feed == ``"casein"``]) -  ` `                  ``mean``(d\$weight[d\$feed == ``"meatmeal"``]))  ` `test.stat1 ` ` `  ` `  `# calculate the difference in sample MEDIANS ` `median``(d\$weight[d\$feed == ``"casein"``]) ``# median for casein ` `median``(d\$weight[d\$feed == ``"meatmeal"``]) ``# median for meatmeal ` ` `  `# lets calculate the absolute diff in medians ` `test.stat2 <- ``abs``(``median``(d\$weight[d\$feed == ``"casein"``]) -  ` `                  ``median``(d\$weight[d\$feed == ``"meatmeal"``]))   ` `test.stat2 ` ` `  `# Permutation Test ` ` `  `# for reproducability of results ` `set.seed``(1979)   ` ` `  `# the number of observations to sample ` `n <- ``length``(d\$feed)   ` ` `  `# the number of permutation samples to take ` `P <- 100000  ` ` `  `# the variable we will resample from  ` `variable <- d\$weight   ` ` `  `# initialize a matrix to store the permutation data ` `PermSamples <- ``matrix``(0, nrow = n, ncol = P) ` ` `  `# each column is a permutation sample of data ` `# now, get those permutation samples, using a loop ` `# let's take a moment to discuss what that code is doing ` `for``(i ``in` `1:P) ` `  ``{ ` `    ``PermSamples[, i] <- ``sample``(variable,  ` `                               ``size = n,  ` `                               ``replace = ``FALSE``) ` `  ``} ` ` `  `# we can take a quick look at the first 5 columns of PermSamples ` `PermSamples[, 1:5] ` ` `  `# initialize vectors to store all of the Test-stats ` `Perm.test.stat1 <- Perm.test.stat2 <- ``rep``(0, P) ` ` `  `# loop thru, and calculate the test-stats ` `for ``(i ``in` `1:P) ` `  ``{ ` `    ``# calculate the perm-test-stat1 and save it ` `    ``Perm.test.stat1[i] <- ``abs``(``mean``(PermSamples[d\$feed == ``"casein"``,i]) -  ` `                              ``mean``(PermSamples[d\$feed == ``"meatmeal"``,i])) ` `     `  `    ``# calculate the perm-test-stat2 and save it ` `    ``Perm.test.stat2[i] <- ``abs``(``median``(PermSamples[d\$feed == ``"casein"``,i]) -  ` `                              ``median``(PermSamples[d\$feed == ``"meatmeal"``,i])) ` `  ``} ` ` `  `# before going too far with this,  ` `# let's remind ourselves of  ` `# the TEST STATS ` `test.stat1; test.stat2 ` ` `  `# and, take a look at the first 15  ` `# permutation-TEST STATS for 1 and 2 ` `round``(Perm.test.stat1[1:15], 1) ` `round``(Perm.test.stat2[1:15], 1) ` ` `  `# and, let's calculate the permutation p-value ` `# notice how we can ask R a true/false question ` `(Perm.test.stat1 >= test.stat1)[1:15] ` ` `  `# and if we ask for the mean of all of those, ` `# it treats 0 = FALSE, 1 = TRUE ` `mean``((Perm.test.stat1 >= test.stat1)[1:15]) ` ` `  `# Calculate the p-value, for all P = 100,000 ` `mean``(Perm.test.stat1 >= test.stat1) ` ` `  `# and, let's calculate the p-value for  ` `# option 2 of the test statistic (abs diff in medians) ` `mean``(Perm.test.stat2 >= test.stat2)`

Output:

```> print(d)
weight  feed
1     325 meatmeal
2     257 meatmeal
3     303 meatmeal
4     315 meatmeal
5     380 meatmeal
6     153 meatmeal
7     263 meatmeal
8     242 meatmeal
9     206 meatmeal
10    344 meatmeal
11    258 meatmeal
12    368   casein
13    390   casein
14    379   casein
15    260   casein
16    404   casein
17    318   casein
18    352   casein
19    359   casein
20    216   casein
21    222   casein
22    283   casein
23    332   casein
> names(d)
[1] "weight" "feed"
> levels(d\$feed)
[1] "casein"   "meatmeal"
> table(d\$feed)
casein meatmeal
12       11 ```

```> mean(d\$weight[d\$feed == "casein"]) # mean for casein
[1] 323.5833
> mean(d\$weight[d\$feed == "meatmeal"]) # mean for meatmeal
[1] 276.9091
> test.stat1
[1] 46.67424
> median(d\$weight[d\$feed == "casein"]) # median for casein
[1] 342
> median(d\$weight[d\$feed == "meatmeal"]) # median for meatmeal
[1] 263
> test.stat2
[1] 79
> PermSamples[, 1:5]
[,1] [,2] [,3] [,4] [,5]
[1,]  379  283  380  352  206
[2,]  380  303  258  260  380
[3,]  257  206  379  380  153
[4,]  283  242  222  404  359
[5,]  222  260  325  258  258
[6,]  315  352  153  379  263
[7,]  352  263  263  325  325
[8,]  153  325  315  359  216
[9,]  368  379  344  242  260
[10,]  344  258  368  368  257
[11,]  359  257  206  257  315
[12,]  206  153  404  222  303
[13,]  404  344  303  390  390
[14,]  325  318  318  303  352
[15,]  242  404  332  263  404
[16,]  390  380  257  206  379
[17,]  260  332  216  315  318
[18,]  303  359  352  344  368
[19,]  263  222  242  283  222
[20,]  332  368  260  332  344
[21,]  318  315  283  318  283
[22,]  216  390  390  153  332
[23,]  258  216  359  216  242
> test.stat1; test.stat2
[1] 46.67424
[1] 79
> round(Perm.test.stat1[1:15], 1)
[1] 17.1 32.4 17.6 47.1 56.1 28.9 31.0 40.8  6.8 13.8  9.1 46.5 28.9 50.9 32.7
> round(Perm.test.stat2[1:15], 1)
[1] 61.0 75.0  4.5 59.0 78.0 17.0 62.0 38.5  4.5 16.0 23.0 60.5 63.5 75.0 37.0
> (Perm.test.stat1 >= test.stat1)[1:15]
[1] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
> mean((Perm.test.stat1 >= test.stat1)[1:15])
[1] 0.2
> mean(Perm.test.stat1 >= test.stat1)
[1] 0.09959
> mean(Perm.test.stat2 >= test.stat2)
[1] 0.05407```

Whether you're preparing for your first job interview or aiming to upskill in this ever-evolving tech landscape, GeeksforGeeks Courses are your key to success. We provide top-quality content at affordable prices, all geared towards accelerating your growth in a time-bound manner. Join the millions we've already empowered, and we're here to do the same for you. Don't miss out - check it out now!