# Mann Whitney U Test in R Programming

A popular nonparametric (distribution-free) test for comparing outcomes between two independent groups is the **Mann Whitney U test**, also known as the Wilcoxon rank sum test. A nonparametric test is appropriate when the two samples are independent, the outcome is not normally distributed, and the samples are small. The test checks whether the distribution of a dependent variable differs between two independent groups; the dependent variable must be at least ordinal (that is, its values have an intrinsic order or rank). The test is easy to perform in R.

### Implementation of Mann Whitney U Test in R Programming

Let’s say our data contains two kinds of bulbs, orange and red, each with a set of day-to-day base prices. The base price is the dependent variable, and the bulb color (red or orange) is the grouping variable. We want to decide, on the basis of price, whether to prefer a red or an orange bulb. If both price distributions are the same, then the **null hypothesis** (no significant difference between the two groups) holds, and we can buy either bulb because price won’t matter. To interpret the Mann Whitney U test, one needs to understand the **p-value**. This value tells us whether we can reject the null hypothesis at a chosen significance level, conventionally 0.05. Below is the implementation of this example.

**Approach**

- First, create a data frame that holds the bulb type and the bulb price.
- After this, check the summary of the prices for each level of the non-ordinal categorical variable by loading the package **dplyr** and using **summarise()** with **median()** for the **red and orange bulb**.
- Then look at the boxplot to see the distribution of the data by installing the package **ggpubr** and using **ggboxplot()**, passing the color codes through **palette**.
- Then finally apply the function **wilcox.test()**.
- If the p-value is less than 0.05, the **null hypothesis** is rejected.
- If the p-value is greater than 0.05, the **null hypothesis** is not rejected.

```r
# R program to illustrate
# Mann Whitney U Test

# Creating a small dataset
# Creating a vector of red bulb and orange prices
red_bulb <- c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8)
orange_bulb <- c(47.8, 60, 63.4, 76, 89.4, 67.3, 61.3, 62.4)

# Passing them in the columns
BULB_PRICE = c(red_bulb, orange_bulb)
BULB_TYPE = rep(c("red", "orange"), each = 8)

# Now creating a dataframe
DATASET <- data.frame(BULB_TYPE, BULB_PRICE, stringsAsFactors = TRUE)

# printing the dataframe
DATASET

# installing libraries to view summaries and
# boxplot of both orange and red color bulbs
install.packages("dplyr")
install.packages("ggpubr")

# Summary of the data

# loading the package
library(dplyr)
group_by(DATASET, BULB_TYPE) %>%
  summarise(
    count = n(),
    median = median(BULB_PRICE, na.rm = TRUE),
    IQR = IQR(BULB_PRICE, na.rm = TRUE))

# loading package for boxplot
library("ggpubr")
ggboxplot(DATASET, x = "BULB_TYPE", y = "BULB_PRICE",
          color = "BULB_TYPE", palette = c("#FFA500", "#FF0000"),
          ylab = "BULB_PRICES", xlab = "BULB_TYPES")

res <- wilcox.test(BULB_PRICE ~ BULB_TYPE,
                   data = DATASET,
                   exact = FALSE)
res
```

**Output:**

**> DATASET**

```
   BULB_TYPE BULB_PRICE
1        red       38.9
2        red       61.2
3        red       73.3
4        red       21.8
5        red       63.4
6        red       64.6
7        red       48.4
8        red       48.8
9     orange       47.8
10    orange       60.0
11    orange       63.4
12    orange       76.0
13    orange       89.4
14    orange       67.3
15    orange       61.3
16    orange       62.4
```

**# summary of the data**

```
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 x 4
  BULB_TYPE count median   IQR
  <fct>     <int>  <dbl> <dbl>
1 orange        8   62.9   8.5
2 red           8   55    17.7
```

**# boxplot**

**> res**

```
	Wilcoxon rank sum test with continuity correction

data:  BULB_PRICE by BULB_TYPE
W = 44.5, p-value = 0.2072
alternative hypothesis: true location shift is not equal to 0
```
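To see where the reported statistic comes from, the following is a minimal sketch that recomputes `W` by hand in base R. It relies on the fact that `wilcox.test()` reports the statistic for the first factor level, which here is `orange` because factor levels are sorted alphabetically. The names `diffs`, `w_pairs`, `ranks`, `n_orange` and `w_ranks` are illustrative and not part of the original program.

```r
# Same price vectors as in the program above
red_bulb <- c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8)
orange_bulb <- c(47.8, 60, 63.4, 76, 89.4, 67.3, 61.3, 62.4)

# Way 1: count the (orange, red) pairs in which the orange price is
# higher; a tied pair contributes one half
diffs <- outer(orange_bulb, red_bulb, "-")
w_pairs <- sum(diffs > 0) + 0.5 * sum(diffs == 0)

# Way 2: rank all 16 prices together (ties get the average rank),
# sum the ranks of the orange group, and subtract n * (n + 1) / 2
ranks <- rank(c(orange_bulb, red_bulb))
n_orange <- length(orange_bulb)
w_ranks <- sum(ranks[seq_len(n_orange)]) - n_orange * (n_orange + 1) / 2

w_pairs  # 44.5
w_ranks  # 44.5
```

Both values agree with the `W = 44.5` shown in the output. The single tie (63.4 appears in both groups) is also the reason the program passes `exact = FALSE`: an exact p-value cannot be computed when ties are present.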

**Explanation:**

Here the **p-value** comes out to be **0.2072**, which is greater than the significance level of **0.05**. We therefore fail to reject the **null hypothesis**: the data give no evidence that the price distributions of red and orange bulbs differ. On the basis of price alone, we cannot say that buying one of the two bulbs is more profitable than buying the other.
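The decision step can also be written out in code. The sketch below rebuilds the same data frame, reads the p-value from the object returned by `wilcox.test()` and compares it with a significance level of 0.05. The names `alpha`, `p_value` and `reject_null` are assumptions introduced for illustration, and 0.05 is a conventional choice rather than something fixed by the test.

```r
# Same data as in the program above
red_bulb <- c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8)
orange_bulb <- c(47.8, 60, 63.4, 76, 89.4, 67.3, 61.3, 62.4)
DATASET <- data.frame(
  BULB_TYPE = factor(rep(c("red", "orange"), each = 8)),
  BULB_PRICE = c(red_bulb, orange_bulb)
)

res <- wilcox.test(BULB_PRICE ~ BULB_TYPE, data = DATASET, exact = FALSE)

alpha <- 0.05             # assumed significance level
p_value <- res$p.value    # p-value stored in the "htest" result object
reject_null <- p_value < alpha

if (reject_null) {
  message("Reject the null hypothesis: the price distributions differ.")
} else {
  message("Fail to reject the null hypothesis: no evidence of a difference.")
}
```

With this data `reject_null` is `FALSE`, which matches the conclusion above.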