Kruskal-Wallis test in R Programming

The Kruskal–Wallis test is a rank-based, non-parametric alternative to the one-way ANOVA test. It extends the two-sample Wilcoxon test (also known as the Mann–Whitney U test) to one-way data with more than two groups. The groups must be independent: the samples come from unrelated populations and do not affect each other. Using the Kruskal–Wallis test, one can decide whether the population distributions are the same without assuming that they follow a normal distribution. The test is straightforward to perform in the R language.
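To make the "rank-based" idea concrete, here is a minimal sketch that computes the test statistic by hand and compares it with R's kruskal.test(). The values and group labels are made up purely for illustration, and the formula is the standard textbook H statistic with a tie correction, stated here as an assumption rather than taken from this article.

```r
# Hypothetical toy data: three small groups (values are made up for illustration)
values <- c(2.9, 3.0, 2.5, 2.6, 3.2,
            3.8, 2.7, 4.0, 2.4,
            2.8, 3.4, 3.7, 2.2, 2.0)
groups <- factor(rep(c("a", "b", "c"), times = c(5, 4, 5)))

n <- length(values)
r <- rank(values)                         # ranks over the pooled sample (ties get average ranks)
rank_sums   <- tapply(r, groups, sum)     # sum of ranks within each group
group_sizes <- tapply(r, groups, length)  # number of observations in each group

# H statistic: large when the group rank sums differ more than chance would suggest
h <- 12 / (n * (n + 1)) * sum(rank_sums^2 / group_sizes) - 3 * (n + 1)

# Tie correction (equals 1 when there are no tied values)
ties <- table(values)
h <- h / (1 - sum(ties^3 - ties) / (n^3 - n))

# Chi-squared approximation with (number of groups - 1) degrees of freedom
p_value <- pchisq(h, df = nlevels(groups) - 1, lower.tail = FALSE)

# Compare with the built-in function
builtin <- kruskal.test(values, groups)
print(c(by_hand = h, builtin = unname(builtin$statistic)))
```

The raw values enter the calculation only through their ranks, which is why the test does not need a normality assumption.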

Note: The outcome of the Kruskal–Wallis test tells whether there are differences among the groups, but it does not tell which groups differ from which.

Examples:

  1. Suppose one wants to find out how socioeconomic status influences attitude towards sales tax hikes. Here the independent variable is “socioeconomic status” with three levels: working-class, middle-class, and wealthy. The dependent variable is measured on a 5-point Likert scale from strongly agree to strongly disagree.
  2. Suppose one wants to find out how test anxiety influences actual test scores. The independent variable “test anxiety” has three levels: no anxiety, low-medium anxiety, and high anxiety. The dependent variable is the exam score, rated from 0 to 100%.

Assumptions for the Kruskal-Wallis test in R

The data should satisfy the following:

  • There is one independent variable with two or more levels. The test is more commonly used when there are three or more levels. For two levels, consider using the Mann–Whitney U test instead of the Kruskal-Wallis test.
  • The dependent variable is measured on an ordinal, interval, or ratio scale.
  • The observations are independent. In other words, there should be no relationship between observations, either within a group or between groups.
  • All groups have distributions of the same shape. This matters when the result is interpreted as a difference in medians; without it, a significant result only says that at least one group tends to have larger or smaller values than another.
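A rough, informal way to look at the group sizes and the shape assumption is to compare per-group summaries. The sketch below uses R's built-in PlantGrowth data set (columns weight and group); the choice of summary statistics is an assumption made for illustration, not a formal check.

```r
# Split the response by group (PlantGrowth ships with base R)
by_group <- split(PlantGrowth$weight, PlantGrowth$group)

# One row per group: sample size, centre, spread, and range
group_summary <- t(sapply(by_group, function(w) {
  c(n = length(w), median = median(w), IQR = IQR(w),
    min = min(w), max = max(w))
}))
print(group_summary)
```

Groups with very different spreads or ranges suggest that a significant result should not be read simply as a shift in medians.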

Implementation in R

R provides the function kruskal.test(), available in the stats package, to perform a Kruskal-Wallis rank sum test. It can be called either with data vectors or with a formula.

Syntax:
kruskal.test(x, g, …)
kruskal.test(formula, data, subset, na.action, …)

Parameters:

  • x: a numeric vector of data values, or a list of numeric data vectors.
  • g: a vector or factor object giving the group for the corresponding elements of x. It is ignored if x is a list.
  • formula: a formula of the form response ~ group, where response gives the data values and group is a vector or factor of the corresponding groups.
  • data: an optional matrix or data frame containing the variables in the formula.
  • subset: an optional vector specifying a subset of observations to be used.
  • na.action: a function which indicates what should happen when the data contain NA values.
  • …: further arguments to be passed to or from methods.
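The different ways of passing data described above can be sketched as follows, using R's built-in PlantGrowth data set (columns weight and group). This is only an illustration that the formula, vector, and list forms are expected to give the same statistic; the variable names are arbitrary.

```r
# Formula interface: response ~ group, looked up in a data frame
res_formula <- kruskal.test(weight ~ group, data = PlantGrowth)

# Vector interface: data values x and grouping vector g
res_vectors <- kruskal.test(PlantGrowth$weight, PlantGrowth$group)

# List interface: one numeric vector per group (g is not needed)
res_list <- kruskal.test(split(PlantGrowth$weight, PlantGrowth$group))

# All three calls should report the same test statistic
print(c(formula = unname(res_formula$statistic),
        vectors = unname(res_vectors$statistic),
        list    = unname(res_list$statistic)))
```

The formula form is usually the most readable when the data already sit in a data frame.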

Example:

Let’s use the built-in R data set named PlantGrowth. It contains the weight of plants obtained under control and two different treatment conditions. 

R




# Preparing the data set
# to perform Kruskal-Wallis Test

# Taking the PlantGrowth data set
myData <- PlantGrowth
print(myData)

# Show the group levels
print(levels(myData$group))


 Output:

    weight group
1    4.17  ctrl
2    5.58  ctrl
3    5.18  ctrl
4    6.11  ctrl
5    4.50  ctrl
6    4.61  ctrl
7    5.17  ctrl
8    4.53  ctrl
9    5.33  ctrl
10   5.14  ctrl
11   4.81  trt1
12   4.17  trt1
13   4.41  trt1
14   3.59  trt1
15   5.87  trt1
16   3.83  trt1
17   6.03  trt1
18   4.89  trt1
19   4.32  trt1
20   4.69  trt1
21   6.31  trt2
22   5.12  trt2
23   5.54  trt2
24   5.50  trt2
25   5.37  trt2
26   5.29  trt2
27   4.92  trt2
28   6.15  trt2
29   5.80  trt2
30   5.26  trt2
[1] "ctrl" "trt1" "trt2"

Here the column “group” is a factor, and its categories (“ctrl”, “trt1”, “trt2”) are the factor levels. The levels are ordered alphabetically. The question is whether plant weights differ significantly among the three experimental conditions. Note that the Kruskal-Wallis test compares the groups through their ranks, not their average weights. The test can be performed using the function kruskal.test() as given below.

R




# R program to illustrate
# Kruskal-Wallis Test

# Taking the PlantGrowth data set
myData <- PlantGrowth

# Performing Kruskal-Wallis test
result <- kruskal.test(weight ~ group,
                       data = myData)
print(result)


 Output:

Kruskal-Wallis rank sum test

data:  weight by group

Kruskal-Wallis chi-squared = 7.9882, df = 2, p-value = 0.01842

Explanation:

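As the p-value (0.01842) is less than the significance level of 0.05, we reject the null hypothesis that all groups come from the same distribution and conclude that plant weight differs significantly among the treatment groups. The test does not say which of the three groups differ from each other.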
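Since a significant result does not identify which groups differ, a follow-up comparison is often run. One common option, sketched here as an assumption rather than as this article's method, is pairwise Wilcoxon rank sum tests with a multiple-testing correction, using pairwise.wilcox.test() from the stats package. The Benjamini-Hochberg adjustment ("BH") and exact = FALSE (to use the normal approximation, since the data contain tied values) are illustrative choices; Dunn's test is another commonly used alternative.

```r
# Pairwise Wilcoxon rank sum tests between all pairs of groups,
# with p-values adjusted for multiple comparisons
posthoc <- pairwise.wilcox.test(PlantGrowth$weight, PlantGrowth$group,
                                p.adjust.method = "BH",
                                exact = FALSE)
print(posthoc)

# The adjusted p-values are stored as a lower-triangular matrix
print(posthoc$p.value)
```

Pairs with an adjusted p-value below the chosen significance level are the ones that can be reported as different.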



Last Updated : 16 May, 2022