Kendall Correlation Testing in R Programming
Correlation is a statistical measure of how strongly two variables are related; the idea also extends to relationships among several variables. For instance, to find out whether the heights of fathers and sons are related, one can calculate a correlation coefficient. The coefficient lies between -1 and +1. It is a scaled version of covariance, so it is dimensionless, and it conveys both the direction and the strength of the relationship. There are mainly two types of correlation:
- Parametric Correlation – Pearson correlation (r): It measures the linear dependence between two variables (x and y). It is known as a parametric correlation test because it depends on the distribution of the data.
- Non-Parametric Correlation – Kendall (tau) and Spearman (rho): These are rank-based correlation coefficients and are known as non-parametric correlations.
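The difference between the two families can be seen on a small example. The sketch below uses illustrative data (an assumption, not taken from the article): y grows monotonically but not linearly with x, so the rank-based coefficients reach 1 while Pearson's r stays below 1. It also checks that Pearson's r is the covariance scaled by the two standard deviations.

```r
# Illustrative data (an assumption): a monotonic but non-linear relationship
x <- 1:10
y <- exp(x)

pearson  <- cor(x, y, method = "pearson")
kendall  <- cor(x, y, method = "kendall")
spearman <- cor(x, y, method = "spearman")

# Pearson's r is the covariance scaled by both standard deviations
scaled_cov <- cov(x, y) / (sd(x) * sd(y))

cat("Pearson: ", pearson, "\n")
cat("Kendall: ", kendall, "\n")
cat("Spearman:", spearman, "\n")
cat("cov / (sd_x * sd_y):", scaled_cov, "\n")
```

Because the ranks of x and y agree perfectly, Kendall's tau and Spearman's rho are both 1 here, whatever the shape of the curve.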
Kendall Rank Correlation Coefficient Formula
The Kendall rank correlation coefficient is a rank-based, non-parametric measure of correlation. It is calculated as follows:
tau = (number of concordant pairs - number of discordant pairs) / (n(n - 1) / 2)
where:
- Concordant Pair: A pair of observations (x1, y1) and (x2, y2) that satisfies either
  - x1 > x2 and y1 > y2, or
  - x1 < x2 and y1 < y2
- Discordant Pair: A pair of observations (x1, y1) and (x2, y2) that satisfies either
  - x1 > x2 and y1 < y2, or
  - x1 < x2 and y1 > y2
- n: Total number of samples
Note: Pairs for which x1 = x2 or y1 = y2 are tied. They are classified as neither concordant nor discordant and are ignored.
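The definitions above can be checked by counting pairs directly. The following is a minimal sketch, not R's own implementation: the helper name kendall_tau_manual and the two input vectors are assumptions made for illustration. It loops over every pair of observations, classifies the pair as concordant, discordant, or tied, and divides the difference of the counts by the total number of pairs, n(n - 1) / 2.

```r
# Hypothetical helper: brute-force Kendall tau from pair counts
kendall_tau_manual <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  n <- length(x)
  concordant <- 0
  discordant <- 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # Positive product: both differences have the same sign (concordant)
      # Negative product: the differences have opposite signs (discordant)
      # Zero product: a tie in x or y, counted as neither
      s <- sign(x[i] - x[j]) * sign(y[i] - y[j])
      if (s > 0) concordant <- concordant + 1
      if (s < 0) discordant <- discordant + 1
    }
  }
  total_pairs <- n * (n - 1) / 2
  list(concordant = concordant,
       discordant = discordant,
       tau = (concordant - discordant) / total_pairs)
}

# Illustrative vectors (an assumption), with no ties
x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c(1, 3, 6, 2, 7, 4, 5)

res <- kendall_tau_manual(x, y)
print(res$concordant)  # 15
print(res$discordant)  # 6
print(res$tau)         # 0.4285714

# With no ties, this should agree with the built-in estimate
print(all.equal(res$tau, cor(x, y, method = "kendall")))
```

The double loop is quadratic in n, which is fine for a teaching example but not something to use on large data sets.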
Implementation in R
R provides two functions for calculating the correlation coefficient: cor() and cor.test(). cor() computes only the correlation coefficient, whereas cor.test() performs a test for association between paired samples and returns both the correlation coefficient and the p-value of the test.
Syntax:
cor(x, y, method = "kendall")
cor.test(x, y, method = "kendall")
Parameters:
- x, y: numeric vectors of the same length
- method: the correlation method to use ("kendall" here)
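A minimal sketch of both calls follows. The two input vectors are an assumption chosen for illustration; they contain no ties and give 15 concordant pairs out of 21, so tau = (15 - 6) / 21 = 0.4285714.

```r
# Two numeric vectors of the same length (illustrative values, an assumption)
x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c(1, 3, 6, 2, 7, 4, 5)

# cor() returns only the coefficient
result <- cor(x, y, method = "kendall")
cat("Kendall correlation coefficient is:", result, "\n")

# cor.test() returns the coefficient together with the test statistic and p-value
test_result <- cor.test(x, y, method = "kendall")
print(test_result)
```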
Kendall correlation coefficient is: 0.4285714
Kendall's rank correlation tau
data: x and y
T = 15, p-value = 0.2389
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
0.4285714
In the output above:
- T is the value of the test statistic (T = 15). For the exact test, this is the number of concordant pairs.
- p-value is the p-value of the test (p-value = 0.2389). It is above 0.05, so the correlation is not statistically significant at the 5% level.
- alternative hypothesis is a character string describing the alternative hypothesis (true tau is not equal to 0).
- sample estimates is the estimated correlation coefficient. For the Kendall correlation it is named tau (tau = 0.4285714).
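The values in the printed report can also be read programmatically, because cor.test() returns a list of class "htest". A minimal sketch, assuming illustrative input vectors (not taken from the article):

```r
# Illustrative vectors (an assumption)
x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c(1, 3, 6, 2, 7, 4, 5)

res <- cor.test(x, y, method = "kendall")

# Each part of the printed report is a component of the returned list
statistic   <- res$statistic    # test statistic, named "T" for the exact test
p_value     <- res$p.value      # p-value of the test
alternative <- res$alternative  # "two.sided" by default
tau         <- res$estimate     # estimated coefficient, named "tau"

cat("Statistic:", names(statistic), "=", statistic, "\n")
cat("p-value:", p_value, "\n")
cat("Alternative:", alternative, "\n")
cat("tau:", tau, "\n")
```

Accessing the components this way is handy when the test is run many times and the results need to be collected into a table.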
Data: Download the CSV file here.
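The same two calls work on columns read from a CSV file. The sketch below is only an outline: the file path and the column names x and y are assumptions, and so that the snippet runs on its own it first writes a small synthetic data set to a temporary file. Its numbers therefore differ from the output reported for the article's data.

```r
# Synthetic stand-in for the downloaded file (an assumption, for illustration only)
set.seed(1)
demo <- data.frame(x = 1:100)
demo$y <- -demo$x + rnorm(100, sd = 20)
csv_path <- tempfile(fileext = ".csv")
write.csv(demo, csv_path, row.names = FALSE)

# Read the CSV; replace csv_path with the path to the real file
data <- read.csv(csv_path)

# Column names x and y are hypothetical; adjust them to match the file
result <- cor(data$x, data$y, method = "kendall")
cat("Kendall correlation coefficient is:", result, "\n")

test_result <- cor.test(data$x, data$y, method = "kendall")
print(test_result)
```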
Kendall correlation coefficient is: -0.7517463
Kendall's rank correlation tau
data: x and y
z = -19.161, p-value < 2.2e-16
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
-0.7517463
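Here the statistic is reported as z rather than T. According to the documentation of cor.test(), an exact p-value is computed by default only when there are fewer than 50 paired samples and no ties; otherwise the statistic is standardized and a normal approximation is used. The sketch below, on illustrative vectors (an assumption), forces each behaviour through the exact argument to show where the two labels come from.

```r
# Illustrative vectors (an assumption): small sample, no ties
x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c(1, 3, 6, 2, 7, 4, 5)

# Exact test: the statistic is labelled T
exact_test <- cor.test(x, y, method = "kendall", exact = TRUE)

# Normal approximation: the statistic is labelled z
approx_test <- cor.test(x, y, method = "kendall", exact = FALSE)

cat("Exact statistic name: ", names(exact_test$statistic), "\n")
cat("Approx statistic name:", names(approx_test$statistic), "\n")

# The estimate of tau is the same either way; only the p-value computation differs
cat("tau (exact): ", exact_test$estimate, "\n")
cat("tau (approx):", approx_test$estimate, "\n")
```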