
Kendall Correlation Testing in R Programming

  • Last Updated : 28 Jul, 2020

Correlation is a statistical measure of how strongly two variables are related, and correlations can also be computed pairwise among more than two variables. For instance, to find out whether there is a relationship between the heights of fathers and sons, one can calculate a correlation coefficient. A correlation coefficient always lies between -1 and +1: its sign gives the direction of the relationship and its magnitude gives the strength. The Pearson coefficient, for example, is a scaled version of covariance and is therefore dimensionless. There are two main types of correlation:

  • Parametric correlation: Pearson correlation (r) measures the linear dependence between two variables (x and y). It is known as a parametric correlation test because it depends on the distribution of the data.
  • Non-parametric correlation: Kendall (tau) and Spearman (rho) are rank-based correlation coefficients and are known as non-parametric correlations.
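Because a rank-based coefficient depends only on the ordering of the values, any strictly increasing transformation of one variable should leave Kendall's tau unchanged, while Pearson's r can change. The sketch below checks this with the small x and y vectors used in the examples later in this article and uses exp() as an arbitrarily chosen increasing transformation; the choice of exp() is only for illustration.

```r
# Small example vectors (the same values as in the examples below)
x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c(1, 3, 6, 2, 7, 4, 5)

# exp() is strictly increasing, so it preserves the order of y
y_exp <- exp(y)

# Pearson's r works on the raw values, so the transformation changes it
pearson_raw <- cor(x, y, method = "pearson")
pearson_exp <- cor(x, y_exp, method = "pearson")

# Kendall's tau only looks at the relative order of values, so it is unchanged
kendall_raw <- cor(x, y, method = "kendall")
kendall_exp <- cor(x, y_exp, method = "kendall")

print(c(pearson_raw = pearson_raw, pearson_exp = pearson_exp))
print(c(kendall_raw = kendall_raw, kendall_exp = kendall_exp))
```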

Kendall Rank Correlation Coefficient Formula

The Kendall rank correlation coefficient is a rank-based, non-parametric measure of association. It is calculated as follows:

$$\tau = \frac{\text{number of concordant pairs} - \text{number of discordant pairs}}{n(n - 1)/2}$$

where,

  • Concordant pair: a pair of observations (x1, y1) and (x2, y2) that satisfies either
    • x1 > x2 and y1 > y2, or
    • x1 < x2 and y1 < y2
  • Discordant pair: a pair of observations (x1, y1) and (x2, y2) that satisfies either
    • x1 > x2 and y1 < y2, or
    • x1 < x2 and y1 > y2
  • n: total number of observations (paired samples), so n(n - 1)/2 is the number of distinct pairs

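Note: A pair with a tie in either variable (x1 = x2 or y1 = y2) is neither concordant nor discordant. The formula above assumes there are no ties; when ties are present, R's cor() applies a tie correction (the tau-b variant) to the denominator.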
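To make the formula concrete, the following sketch counts concordant and discordant pairs directly and divides their difference by n(n - 1)/2. The function name kendall_tau_manual is hypothetical (it is not part of base R); it implements only the basic formula above, so it assumes there are no ties, and its simple double loop is meant for small samples only.

```r
# Hypothetical helper: Kendall's tau from the definition (assumes no ties)
kendall_tau_manual <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  n <- length(x)
  concordant <- 0
  discordant <- 0
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      # The product of signs is +1 for a concordant pair, -1 for a
      # discordant pair, and 0 when either variable is tied
      s <- sign(x[i] - x[j]) * sign(y[i] - y[j])
      if (s > 0) concordant <- concordant + 1
      if (s < 0) discordant <- discordant + 1
    }
  }
  (concordant - discordant) / (n * (n - 1) / 2)
}

x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c(1, 3, 6, 2, 7, 4, 5)

manual_tau <- kendall_tau_manual(x, y)
builtin_tau <- cor(x, y, method = "kendall")
print(c(manual = manual_tau, builtin = builtin_tau))
```

For these vectors there are 15 concordant and 6 discordant pairs out of 7 × 6 / 2 = 21, so tau = 9/21 ≈ 0.4285714, the same value that cor() reports because the data contain no ties.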

Implementation in R

R provides two functions for Kendall correlation: cor() and cor.test(). cor() computes only the correlation coefficient, whereas cor.test() performs a test for association between paired samples and returns both the correlation coefficient and the significance (p-value) of the correlation.

Syntax:
cor(x, y, method = "kendall")
cor.test(x, y, method = "kendall")

Parameters:
x, y: numeric vectors of the same length
method: the correlation method to use; one of "pearson" (the default), "kendall", or "spearman"
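Since cor() takes a single method name at a time, a quick way to compare the three supported methods on the same data is to loop over them. This is a small sketch using sapply() and the same example vectors as below; the variable names are illustrative.

```r
x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c(1, 3, 6, 2, 7, 4, 5)

# Compute the coefficient once per method; sapply() keeps the method names
methods <- c("pearson", "kendall", "spearman")
coefs <- sapply(methods, function(m) cor(x, y, method = m))
print(coefs)
```

Here the Kendall value is 9/21 ≈ 0.4285714, and because the y values are already the distinct integers 1 to 7, Spearman's rho happens to equal Pearson's r (15/28).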

Example 1:

Using the cor() method

# R program to illustrate
# Kendall correlation testing
# using cor()

# Two numeric vectors of the same length
x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c(1, 3, 6, 2, 7, 4, 5)

# Calculate the Kendall correlation coefficient
# using the cor() method
result <- cor(x, y, method = "kendall")

# Print the result
cat("Kendall correlation coefficient is:", result, "\n")

Output:

Kendall correlation coefficient is: 0.4285714

Using the cor.test() method

# R program to illustrate
# Kendall correlation testing
# using cor.test()

# Two numeric vectors of the same length
x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c(1, 3, 6, 2, 7, 4, 5)

# Test the Kendall correlation
# using the cor.test() method
result <- cor.test(x, y, method = "kendall")

# Print the result
print(result)

Output:

Kendall's rank correlation tau

data:  x and y
T = 15, p-value = 0.2389
alternative hypothesis: true tau is not equal to 0
sample estimates:
      tau 
0.4285714 

In the output above:

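  • T is the value of the test statistic (T = 15). For the exact Kendall test on data without ties, T is the number of concordant pairs.
  • p-value is the probability of a result at least as extreme as the one observed if the true tau were 0 (p-value = 0.2389). Since it is larger than the common 0.05 threshold, the association is not statistically significant at that level.
  • alternative hypothesis is a character string describing the alternative hypothesis (true tau is not equal to 0).
  • sample estimates is the estimated correlation coefficient; for Kendall's method it is named tau (tau = 0.4285714).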
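The values printed above can also be read programmatically, because cor.test() returns a list of class "htest" whose elements include statistic, p.value, estimate, and alternative. The sketch below extracts them; the 0.05 threshold in the if statement is only an example choice.

```r
x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c(1, 3, 6, 2, 7, 4, 5)

res <- cor.test(x, y, method = "kendall")

# Individual components of the "htest" result
print(res$statistic)    # test statistic, named T for the exact test
print(res$p.value)      # p-value of the test
print(res$estimate)     # estimated coefficient, named tau
print(res$alternative)  # "two.sided" unless another alternative is requested

# Example use: compare the p-value with a chosen significance level
if (res$p.value < 0.05) {
  cat("Significant association at the 5% level\n")
} else {
  cat("No significant association at the 5% level\n")
}
```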

Example 2:

Data: Download the CSV file here.

# R program to illustrate
# Kendall correlation testing

# Read the data into a data frame
# (assumes Auto.csv is in the working directory)
df <- read.csv("Auto.csv")

# Take two columns as vectors of the same length
x <- df$mpg
y <- df$weight

# Calculate the correlation coefficient
# using the cor() method
result <- cor(x, y, method = "kendall")

# Print the result
cat("Kendall correlation coefficient is:", result, "\n")

# Test the correlation using the cor.test() method
res <- cor.test(x, y, method = "kendall")
print(res)

Output:

Kendall correlation coefficient is: -0.7517463
    Kendall's rank correlation tau

data:  x and y
z = -19.161, p-value < 2.2e-16
alternative hypothesis: true tau is not equal to 0
sample estimates:
       tau 
-0.7517463 
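Unlike Example 1, this output reports a z statistic instead of T: per the R documentation, cor.test() computes an exact p-value for Kendall's test only when there are fewer than 50 paired samples and no ties, and otherwise uses a normal approximation. If Auto.csv is not at hand, the sketch below repeats the same steps on mtcars, a dataset that ships with R, using mpg and wt (weight) as a stand-in. It is a different dataset, so its numbers will not match the output above. Because mtcars contains tied values, exact = FALSE is passed to request the normal approximation explicitly.

```r
# mtcars ships with R; it is used here only as a stand-in for Auto.csv
x <- mtcars$mpg  # miles per gallon
y <- mtcars$wt   # weight (1000 lbs)

# Kendall correlation coefficient only
tau <- cor(x, y, method = "kendall")
cat("Kendall correlation coefficient is:", tau, "\n")

# The data contain ties, so ask for the normal approximation directly
res <- cor.test(x, y, method = "kendall", exact = FALSE)
print(res)
```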


