
Kendall Correlation Testing in R Programming

Last Updated : 29 Mar, 2023

Correlation is a statistical measure that indicates how strongly two variables are related. It can also be computed pairwise across several variables. For instance, if one wants to know whether there is a relationship between the heights of fathers and sons, a correlation coefficient can be calculated to answer this question. A correlation coefficient always lies between -1 and +1: its sign gives the direction of the relationship and its magnitude gives the strength. The Pearson coefficient is a scaled version of the covariance, which makes it dimensionless. There are mainly two types of correlation:

  • Parametric correlation: Pearson correlation (r) measures the linear dependence between two variables (x and y). It is known as a parametric correlation test because it depends on the distribution of the data.
  • Non-parametric correlation: Kendall (tau) and Spearman (rho) are rank-based correlation coefficients and are known as non-parametric correlation.
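To make the difference concrete, here is a small sketch with made-up vectors (they are illustrative assumptions, not data from this article). It checks that Pearson's r is the covariance divided by the product of the two standard deviations, and that a monotone but non-linear relationship leaves both rank-based coefficients at 1 while Pearson's r drops well below 1.

```r
# Hypothetical data: y increases monotonically, but not linearly, with x
x <- 1:10
y <- exp(x)

# Pearson's r is the covariance rescaled by both standard deviations
pearson_r  <- cor(x, y)                     # method = "pearson" is the default
scaled_cov <- cov(x, y) / (sd(x) * sd(y))

# Rank-based coefficients only use the ordering of the values
kendall_tau  <- cor(x, y, method = "kendall")
spearman_rho <- cor(x, y, method = "spearman")

cat("Pearson r:      ", pearson_r, "\n")
cat("cov / (sd * sd):", scaled_cov, "\n")
cat("Kendall tau:    ", kendall_tau, "\n")
cat("Spearman rho:   ", spearman_rho, "\n")
```

Because exp() preserves the order of x, every pair is concordant, so tau and rho reach their maximum even though the relationship is far from a straight line.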

Kendall Rank Correlation Coefficient Formula

Kendall rank correlation is a rank-based correlation coefficient and is therefore a non-parametric measure. It is calculated as follows:

{\displaystyle \tau = \frac{\text{Number of concordant pairs} - \text{Number of discordant pairs}}{n(n - 1)/2}}

where:

  • Concordant pair: a pair of observations (x1, y1) and (x2, y2) such that
    • x1 > x2 and y1 > y2, or
    • x1 < x2 and y1 < y2
  • Discordant pair: a pair of observations (x1, y1) and (x2, y2) such that
    • x1 > x2 and y1 < y2, or
    • x1 < x2 and y1 > y2
  • n: total number of observations (samples)

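Note: A pair with a tie (x1 = x2 or y1 = y2) is neither concordant nor discordant, so it does not contribute to the numerator. When ties are present, R's cor() reports the tau-b variant, which also reduces the denominator by the number of tied pairs.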
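The definitions above translate directly into a brute-force count over all n(n - 1)/2 pairs. The sketch below favours clarity over speed (it is O(n²)), and the function name kendall_by_pairs is hypothetical. It returns the tau from the formula above (often called tau-a) together with the tie-adjusted tau-b, which is the value cor() reports when the data contain ties.

```r
# Hypothetical helper: Kendall's tau by explicitly classifying every pair
kendall_by_pairs <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  n <- length(x)
  concordant <- 0
  discordant <- 0
  ties_x <- 0  # pairs with x1 == x2
  ties_y <- 0  # pairs with y1 == y2
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      s <- sign(x[i] - x[j]) * sign(y[i] - y[j])
      if (s > 0) concordant <- concordant + 1
      if (s < 0) discordant <- discordant + 1
      if (x[i] == x[j]) ties_x <- ties_x + 1
      if (y[i] == y[j]) ties_y <- ties_y + 1
    }
  }
  n0 <- n * (n - 1) / 2
  list(
    concordant = concordant,
    discordant = discordant,
    tau_a = (concordant - discordant) / n0,
    # tau-b removes the tied pairs of each variable from the denominator
    tau_b = (concordant - discordant) / sqrt((n0 - ties_x) * (n0 - ties_y))
  )
}

# Vectors from Example 1 (no ties, so tau-a equals tau-b)
res <- kendall_by_pairs(c(1, 2, 3, 4, 5, 6, 7), c(1, 3, 6, 2, 7, 4, 5))
cat("C =", res$concordant, " D =", res$discordant, " tau =", res$tau_a, "\n")
# prints C = 15, D = 6 and tau = 0.4285714
```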

Implementation in R

R provides two functions for calculating the correlation coefficient: cor() and cor.test(). cor() computes only the correlation coefficient, whereas cor.test() performs a test for association (correlation) between paired samples and returns both the correlation coefficient and the significance level (p-value) of the correlation.
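Because cor.test() returns an object of class "htest" (a list), the individual numbers can be extracted by name instead of being read off the printed output. A minimal sketch, reusing the vectors from Example 1:

```r
x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c(1, 3, 6, 2, 7, 4, 5)

res <- cor.test(x, y, method = "kendall")

tau     <- unname(res$estimate)   # correlation coefficient (tau)
p_value <- res$p.value            # p-value of the test
stat    <- unname(res$statistic)  # test statistic (T for the exact test)

cat("tau =", tau, " p-value =", p_value, " T =", stat, "\n")

# cor.test() and cor() agree on the coefficient itself
cat("Same as cor():", isTRUE(all.equal(tau, cor(x, y, method = "kendall"))), "\n")
```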

Syntax:
cor(x, y, method = "kendall")
cor.test(x, y, method = "kendall")

Parameters: x, y: numeric vectors of the same length

method: correlation method, one of "pearson" (the default), "kendall" or "spearman"
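cor() is not limited to two vectors: given a numeric matrix (or data frame), it returns the matrix of pairwise coefficients for the chosen method. A short sketch with made-up columns (the column names are illustrative assumptions):

```r
# Hypothetical columns; a_rev is column a in reverse order
m <- cbind(
  a     = c(1, 2, 3, 4, 5, 6, 7),
  b     = c(1, 3, 6, 2, 7, 4, 5),
  a_rev = c(7, 6, 5, 4, 3, 2, 1)
)

# Kendall tau for every pair of columns (symmetric, 1 on the diagonal)
tau_matrix <- cor(m, method = "kendall")
print(round(tau_matrix, 4))
```

Each off-diagonal entry is the same number cor() would return for that pair of columns on its own; a and a_rev are perfectly discordant, so their entry is -1.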

Example 1: Using the cor() method

R

# R program to illustrate
# Kendall Correlation Testing
# Using cor()
 
# Taking two numeric
# Vectors with same length
x = c(1, 2, 3, 4, 5, 6, 7) 
y = c(1, 3, 6, 2, 7, 4, 5)
 
# Calculating 
# Correlation coefficient
# Using cor() method
result = cor(x, y, method = "kendall")
 
# Print the result
cat("Kendall correlation coefficient is:", result)

                    

Output:

Kendall correlation coefficient is: 0.4285714
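This value can be checked by hand with the formula above. Going through the 21 pairs of Example 1, 15 are concordant and 6 are discordant (there are no ties), so:

{\displaystyle \tau = \frac{15 - 6}{7(7 - 1)/2} = \frac{9}{21} \approx 0.4285714}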

Using the cor.test() method:

R

# R program to illustrate
# Kendall Correlation Testing
# Using cor.test()
 
# Taking two numeric
# Vectors with same length
x = c(1, 2, 3, 4, 5, 6, 7) 
y = c(1, 3, 6, 2, 7, 4, 5)
 
# Calculating 
# Correlation coefficient
# Using cor.test() method
result = cor.test(x, y, method = "kendall")
 
# Print the result
print(result)

                    

Output:

Kendall's rank correlation tau

data:  x and y
T = 15, p-value = 0.2389
alternative hypothesis: true tau is not equal to 0
sample estimates:
      tau 
0.4285714 

In the output above:

  • T is the value of the test statistic (T = 15). For the exact test used here, T is the number of concordant pairs.
  • p-value is the significance level of the test (p-value = 0.2389): the probability, if the true tau were 0, of a result at least as extreme as the one observed.
  • alternative hypothesis is a character string describing the alternative hypothesis (true tau is not equal to 0).
  • sample estimates is the correlation coefficient. For the Kendall correlation coefficient it is named tau (tau = 0.4285714).
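For small samples without ties, cor.test() computes an exact p-value: under the null hypothesis every ordering of y is equally likely, so the number of discordant pairs follows the distribution of inversions of a random permutation. The sketch below rebuilds that distribution by dynamic programming (the helper name inversion_counts is hypothetical, and this is not R's internal code) and reproduces the two-sided p-value of 0.2389 for Example 1 by doubling the one-sided tail probability.

```r
# Hypothetical helper: counts[d + 1] = number of permutations of n items with d inversions
inversion_counts <- function(n) {
  counts <- 1  # a single item has one ordering, with 0 inversions
  if (n >= 2) {
    for (k in 2:n) {
      # inserting the k-th item can add 0, 1, ..., k - 1 new inversions
      new_counts <- numeric(length(counts) + k - 1)
      for (shift in 0:(k - 1)) {
        idx <- seq_along(counts) + shift
        new_counts[idx] <- new_counts[idx] + counts
      }
      counts <- new_counts
    }
  }
  counts
}

x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c(1, 3, 6, 2, 7, 4, 5)
n <- length(x)
n_pairs <- n * (n - 1) / 2

# Observed T = number of concordant pairs
s <- sign(outer(x, x, "-")) * sign(outer(y, y, "-"))
T_obs <- sum(s[upper.tri(s)] > 0)

# Null distribution of T (T = n_pairs - D), with prob_T[t + 1] = P(T = t)
prob_D <- inversion_counts(n) / factorial(n)
prob_T <- rev(prob_D)

# One-sided tail in the direction of the observation, then doubled
if (T_obs > n_pairs / 2) {
  p_one <- sum(prob_T[(T_obs + 1):(n_pairs + 1)])
} else {
  p_one <- sum(prob_T[1:(T_obs + 1)])
}
p_value <- min(1, 2 * p_one)

cat("T =", T_obs, " exact two-sided p-value =", round(p_value, 4), "\n")
# prints T = 15 and p-value = 0.2389
```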

Example 2: Data: Download the CSV file here

R

# R program to illustrate
# Kendall Correlation Testing
 
# Import the data (Auto.csv must be in the working directory)
df = read.csv("Auto.csv")
 
# Extract two columns as
# vectors of the same length
x = df$mpg
y = df$weight
 
 
# Calculating
# Correlation coefficient
# Using cor() method
result = cor(x, y, method = "kendall")
 
# Print the result
cat("Kendall correlation coefficient is:", result)
 
# Using cor.test() method
res = cor.test(x, y, method = "kendall")
print(res)

                    

Output:

Kendall correlation coefficient is: -0.7517463
    Kendall's rank correlation tau

data:  x and y
z = -19.161, p-value < 2.2e-16
alternative hypothesis: true tau is not equal to 0
sample estimates:
       tau 
-0.7517463 
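Example 2 reports a z statistic instead of T. According to the cor.test() documentation, the exact test is used by default only when there are fewer than 50 paired samples and no ties; otherwise the estimate is scaled to zero mean and unit variance and compared with the normal distribution. The sketch below shows that approximation for the simplest case, without ties (the tie corrections R applies to data such as Auto.csv are deliberately left out), using the Example 1 vectors with exact = FALSE so that R also uses the normal approximation.

```r
x <- c(1, 2, 3, 4, 5, 6, 7)
y <- c(1, 3, 6, 2, 7, 4, 5)
n <- length(x)

# S = concordant pairs minus discordant pairs
s <- sign(outer(x, x, "-")) * sign(outer(y, y, "-"))
S <- sum(s[upper.tri(s)])

# Variance of S under H0, valid only when there are no ties
var_S <- n * (n - 1) * (2 * n + 5) / 18
z <- S / sqrt(var_S)
p_value <- 2 * pnorm(-abs(z))   # two-sided p-value

# Compare with R's normal approximation (exact = FALSE forces it)
res <- cor.test(x, y, method = "kendall", exact = FALSE)
cat("by hand:  z =", z, " p =", p_value, "\n")
cat("cor.test: z =", unname(res$statistic), " p =", res$p.value, "\n")
```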

