Significance Test for Kendall’s Tau-b in R

Kendall’s Tau-b is a non-parametric correlation coefficient that measures the strength and direction of the monotonic association between two variables. A significance test for Kendall’s Tau-b checks whether the observed coefficient differs from zero by more than chance alone would explain. In this article, we will explore the importance of Kendall’s Tau-b and demonstrate how to perform a significance test for it in R.

Importance of Kendall’s Tau-b

Kendall’s Tau-b is a useful measure of correlation because it does not assume any particular distribution for the variables being analyzed. Unlike Pearson’s correlation coefficient, which is designed for continuous variables with a linear relationship, Kendall’s Tau-b can be used with both continuous and ordinal variables. This makes it a good choice when the data are ordinal, are not normally distributed, or otherwise do not meet the assumptions of parametric tests.

Kendall’s Tau-b is also useful because it is less sensitive to outliers than other correlation coefficients. Outliers can have a significant impact on correlation coefficients, which can result in misleading results. However, because Kendall’s Tau-b is based on rankings rather than absolute values, it is less affected by outliers.
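The effect of an outlier can be illustrated with a small simulation. The following is a minimal sketch on made-up data (the vectors x, y and y_out are assumptions for illustration, not part of the height and weight dataset): one observation is replaced with an extreme value, and both coefficients are computed before and after.

```r
# Simulated data; all values are invented for illustration only.
set.seed(42)
x <- 1:20
y <- x + rnorm(20, sd = 1)

# Replace the last observation with an extreme outlier.
y_out <- y
y_out[20] <- -100

pearson_clean <- cor(x, y,     method = "pearson")
pearson_out   <- cor(x, y_out, method = "pearson")
kendall_clean <- cor(x, y,     method = "kendall")
kendall_out   <- cor(x, y_out, method = "kendall")

# Compare how much each coefficient moves when the outlier is introduced.
round(c(pearson_clean = pearson_clean, pearson_out = pearson_out,
        kendall_clean = kendall_clean, kendall_out = kendall_out), 3)
```

In this sketch the outlier changes the order of only one observation relative to the others, so Kendall’s Tau-b moves much less than Pearson’s coefficient, which depends on the actual magnitude of the outlying value.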

Mathematical Concept

The formula for Kendall’s Tau-b coefficient is:

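Tau_b = (n_c - n_d) / sqrt((n_t - n_1) * (n_t - n_2))

where n_c is the number of concordant pairs, n_d is the number of discordant pairs, n_t = n(n-1)/2 is the total number of pairs of observations, n_1 is the number of pairs tied on the first variable, and n_2 is the number of pairs tied on the second variable.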

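A pair of observations (i, j) is concordant if the two variables order the observations the same way: the observation with the larger value of the first variable also has the larger value of the second variable. A pair (i, j) is discordant if the two variables order the observations in opposite ways. A pair that is tied on either variable is neither concordant nor discordant; such pairs are accounted for by the n_1 and n_2 terms in the denominator.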
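The pair-counting definition can be checked directly in R. The sketch below uses a hypothetical helper, kendall_tau_b(), written only for illustration, and takes n_1 and n_2 to be the numbers of pairs tied on the first and second variable respectively. It counts pairs by brute force on a small made-up dataset with ties and compares the result with R's built-in cor().

```r
# Hypothetical helper: Kendall's tau-b by brute-force pair counting.
kendall_tau_b <- function(x, y) {
  stopifnot(length(x) == length(y))
  idx <- combn(length(x), 2)             # all pairs (i, j) with i < j
  dx <- sign(x[idx[1, ]] - x[idx[2, ]])  # ordering of each pair on x
  dy <- sign(y[idx[1, ]] - y[idx[2, ]])  # ordering of each pair on y
  n_c <- sum(dx * dy > 0)                # concordant pairs
  n_d <- sum(dx * dy < 0)                # discordant pairs
  n_t <- ncol(idx)                       # total pairs, n(n-1)/2
  n_1 <- sum(dx == 0)                    # pairs tied on x
  n_2 <- sum(dy == 0)                    # pairs tied on y
  (n_c - n_d) / sqrt((n_t - n_1) * (n_t - n_2))
}

# Small made-up vectors containing ties in both variables.
x <- c(1, 2, 2, 3, 4, 5, 5, 6)
y <- c(2, 1, 3, 3, 5, 4, 6, 6)

manual  <- kendall_tau_b(x, y)
builtin <- cor(x, y, method = "kendall")
all.equal(manual, builtin)
```

The brute-force version is quadratic in the number of observations, so it is meant for understanding the definition rather than for large datasets.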

The interpretation of Kendall’s Tau-b coefficient is similar to that of other correlation coefficients, with values ranging from -1 to 1. A value of -1 indicates a perfect negative association, 0 indicates no association, and 1 indicates a perfect positive association.

Kendall’s Tau-b coefficient has some advantages over other correlation coefficients. It is more robust to outliers and non-normal distributions, and it can handle ties in the data. However, when the data are continuous and approximately normally distributed, it can be statistically less efficient than Pearson’s correlation coefficient, meaning it may need somewhat more data to detect the same association.

Performing a Significance Test for Kendall’s Tau-b in R

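Before performing the test, it is important to check its assumptions. The assumptions of the Tau-b test are:

The data should be at least ordinal, so that the values can be ranked.
The observations should be independent of one another.
Ties are allowed, since Tau-b adjusts for them, but when ties are present cor.test() cannot compute an exact p-value and uses a normal approximation instead.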
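Ties mainly affect how the p-value is computed. For small samples, cor.test() attempts an exact p-value by default and warns if ties make that impossible. The sketch below, on made-up vectors, checks for ties first and requests the normal approximation explicitly when any are found; the variable name has_ties is an illustrative choice.

```r
# Small made-up vectors containing ties in both variables.
x <- c(1, 2, 2, 3, 4, 5, 5, 6)
y <- c(2, 1, 3, 3, 5, 4, 6, 6)

# anyDuplicated() returns 0 when a vector has no repeated values.
has_ties <- anyDuplicated(x) > 0 || anyDuplicated(y) > 0

# Ask for an exact p-value only when there are no ties;
# otherwise use the normal approximation (reported as a z statistic).
res <- cor.test(x, y, method = "kendall", exact = !has_ties)
res
```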

To perform a significance test for Kendall’s Tau-b in R, we can use the cor.test() function. With method = "kendall", cor.test() tests the null hypothesis that the correlation coefficient is equal to zero. If the p-value is less than our chosen significance level (usually 0.05), we can reject the null hypothesis and conclude that there is a significant correlation between the variables.

Suppose we have a dataset called “HeightWeight.csv” which contains the heights and weights of a group of individuals. We want to test whether there is a significant correlation between the height and weight of the individuals.

Import the Data

First, we need to import the data into R. We can use the read.csv() function to read the dataset, and head() to display its first 10 rows.

height_data <- read.csv("HeightWeight.csv") 

head(height_data, 10)

Performing the Test

Next, we can perform the Tau-b test using the cor.test() function in R. We need to set the method argument to “kendall”. Here’s the code:

cor.test(height_data$Height,
         height_data$Weight,
         method = "kendall")

Output:

    Kendall's rank correlation tau

data:  height_data$Height and height_data$Weight
z = 79.741, p-value < 2.2e-16
alternative hypothesis: true tau is not equal to 0
sample estimates:
      tau 
0.3362424

The output shows that the Tau-b estimate is 0.336 with a p-value of less than 2.2e-16, which is significant at the 5% level. We therefore reject the null hypothesis that the true Tau-b is 0 in favor of the alternative hypothesis that it is not equal to 0, and conclude that there is a significant correlation between height and weight.
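The same decision can be made programmatically, because cor.test() returns an object of class "htest" whose components can be read directly. The following sketch uses simulated height and weight values (invented here, since the CSV file may not be at hand) and an illustrative significance level alpha of 0.05.

```r
# Simulated stand-in for the height/weight data; values are invented.
set.seed(1)
height <- rnorm(200, mean = 170, sd = 10)
weight <- 0.5 * height + rnorm(200, sd = 8)

res <- cor.test(height, weight, method = "kendall")

tau     <- unname(res$estimate)  # sample Tau-b
p_value <- res$p.value           # two-sided p-value
alpha   <- 0.05                  # chosen significance level

# TRUE means the null hypothesis of zero correlation is rejected.
significant <- p_value < alpha
c(tau = tau, p_value = p_value, significant = significant)
```

Reading the components this way avoids copying numbers from the printed output, which is rounded for display.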

Interpreting the Results

Kendall’s rank correlation tau: This header names the test that was run. The corresponding coefficient between the height and weight variables is 0.336, indicating a positive correlation between height and weight.
z-value: This is the test statistic used to determine the p-value of the test. When there are no ties, the z-value is calculated as (tau - 0) / sqrt((2*(2n+5))/(9n(n-1))), where n is the sample size; with ties, the variance in the denominator is adjusted. In this case, the z-value is 79.741.
p-value: This is the probability of observing a correlation as extreme or more extreme than the one observed in the sample, assuming that the null hypothesis is true. The null hypothesis is that there is no correlation between height and weight. The p-value is less than 2.2e-16, which is far below the 0.05 significance level. Therefore, we reject the null hypothesis and conclude that there is a statistically significant correlation between height and weight.
Alternative hypothesis: This is the hypothesis that we are testing against the null hypothesis. In this case, the alternative hypothesis is that there is a non-zero correlation between height and weight.
Sample estimates: This is the estimate of the correlation coefficient in the sample. The value of tau is 0.336, which confirms that there is a positive correlation between height and weight in the sample.
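The formula for the z-value can be verified by hand. The sketch below uses simulated continuous data (so there are no ties), requests the normal approximation with exact = FALSE, and recomputes z and the two-sided p-value from tau and n. The names z_manual and p_manual are illustrative.

```r
# Simulated continuous data without ties; values are invented.
set.seed(123)
n <- 100
x <- rnorm(n)
y <- x + rnorm(n)

# exact = FALSE forces the normal approximation, so a z statistic is reported.
res <- cor.test(x, y, method = "kendall", exact = FALSE)
tau <- unname(res$estimate)

# No-ties formula: z = tau / sqrt(2(2n + 5) / (9n(n - 1)))
z_manual <- tau / sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))

# Two-sided p-value from the standard normal distribution.
p_manual <- 2 * pnorm(-abs(z_manual))

all.equal(z_manual, unname(res$statistic))
all.equal(p_manual, res$p.value)
```

With tied data the two values of z would no longer agree, because cor.test() then applies a tie-corrected variance.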

Overall, the output suggests that there is a statistically significant positive correlation between height and weight in the sample, with Kendall’s tau correlation coefficient of 0.336.
