Spearman’s Rank Correlation Coefficient

Spearman’s Rank Correlation Coefficient, also called Spearman’s Rank Difference Method or Formula, is a method of calculating the correlation coefficient of qualitative variables. It was developed in 1904 by Charles Edward Spearman. In other words, the formula determines the correlation between attributes like beauty, ability, honesty, etc., whose quantitative measurement is not possible. Such attributes are instead ranked, or put in order of preference, and the coefficient is calculated from those ranks.

r_k = 1 - \frac{6\sum{D^2}}{N^3 - N}

In the given formula,

rk = Coefficient of rank correlation

D = Difference between the two ranks of each item (R1 – R2)

N = Number of pairs of observations (items ranked)
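
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `spearman_from_ranks` is an assumption made for this example, and the code assumes both inputs are already ranks with no ties.

```python
def spearman_from_ranks(ranks_x, ranks_y):
    """Compute r_k = 1 - 6 * sum(D^2) / (N^3 - N) for untied ranks."""
    if len(ranks_x) != len(ranks_y):
        raise ValueError("both rank lists must have the same length")
    n = len(ranks_x)
    if n < 2:
        raise ValueError("at least two pairs of ranks are needed")
    # D is the rank difference of each pair; only its square is used.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


# Identical rankings give +1, exactly reversed rankings give -1.
print(spearman_from_ranks([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 1.0
print(spearman_from_ranks([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0
```

The two calls show the extreme values the coefficient can take, which helps when judging how strong an intermediate result is.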

Three different cases of Spearman’s Rank Correlation Coefficient:

Case 1: When Ranks are given

In this case, the ranks of the items in both series are already given, and the coefficient of rank correlation is calculated directly from those ranks. The formula for calculating Spearman’s Rank Correlation is

r_k = 1 - \frac{6\sum{D^2}}{N^3 - N}

Example:

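In an art competition, two judges gave the following ranks to the 10 participants:

| Judge X | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Judge Y | 6 | 2 | 9 | 7 | 1 | 4 | 8 | 3 | 10 | 5 |

Calculate the coefficient of rank correlation.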

Solution:

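| Judge X (R1) | Judge Y (R2) | D = R1 – R2 | D² |
|---|---|---|---|
| 1 | 6 | -5 | 25 |
| 2 | 2 | 0 | 0 |
| 3 | 9 | -6 | 36 |
| 4 | 7 | -3 | 9 |
| 5 | 1 | 4 | 16 |
| 6 | 4 | 2 | 4 |
| 7 | 8 | -1 | 1 |
| 8 | 3 | 5 | 25 |
| 9 | 10 | -1 | 1 |
| 10 | 5 | 5 | 25 |
| N = 10 | | | ∑D² = 142 |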

r_k = 1 - \frac{6\sum{D^2}}{N^3 - N}

= 1 - \frac{6\times{142}}{10^3 - 10}

= 1 - \frac{852}{990}

= 1 – 0.861

≈ 0.14

Coefficient of Correlation (rk) = 0.14

As the rank correlation is positive and close to 0, the association between the ranks given by the two judges is weak.
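
The working table and the final figure of this example can be reproduced with a short Python sketch. The helper name `rank_difference_rows` is hypothetical and exists only for this illustration; the data are the judges' ranks from the example.

```python
judge_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
judge_y = [6, 2, 9, 7, 1, 4, 8, 3, 10, 5]


def rank_difference_rows(r1, r2):
    """Return one (R1, R2, D, D^2) tuple per participant."""
    return [(a, b, a - b, (a - b) ** 2) for a, b in zip(r1, r2)]


rows = rank_difference_rows(judge_x, judge_y)
for r1, r2, d, d2 in rows:
    # One line per table row: R1, R2, D, D^2
    print(f"{r1:>3} {r2:>3} {d:>3} {d2:>3}")

n = len(rows)
sum_d2 = sum(row[3] for row in rows)
r_k = 1 - 6 * sum_d2 / (n ** 3 - n)
print(f"N = {n}, sum of D^2 = {sum_d2}, r_k = {r_k:.2f}")
# Last line printed: N = 10, sum of D^2 = 142, r_k = 0.14
```

Printing the intermediate columns makes it easy to compare each row against a hand-computed table before trusting the final coefficient.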

Case 2: When Ranks are not given

When the ranks are not given, the values of each series have to be ranked first. While ranking the values, one has to adopt a uniform procedure for both series. For instance, if the 1st rank is given to the lowest value of one series, then the same pattern should be followed for the second series as well. Once the ranks have been determined, the coefficient of rank correlation is calculated as in the first case. The formula for calculating Spearman’s rank correlation coefficient is

r_k = 1 - \frac{6\sum{D^2}}{N^3 - N}

Example:

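Calculate Spearman’s Rank Correlation for the following data.

| Mathematics | 14 | 15 | 17 | 12 | 16 | 11 | 18 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|
| Accountancy | 4 | 12 | 8 | 10 | 2 | 5 | 9 | 3 | 7 |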

Solution:

In the given case, there are 9 pairs of values. Both series, X (Mathematics) and Y (Accountancy), are ranked in ascending order: the lowest value gets the 1st rank and the highest value gets the 9th rank. Therefore, the 1st rank is given to 9 in the X series and to 2 in the Y series. Similarly, the 9th rank is given to 18 in the X series and to 12 in the Y series.

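| Mathematics (X) | Rank R1 | Accountancy (Y) | Rank R2 | D = R1 – R2 | D² |
|---|---|---|---|---|---|
| 14 | 5 | 4 | 3 | 2 | 4 |
| 15 | 6 | 12 | 9 | -3 | 9 |
| 17 | 8 | 8 | 6 | 2 | 4 |
| 12 | 4 | 10 | 8 | -4 | 16 |
| 16 | 7 | 2 | 1 | 6 | 36 |
| 11 | 3 | 5 | 4 | -1 | 1 |
| 18 | 9 | 9 | 7 | 2 | 4 |
| 9 | 1 | 3 | 2 | -1 | 1 |
| 10 | 2 | 7 | 5 | -3 | 9 |
| N = 9 | | | | | ∑D² = 84 |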

r_k = 1 - \frac{6\sum{D^2}}{N^3 - N}

= 1 - \frac{6\times84}{9^3 - 9}

= 1 - \frac{504}{720}

= 1 – 0.7

= 0.3

Coefficient of Correlation (rk) = 0.3

It means that there is a moderate degree of positive rank correlation (0.3).
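
The ranking step of this case can also be sketched in Python. The function name `rank_ascending` is an assumption for illustration; it gives rank 1 to the smallest value, as in the solution above, and deliberately rejects tied values because those need the averaged ranks of Case 3.

```python
def rank_ascending(values):
    """Rank 1 goes to the smallest value; all values must be distinct."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need average ranks instead")
    # Map each value to its 1-based position in sorted order.
    position = {v: i for i, v in enumerate(sorted(values), start=1)}
    return [position[v] for v in values]


mathematics = [14, 15, 17, 12, 16, 11, 18, 9, 10]
accountancy = [4, 12, 8, 10, 2, 5, 9, 3, 7]

r1 = rank_ascending(mathematics)
r2 = rank_ascending(accountancy)
n = len(r1)
sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
r_k = 1 - 6 * sum_d2 / (n ** 3 - n)

print(r1)  # [5, 6, 8, 4, 7, 3, 9, 1, 2]
print(r2)  # [3, 9, 6, 8, 1, 4, 7, 2, 5]
print(sum_d2, round(r_k, 2))  # 84 0.3
```

Ranking both series in descending order instead would give the same coefficient, since every rank difference would only change sign.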

Case 3: When Ranks are equal

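When two or more values in a series are equal, each of them is given the average of the ranks they would otherwise have occupied. For example, two values tied for the 2nd and 3rd places both receive the rank (2+3)/2 = 2.5. To account for the tied ranks, a correction term of \frac{1}{12}(m^3 - m) is added to ∑D² for every group of tied values. The formula for calculating Spearman’s Rank Correlation Coefficient then becomes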

r_k = 1 - \frac{6[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + ...]}{N^3 - N}

Here, m1, m2, … are the number of times each repeated value occurs. There is one correction term for every repeated value, whether it appears in the X series or in the Y series.

Example:

Calculate the coefficient of rank correlation of the scores awarded by two judges, X and Y, to 7 students in an essay writing competition.

| X | 15 | 12 | 20 | 16 | 18 | 20 | 26 |
|---|---|---|---|---|---|---|---|
| Y | 10 | 15 | 11 | 11 | 25 | 18 | 30 |

Solution:

In the given case, there are 7 students, and both series are ranked in descending order: the highest score gets the 1st rank and the lowest score gets the 7th rank. For instance, among the scores given by Judge X, the 1st rank goes to the score of 26, and among the scores given by Judge Y, the 1st rank goes to the score of 30.

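| X | Rank R1 | Y | Rank R2 | D = R1 – R2 | D² |
|---|---|---|---|---|---|
| 15 | 6 | 10 | 7 | -1 | 1 |
| 12 | 7 | 15 | 4 | 3 | 9 |
| 20 | 2.5 | 11 | 5.5 | -3 | 9 |
| 16 | 5 | 11 | 5.5 | -0.5 | 0.25 |
| 18 | 4 | 25 | 2 | 2 | 4 |
| 20 | 2.5 | 18 | 3 | -0.5 | 0.25 |
| 26 | 1 | 30 | 1 | 0 | 0 |
| | | | | | ∑D² = 23.5 |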

Judge X has given a score of 20 to two students, who occupy the 2nd and 3rd places. Therefore, the average of both ranks, i.e., (2+3)/2 = 2.5, has been given to both students.

Similarly, Judge Y has given a score of 11 to two students, who occupy the 5th and 6th places. Therefore, the average of both ranks, i.e., (5+6)/2 = 5.5, has been given to both students.

Also, in the X series, the score 20 is repeated twice, and in the Y series, the score 11 is repeated twice. Therefore, m for the X series (m1) is 2 and m for the Y series (m2) is 2.
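
The averaging of tied ranks and the counting of repeated values can be sketched as follows. Both function names, `average_ranks_descending` and `tie_group_sizes`, are assumptions made for this illustration; the first gives rank 1 to the largest score, matching the solution above.

```python
from collections import Counter


def average_ranks_descending(values):
    """Rank 1 goes to the largest value; tied values share the mean rank."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for pos, v in enumerate(ordered, start=1):
        # Collect every place a value occupies in the sorted order.
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def tie_group_sizes(values):
    """Return m (the repeat count) for every value that occurs more than once."""
    return [count for count in Counter(values).values() if count > 1]


x = [15, 12, 20, 16, 18, 20, 26]
y = [10, 15, 11, 11, 25, 18, 30]

print(average_ranks_descending(x))  # [6.0, 7.0, 2.5, 5.0, 4.0, 2.5, 1.0]
print(average_ranks_descending(y))  # [7.0, 4.0, 5.5, 5.5, 2.0, 3.0, 1.0]
print(tie_group_sizes(x), tie_group_sizes(y))  # [2] [2]
```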

r_k = 1 - \frac{6[\sum D^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + ...]}{N^3 - N}

= 1 - \frac{6[23.5 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(2^3 - 2)]}{7^3 - 7}

= 1 - \frac{6[23.5 + \frac{1}{2} + \frac{1}{2}]}{336}

= 1 - 6[\frac{24.5}{336}]

= 1 - \frac{147}{336}

= 1 – 0.4375

= 0.5625

Coefficient of Correlation = 0.56

The positive correlation coefficient of 0.56 means that there is a moderate degree of agreement between the rankings given by the two judges.
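
As a check on the arithmetic of this example, the tie-corrected formula can be written as a small Python function. The name `spearman_tie_corrected` and its `tie_sizes` parameter are hypothetical; the ranks and the m values are taken from the solution table and are passed in directly.

```python
def spearman_tie_corrected(ranks_x, ranks_y, tie_sizes):
    """r_k = 1 - 6 * [sum(D^2) + sum((m^3 - m) / 12)] / (N^3 - N)."""
    n = len(ranks_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    # One correction term per repeated value, from either series.
    correction = sum((m ** 3 - m) / 12 for m in tie_sizes)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)


ranks_x = [6, 7, 2.5, 5, 4, 2.5, 1]
ranks_y = [7, 4, 5.5, 5.5, 2, 3, 1]

# m1 = 2 (score 20 in X) and m2 = 2 (score 11 in Y)
r_k = spearman_tie_corrected(ranks_x, ranks_y, tie_sizes=[2, 2])
print(r_k)  # 0.5625
```

With an empty `tie_sizes` list the correction is zero and the function reduces to the formula used in Cases 1 and 2.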

 



Last Updated : 06 Apr, 2023