# Program for Spearman’s Rank Correlation

Prerequisite: Correlation Coefficient

Given two arrays X[] and Y[], find Spearman’s rank correlation. In Spearman rank correlation, instead of working with the data values themselves (as discussed in Correlation Coefficient), we work with the ranks of these values. The observations are first ranked, and these ranks are then used to compute the correlation. The algorithm for this correlation is as follows:

1. Rank each observation in X and store it in Rank_X.
2. Rank each observation in Y and store it in Rank_Y.
3. Obtain the Pearson correlation coefficient for Rank_X and Rank_Y.

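The formula used to calculate Pearson’s correlation coefficient (r or rho) of sets X and Y, each with n observations, is as follows:

$$r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}$$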

Algorithm for calculating Pearson’s coefficient of sets X and Y:

```
function correlationCoefficient(X, Y)
    n = X.size
    sigma_x = sigma_y = sigma_xy = 0
    sigma_xsq = sigma_ysq = 0
    for i in 0...n-1
        sigma_x   = sigma_x + X[i]
        sigma_y   = sigma_y + Y[i]
        sigma_xy  = sigma_xy + X[i] * Y[i]
        sigma_xsq = sigma_xsq + X[i] * X[i]
        sigma_ysq = sigma_ysq + Y[i] * Y[i]
    num = n * sigma_xy - sigma_x * sigma_y
    den = sqrt([n * sigma_xsq - (sigma_x)^2] * [n * sigma_ysq - (sigma_y)^2])
    return num / den
```
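The one-pass formula above subtracts `sigma_x * sigma_y` from `n * sigma_xy`, and when the values are large with a small spread those two quantities are nearly equal, so floating-point precision can be lost. For ranks this rarely matters, since ranks are small numbers. As a point of comparison, here is a minimal C++ sketch of a different but mathematically equivalent technique: a two-pass computation that subtracts the means first. The name `pearsonTwoPass` is hypothetical and not part of the original program.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Two-pass Pearson correlation (hypothetical helper, not from the
// original article): compute the means first, then accumulate the
// centered cross-product and centered sums of squares.
double pearsonTwoPass(const std::vector<double>& X,
                      const std::vector<double>& Y)
{
    assert(X.size() == Y.size() && !X.empty());
    const std::size_t n = X.size();

    // Pass 1: means of X and Y.
    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        meanX += X[i];
        meanY += Y[i];
    }
    meanX /= n;
    meanY /= n;

    // Pass 2: sums of centered products and centered squares.
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = X[i] - meanX;
        const double dy = Y[i] - meanY;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }

    // As with the one-pass formula, the result is undefined
    // (division by zero) if either input is constant.
    return sxy / std::sqrt(sxx * syy);
}
```

For example, with the ranks [1 2 3 4 5] and [1 2 4 3 5] the centered cross-product is 9 and both centered sums of squares are 10, so the function returns 9 / 10 = 0.9, the same value the one-pass formula gives.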

While assigning ranks, we may encounter ties, i.e. two or more observations with the same value and therefore the same rank. To resolve ties, we use the fractional ranking scheme. In this scheme, if n observations share the same rank, each of them gets a fractional rank given by:

$$\text{fractional\_rank} = \text{rank} + \frac{n-1}{2}$$

The next rank assigned after such a tie is rank + n, not rank + 1. For instance, if 3 items share the same rank r, each gets the fractional rank given above, r + 1, and the next rank that can be given to another observation is r + 3. Note that fractional ranks need not be fractions; each is the arithmetic mean of the n consecutive ranks r, r + 1, r + 2, …, r + n-1:

$$\frac{r + (r+1) + (r+2) + \cdots + (r+n-1)}{n} = \frac{nr + \frac{n(n-1)}{2}}{n} = r + \frac{n-1}{2}$$

Some examples:

Input: X = [15 18 19 20 21], Y = [25 26 28 27 29]

Solution:

- `Rank_X = [1 2 3 4 5]`, `Rank_Y = [1 2 4 3 5]`
- `sigma_x = 1+2+3+4+5 = 15`
- `sigma_y = 1+2+4+3+5 = 15`
- `sigma_xy = 1*1+2*2+3*4+4*3+5*5 = 54`
- `sigma_xsq = 1*1+2*2+3*3+4*4+5*5 = 55`
- `sigma_ysq = 1*1+2*2+4*4+3*3+5*5 = 55`
- Substituting these values into the formula with n = 5: `Coefficient = Pearson(Rank_X, Rank_Y) = (5*54 - 15*15) / sqrt((5*55 - 15^2) * (5*55 - 15^2)) = 45 / 50 = 0.9`

Input: X = [15 18 21 15 21], Y = [25 25 27 27 27]

Solution:

- `Rank_X = [1.5 3 4.5 1.5 4.5]`, `Rank_Y = [1.5 1.5 4 4 4]`
- `sigma_x = 15`, `sigma_y = 15`, `sigma_xy = 48.75`, `sigma_xsq = 54`, `sigma_ysq = 52.5`
- Substituting these values into the formula with n = 5: `Coefficient = Pearson(Rank_X, Rank_Y) = (5*48.75 - 15*15) / sqrt((5*54 - 15^2) * (5*52.5 - 15^2)) = 18.75 / sqrt(45 * 37.5) ≈ 0.456435`
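The hand calculations in these examples can be checked mechanically. The sketch below accumulates the same five sums and substitutes them into the one-pass formula; the helper names `PearsonSums`, `computeSums` and `coefficientFromSums` are illustrative only and are not part of the original program.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// The five running sums used by the one-pass Pearson formula
// (illustrative struct, not from the original article).
struct PearsonSums {
    double x = 0, y = 0, xy = 0, xsq = 0, ysq = 0;
};

// Accumulate sigma_x, sigma_y, sigma_xy, sigma_xsq and sigma_ysq.
PearsonSums computeSums(const std::vector<double>& X,
                        const std::vector<double>& Y)
{
    assert(X.size() == Y.size());
    PearsonSums s;
    for (std::size_t i = 0; i < X.size(); ++i) {
        s.x   += X[i];
        s.y   += Y[i];
        s.xy  += X[i] * Y[i];
        s.xsq += X[i] * X[i];
        s.ysq += Y[i] * Y[i];
    }
    return s;
}

// Substitute the sums into
// (n*sigma_xy - sigma_x*sigma_y) /
//     sqrt((n*sigma_xsq - sigma_x^2) * (n*sigma_ysq - sigma_y^2)).
double coefficientFromSums(const PearsonSums& s, std::size_t n)
{
    const double num = n * s.xy - s.x * s.y;
    const double den = std::sqrt((n * s.xsq - s.x * s.x) *
                                 (n * s.ysq - s.y * s.y));
    return num / den;
}
```

Passing the first example's ranks to `computeSums` yields the sums 15, 15, 54, 55 and 55, and `coefficientFromSums` then returns 45 / 50 = 0.9. The second example's ranks yield `sigma_xy = 48.75`, `sigma_xsq = 54` and `sigma_ysq = 52.5`, giving approximately 0.456435.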

The algorithm for the fractional ranking scheme is given below:

```
function rankify(X)
    N = X.size()
    // Vector to store ranks
    Rank_X(N)
    for i = 0 ... N-1
        r = 1, s = 1
        // Count smaller (r) and equal (s) elements in 0...i-1
        for j = 0 ... i-1
            if X[j] < X[i]
                r = r + 1
            if X[j] == X[i]
                s = s + 1
        // Count smaller (r) and equal (s) elements in i+1...N-1
        for j = i+1 ... N-1
            if X[j] < X[i]
                r = r + 1
            if X[j] == X[i]
                s = s + 1
        // Assign fractional rank
        Rank_X[i] = r + (s-1) * 0.5
    return Rank_X
```
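The algorithm above compares every pair of observations, so it takes O(N²) time. A common alternative, sketched below, sorts the indices by value and then walks each run of equal values once, which takes O(N log N) time. This is a different technique from the article's rankify, although it assigns the same fractional ranks r + (s-1)/2; the name `rankifySorted` is hypothetical, and the sketch assumes the input contains no NaN values.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Fractional ranks in O(N log N) (hypothetical helper): sort indices
// by value, then give each run of s equal values that starts at
// 1-based rank r the rank r + (s - 1) / 2.
std::vector<double> rankifySorted(const std::vector<double>& X)
{
    // NaN would break the strict weak ordering that std::sort needs.
    assert(std::none_of(X.begin(), X.end(),
                        [](double v) { return std::isnan(v); }));

    const std::size_t n = X.size();
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, ..., n-1
    std::sort(idx.begin(), idx.end(),
              [&X](std::size_t a, std::size_t b) { return X[a] < X[b]; });

    std::vector<double> rank(n);
    std::size_t i = 0;
    while (i < n) {
        // Advance j past every value equal to X[idx[i]].
        std::size_t j = i + 1;
        while (j < n && X[idx[j]] == X[idx[i]]) ++j;

        // The run occupies ranks i+1 .. j, so s = j - i observations tie.
        const double r = static_cast<double>(i + 1);
        const double s = static_cast<double>(j - i);
        for (std::size_t k = i; k < j; ++k)
            rank[idx[k]] = r + (s - 1) * 0.5;
        i = j;
    }
    return rank;
}
```

For X = [15 18 21 15 21] the sorted runs are {15, 15}, {18} and {21, 21}, so the function returns [1.5 3 4.5 1.5 4.5], the same ranks as in the second example. Because tied values always receive the same rank, it does not matter that `std::sort` is not stable.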

Note:

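There is a direct formula to calculate Spearman’s coefficient:

$$\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$

where $d_i$ is the difference between the two ranks of the $i$-th observation and $n$ is the number of observations. However, this formula is exact only when all ranks are distinct; each tie requires a correction term, so the formula is not discussed further here. Calculating Spearman’s coefficient as the correlation coefficient of the ranks is the most general method.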
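To see why ties need a correction, the sketch below applies the direct formula rho = 1 - 6 Σ d_i² / (n(n² - 1)), where d_i = Rank_X[i] - Rank_Y[i], to the rank vectors from the examples. The name `spearmanDirect` is hypothetical. With the first example's ranks (no ties), Σ d_i² = 2 and the formula reproduces 0.9. With the second example's tied ranks, Σ d_i² = 9 and it gives 0.55 rather than the 0.456435 obtained from the Pearson correlation of the ranks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct Spearman formula (hypothetical helper):
//   rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)),
// with d_i = rankX[i] - rankY[i]. Exact only when there are no ties.
double spearmanDirect(const std::vector<double>& rankX,
                      const std::vector<double>& rankY)
{
    assert(rankX.size() == rankY.size() && rankX.size() > 1);
    const double n = static_cast<double>(rankX.size());

    // Sum of squared rank differences.
    double sumDsq = 0.0;
    for (std::size_t i = 0; i < rankX.size(); ++i) {
        const double d = rankX[i] - rankY[i];
        sumDsq += d * d;
    }
    return 1.0 - 6.0 * sumDsq / (n * (n * n - 1.0));
}
```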

A C++ program to evaluate Spearman’s coefficient is given below:

## C++

```cpp
// Program to find Spearman's rank correlation coefficient
#include <cmath>
#include <iostream>
#include <vector>
using namespace std;

typedef vector<double> Vector;

// Utility function to print a Vector
void printVector(const Vector &X)
{
    for (auto i : X)
        cout << i << " ";

    cout << endl;
}

// Function returns the rank vector
// of the set of observations
Vector rankify(const Vector &X)
{
    int N = X.size();

    // Rank Vector
    Vector Rank_X(N);

    for (int i = 0; i < N; i++)
    {
        // r: 1 + number of elements smaller than X[i]
        // s: number of elements equal to X[i], including itself
        int r = 1, s = 1;

        // Count smaller and equal elements in 0 to i-1
        for (int j = 0; j < i; j++) {
            if (X[j] < X[i]) r++;
            if (X[j] == X[i]) s++;
        }

        // Count smaller and equal elements in i+1 to N-1
        for (int j = i + 1; j < N; j++) {
            if (X[j] < X[i]) r++;
            if (X[j] == X[i]) s++;
        }

        // Use fractional rank formula
        // fractional_rank = r + (s-1)/2
        Rank_X[i] = r + (s - 1) * 0.5;
    }

    // Return Rank Vector
    return Rank_X;
}

// Function that returns the
// Pearson correlation coefficient.
double correlationCoefficient(const Vector &X, const Vector &Y)
{
    int n = X.size();
    double sum_X = 0, sum_Y = 0, sum_XY = 0;
    double squareSum_X = 0, squareSum_Y = 0;

    for (int i = 0; i < n; i++)
    {
        // sum of elements of array X.
        sum_X = sum_X + X[i];

        // sum of elements of array Y.
        sum_Y = sum_Y + Y[i];

        // sum of X[i] * Y[i].
        sum_XY = sum_XY + X[i] * Y[i];

        // sum of squares of array elements.
        squareSum_X = squareSum_X + X[i] * X[i];
        squareSum_Y = squareSum_Y + Y[i] * Y[i];
    }

    // use formula for calculating correlation coefficient.
    double corr = (n * sum_XY - sum_X * sum_Y) /
                  sqrt((n * squareSum_X - sum_X * sum_X) *
                       (n * squareSum_Y - sum_Y * sum_Y));

    return corr;
}

// Driver function
int main()
{
    Vector X = {15, 18, 21, 15, 21};
    Vector Y = {25, 25, 27, 27, 27};

    // Get ranks of vector X
    Vector rank_x = rankify(X);

    // Get ranks of vector Y
    Vector rank_y = rankify(Y);

    cout << "Vector X" << endl;
    printVector(X);

    // Print rank vector of X
    cout << "Rankings of X" << endl;
    printVector(rank_x);

    // Print Vector Y
    cout << "Vector Y" << endl;
    printVector(Y);

    // Print rank vector of Y
    cout << "Rankings of Y" << endl;
    printVector(rank_y);

    // Print Spearman's coefficient
    cout << "Spearman's Rank correlation: " << endl;
    cout << correlationCoefficient(rank_x, rank_y) << endl;

    return 0;
}
```


Output:

```
Vector X
15 18 21 15 21
Rankings of X
1.5 3 4.5 1.5 4.5
Vector Y
25 25 27 27 27
Rankings of Y
1.5 1.5 4 4 4
Spearman's Rank correlation:
0.456435
```


