A correlation test measures the strength of the association between two variables. For instance, to find out whether there is a relationship between the heights of fathers and sons, we can calculate a correlation coefficient.
Methods for correlation analysis: There are mainly two types of correlation:
- Parametric Correlation – Pearson correlation (r): It measures the linear dependence between two variables (x and y) and is known as a parametric correlation test because it depends on the distribution of the data.
- Non-Parametric Correlation – Kendall (tau) and Spearman (rho): These are rank-based correlation coefficients and are known as non-parametric correlations.
What is Spearman’s Correlation
Spearman’s correlation is a statistical measure of the strength and direction of the monotonic relationship between two variables that are continuous or at least ordinal. It is computed on ranks: the values of each variable are put in order and replaced by their ranks. It is denoted by the symbol “rho” (ρ) and takes values between -1 and +1. A positive value of rho indicates a positive relationship (one variable tends to increase as the other increases), while a negative value of rho indicates a negative relationship. A rho value of 0 indicates no monotonic association between the two variables.
Spearman’s Correlation formula
$$ r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} $$
where,
r_s = Spearman correlation coefficient (written rs in the text)
d_i = the difference between the ranks given to the two variables' values for the i-th item of the data
n = total number of observations
Spearman’s Correlation Example:
In Spearman’s rank correlation, we convert the data, even real-valued data, into ranks. Consider 10 data points in the variables X1 and Y1. We first find their respective ranks, and then the square of the difference between the two ranks for each data point.
| Number | X1 | Y1 | Rank X1 | Rank Y1 | d² |
|--------|----|----|---------|---------|------|
| 1 | 7 | 5 | 6.5 | 4.5 | 4 |
| 2 | 6 | 4 | 5 | 3 | 4 |
| 3 | 4 | 5 | 3 | 4.5 | 2.25 |
| 4 | 5 | 6 | 4 | 6 | 4 |
| 5 | 8 | 10 | 8 | 10 | 4 |
| 6 | 7 | 7 | 6.5 | 7 | 0.25 |
| 7 | 10 | 9 | 10 | 9 | 1 |
| 8 | 3 | 2 | 2 | 2 | 0 |
| 9 | 9 | 8 | 9 | 8 | 1 |
| 10 | 2 | 1 | 1 | 1 | 0 |
Step 1: Finding Rank
- Rank X1: We look at all the individual values of X1 and assign a rank to each. The lowest value, 2, gets rank 1; the next lowest value, 3, gets rank 2; and so on. Notice that the first and sixth values are tied (both are 7). They would occupy ranks 6 and 7, so each gets the average rank 6.5. Similarly, if more than two values are tied, we average the ranks they would occupy and assign that average to each of them.
- Rank Y1: The Y1 data points are ranked in the same manner.
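The ranking rule described in Step 1 can be sketched in plain Python. This is a minimal illustration, not a library routine: the helper name `average_ranks` is an assumption made for this example, and the data are the X1 and Y1 values from the example table.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    # Sort the indices by value so that equal values sit next to each other.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` to cover the whole run of tied values.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end (0-based) correspond to ranks start+1..end+1.
        shared_rank = ((start + 1) + (end + 1)) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


x1 = [7, 6, 4, 5, 8, 7, 10, 3, 9, 2]
y1 = [5, 4, 5, 6, 10, 7, 9, 2, 8, 1]

rank_x1 = average_ranks(x1)
rank_y1 = average_ranks(y1)
print(rank_x1)  # [6.5, 5.0, 3.0, 4.0, 8.0, 6.5, 10.0, 2.0, 9.0, 1.0]
print(rank_y1)  # [4.5, 3.0, 4.5, 6.0, 10.0, 7.0, 9.0, 2.0, 8.0, 1.0]
```

The two tied 7s in X1 share rank 6.5 and the two tied 5s in Y1 share rank 4.5, matching the Rank X1 and Rank Y1 values in the table.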
Step 2: Calculate d²
Once we have the ranks, we compute the difference between the two ranks for each data point and square it. For the first data point the difference is 6.5 - 4.5 = 2, which gives d² = 4; for the second data point the difference is 5 - 3 = 2, which again gives d² = 4. Proceeding this way for every data point gives the d² values in the table. We then sum these values (the sum is 20.5) and use the sum in the formula above to compute the Spearman coefficient.
Substituting the sum of d² (20.5) and n = 10:
rho = rs = 1 - ((6 x 20.5) / (10 x (10² - 1)))
= 1 - (123 / 990)
= 1 - 0.1242
= 0.8758 ≈ 0.88
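The arithmetic of Step 2 can be checked with a few lines of Python. This is only a sketch of the hand calculation; the rank lists are copied from the example, and the variable names are chosen for illustration.

```python
# Ranks taken from the worked example (ties already averaged).
rank_x1 = [6.5, 5, 3, 4, 8, 6.5, 10, 2, 9, 1]
rank_y1 = [4.5, 3, 4.5, 6, 10, 7, 9, 2, 8, 1]

# Squared rank difference for each data point.
d_squared = [(rx - ry) ** 2 for rx, ry in zip(rank_x1, rank_y1)]
sum_d_squared = sum(d_squared)

# Rank-difference formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
n = len(rank_x1)
rho = 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))

print(sum_d_squared)   # 20.5
print(round(rho, 4))   # 0.8758
```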
Properties Of Spearman Correlation
- rs takes a value between -1 (perfect negative association) and +1 (perfect positive association).
- rs = 0 means no monotonic association.
- It can be used when the association is not linear.
- It can be applied to ordinal variables.
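As a small illustration of these properties, the sketch below applies the rank-difference formula to two non-linear but monotonic relationships. The helper `spearman_no_ties` is a hypothetical name introduced here, and it assumes the data contain no tied values.

```python
def spearman_no_ties(x, y):
    """Spearman's rho via the rank-difference formula; assumes no tied values."""
    def ranks(values):
        # Position of each value in sorted order, starting at 1.
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for position, index in enumerate(order, start=1):
            result[index] = position
        return result

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


x = [1, 2, 3, 4, 5, 6]
increasing_nonlinear = [v ** 3 for v in x]   # rises, but not along a straight line
decreasing_nonlinear = [1 / v for v in x]    # falls, but not along a straight line

rho_up = spearman_no_ties(x, increasing_nonlinear)
rho_down = spearman_no_ties(x, decreasing_nonlinear)
print(rho_up, rho_down)  # 1.0 -1.0
```

Both relationships are curved, yet the coefficient reaches its extreme values because the ordering of the points is perfectly preserved or perfectly reversed.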
Spearman Correlation for Anscombe’s Data
Anscombe’s data, also known as Anscombe’s quartet, comprises four datasets that have nearly identical simple statistical properties, yet appear very different when graphed. Each dataset consists of eleven (x, y) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data before analyzing it and the effect of outliers on statistical properties. The four sets of 11 data points are available as a CSV file. Plotting the points gives the scatter plots produced by the code below; the discussion that follows considers the first three datasets.
Python code for plotting the data
Python3
import pandas as pd
import matplotlib.pyplot as plt

# Load Anscombe's quartet. "anscombe.csv" is a placeholder name for the
# downloaded file, assumed to contain the columns x1..x4 and y1..y4.
data = pd.read_csv("anscombe.csv")

# One scatter plot per dataset, arranged in a 2 x 2 grid
fig, axs = plt.subplots(2, 2, figsize=(10, 8))
for i, ax in enumerate(axs.flat, start=1):
    ax.scatter(data[f"x{i}"], data[f"y{i}"])
    ax.set_xlabel(f"x{i}")
    ax.set_ylabel(f"y{i}")
    ax.set_title(f"Scatter Plot of x{i} vs y{i}")

plt.tight_layout()
plt.show()
A Brief Explanation of the Above Diagram
The four datasets have nearly the same Pearson correlation coefficient (about 0.82), even though the scatter plots look very different. Spearman’s coefficient is rank-based, so it is not identical across the datasets, but it is reasonably high for the first three: roughly 0.82 for the first dataset (top left), 0.69 for the second (top right), and 0.99 for the third (bottom left). The key point is that a high Spearman correlation coefficient does not by itself imply a linear relationship between the variables. For example, the second dataset (top right) follows a non-linear curve and still gives a reasonably high value.
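To make the comparison concrete, the sketch below computes both coefficients for each dataset without any external library. The values are the commonly published figures for Anscombe's quartet, typed in directly rather than read from the CSV file, and the helper names (`pearson`, `average_ranks`, `spearman`) are assumptions made for this example. Spearman's rho is computed here as Pearson's r on the ranks, which stays valid when there are ties (the fourth dataset has ten identical x values).

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman(a, b):
    """Spearman's rho computed as Pearson's r on the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by datasets 1 to 3
quartet = {
    "set 1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "set 2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "set 3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "set 4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
              [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {name: (pearson(x, y), spearman(x, y)) for name, (x, y) in quartet.items()}
for name, (r, rho) in results.items():
    print(f"{name}: Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Pearson's r comes out at about 0.816 for every dataset, while Spearman's rho varies from dataset to dataset, which is why the scatter plot should always be inspected alongside the coefficient.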
Python Implementation of Spearman’s Rank Correlation
For implementing Spearman’s Rank correlation formula we will use the scipy library. It is one of the most used Python libraries for mathematical computation.
Python3
from scipy.stats import spearmanr

# Two variables in exactly opposite order
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]

# spearmanr returns the correlation coefficient and the p-value
corr, pval = spearmanr(x, y)
print("Spearman's correlation coefficient:", corr)
print("p-value:", pval)
Output:
Spearman's correlation coefficient: -0.9999999999999999
p-value: 1.4042654220543672e-24
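The same function can be applied to the X1 and Y1 data from the worked example. This is a short sketch; the variable names are chosen here for illustration. Because the data contain ties, `spearmanr` assigns average ranks and computes the Pearson correlation of those ranks, so its result can differ slightly from the rank-difference formula used in the hand calculation.

```python
from scipy.stats import spearmanr

x1 = [7, 6, 4, 5, 8, 7, 10, 3, 9, 2]
y1 = [5, 4, 5, 6, 10, 7, 9, 2, 8, 1]

# spearmanr ranks the data itself (tied values receive average ranks).
rho, pval = spearmanr(x1, y1)
print(round(rho, 4))  # 0.875
```

The value 0.875 is close to the 0.8758 obtained by hand; the small gap comes from the ties, for which the rank-difference formula is only an approximation.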
Advantages of Spearman’s Rank Correlation:
- It is easy to understand and to compute.
- It is well suited to qualitative observations that can be ranked but not measured precisely, such as the intelligence of people, physical appearance, etc.
- It is suitable when the series gives only the order of preference and not the actual value of the variable.
- It is robust to outliers in the data, because an extreme value affects only its rank.
- It is designed to capture monotonic relationships between variables. In a monotonic relationship, as one variable increases the other consistently increases (or consistently decreases), though not necessarily at a constant rate.
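The robustness to outliers can be illustrated with a small made-up dataset. In this sketch the helper names `pearson` and `ranks` and the data are assumptions chosen for the example. One extreme value is injected into an otherwise perfectly linear series, and both coefficients are compared before and after.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / sqrt(var_a * var_b)


def ranks(values):
    """1-based ranks; assumes the values contain no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


x = list(range(1, 11))
y_clean = [2 * v for v in x]          # perfectly linear
y_outlier = y_clean[:-1] + [1000]     # last point replaced by an extreme value

r_clean = pearson(x, y_clean)
r_outlier = pearson(x, y_outlier)
# Spearman's rho is Pearson's r applied to the ranks.
rho_clean = pearson(ranks(x), ranks(y_clean))
rho_outlier = pearson(ranks(x), ranks(y_outlier))

print(f"Pearson:  clean = {r_clean:.3f}, with outlier = {r_outlier:.3f}")
print(f"Spearman: clean = {rho_clean:.3f}, with outlier = {rho_outlier:.3f}")
```

The outlier pulls Pearson's r well below 1, while Spearman's rho is unchanged because the extreme value is still simply the largest observation and keeps the same rank.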
Disadvantages of Spearman’s Rank Correlation:
- It is not applicable in the case of grouped data.
- When computed by hand, it is practical only for a limited number of observations or items.
- It ignores non-monotonic relationships between the variables. For example, it does not capture a U-shaped (curvilinear) association in which one variable first falls and then rises as the other increases.
- It considers only the ranks of the data points and ignores the actual magnitude of the differences between the values of the variables.
- Converting the data into ranks discards the original values of the variables. This transformation may lose information, especially if the variables have meaningful magnitudes or units.
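The limitation on non-monotonic relationships can be seen with a U-shaped example. This is a minimal sketch using `scipy.stats.spearmanr`; the data are made up for illustration. Here y is completely determined by x, yet the relationship falls and then rises, so the ranks carry no consistent direction.

```python
from scipy.stats import spearmanr

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # U-shaped: y falls until x = 0, then rises

rho, pval = spearmanr(x, y)
print(f"rho = {abs(rho):.3f}")
```

The coefficient is zero (up to floating-point rounding) even though y depends exactly on x, which shows that a low Spearman coefficient does not rule out a strong non-monotonic relationship.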
Last Updated: 29 May, 2023