Correlation coefficients measure how strong the relationship between two variables is. There are several formulas for computing a correlation coefficient; one of the most popular is Pearson’s correlation (also known as Pearson’s r), which is commonly used in linear regression. Pearson’s correlation coefficient is denoted by the symbol “R” (also written “r”). The correlation coefficient formula returns a value between -1 and 1. Here,
- 1 indicates a perfect positive relationship
- -1 indicates a perfect negative relationship
- A result of zero indicates no linear relationship at all
Linear Correlation Coefficient Formula
The linear correlation coefficient, known as Pearson’s r or Pearson’s correlation coefficient, reflects the direction and strength of the linear relationship between two variables x and y. It returns a value between -1 and +1, where -1 indicates a perfect negative correlation and +1 indicates a perfect positive correlation. If the value is 0, there is no linear correlation. This is also known as zero correlation.
The “crude estimates” for interpreting the strength of a correlation using Pearson’s correlation:
r value | Crude estimate |
+.70 or higher | Very strong positive relationship |
+.40 to +.69 | Strong positive relationship |
+.30 to +.39 | Moderate positive relationship |
+.20 to +.29 | Weak positive relationship |
+.01 to +.19 | No or negligible relationship |
0 | No relationship [zero correlation] |
-.01 to -.19 | No or negligible relationship |
-.20 to -.29 | Weak negative relationship |
-.30 to -.39 | Moderate negative relationship |
-.40 to -.69 | Strong negative relationship |
-.70 or lower | Very strong negative relationship |
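The table above can be turned into a small lookup. The following Python sketch is illustrative only: the function name `describe_r` is hypothetical, and it assumes each band's lower bound is inclusive, so a value that falls between two printed bands (for example 0.695) is assigned to the lower band.

```python
def describe_r(r):
    """Map a Pearson r value to the crude verbal label from the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "No relationship [zero correlation]"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size >= 0.70:
        strength = "Very strong"
    elif size >= 0.40:
        strength = "Strong"
    elif size >= 0.30:
        strength = "Moderate"
    elif size >= 0.20:
        strength = "Weak"
    else:
        # Anything below 0.20 in magnitude is treated as negligible.
        return "No or negligible relationship"
    return f"{strength} {direction} relationship"


for value in (0.85, 0.45, -0.25, -0.05, 0.0):
    print(value, "->", describe_r(value))
```

These labels are rules of thumb, so the cut-offs may reasonably differ from one field to another.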
The formula used to get the linear correlation coefficient of the data is:
R = [n(∑xy) – (∑x)(∑y)] / √{[n∑x² – (∑x)²][n∑y² – (∑y)²]}
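As a rough illustration, the formula can be written as a short Python function. This is a sketch rather than a reference implementation: the name `pearson_r` and the sample lists are made up for the example, and it assumes two equal-length lists of numbers in which neither variable is constant.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed from the raw sums used in the formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    spread = (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    if spread <= 0:
        # Happens when every x (or every y) has the same value.
        raise ValueError("r is undefined when a variable has no spread")
    return numerator / math.sqrt(spread)


# Hypothetical sample data: y is exactly twice x, so r should be 1.
print(pearson_r([1, 2, 3], [2, 4, 6]))
```

Note that the whole numerator is divided by the whole square root; dropping the grouping changes the result.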
What are the types of linear correlation?
The linear correlation is measured by Pearson’s r, so the value of r can range between +1 and -1.
There are three types of linear correlation, as follows:
- Positive values indicate a positive correlation (0 < r ≤ 1)
- Negative values indicate a negative correlation (-1 ≤ r < 0)
- A value of 0 indicates no correlation (r = 0)
Positive correlation: In a positive correlation, both variables move in the same direction. If one increases, the other also increases, and if one decreases, the other also decreases. Whenever r is positive, it shows a positive relationship.
Negative correlation: In a negative correlation, the variables move in opposite directions. If one increases, the other decreases, and if one decreases, the other increases. Whenever r is negative, it shows a negative relationship.
No correlation: When there is no linear association between the variables, they are said to have no correlation. In this case, their correlation coefficient r is 0.
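The three cases can be illustrated with small made-up data sets. In the Python sketch below, the lists and the helper name `pearson_r` are hypothetical and chosen only so that the sign of r is easy to see.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from the raw sums (assumes neither list is constant)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


x = [1, 2, 3, 4, 5]
same_direction = [1, 3, 2, 5, 4]      # tends to rise as x rises
opposite_direction = [5, 3, 4, 1, 2]  # tends to fall as x rises
no_linear_trend = [2, 1, 0, 1, 2]     # falls, then rises by the same amount

for label, y in [("positive", same_direction),
                 ("negative", opposite_direction),
                 ("none", no_linear_trend)]:
    print(label, round(pearson_r(x, y), 2))
```

The last data set still follows a clear V-shaped pattern, so a value of r = 0 only rules out a linear relationship, not every kind of relationship.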
Sample Problems
Problem 1: Calculate the correlation coefficient for the following data:
X = 12, 16, 4, 8
and
Y = 15, 20, 5, 10
Solution:
Given variables are,
X = 12, 16, 4, 8
and
Y = 15, 20, 5, 10
To find the correlation coefficient, first construct a table with the columns XY, X², and Y², then add up the values in each column to get the sums required by the formula.
X | Y | XY | X² | Y² |
12 | 15 | 180 | 144 | 225 |
16 | 20 | 320 | 256 | 400 |
4 | 5 | 20 | 16 | 25 |
8 | 10 | 80 | 64 | 100 |
∑40 | ∑50 | ∑600 | ∑480 | ∑750 |
∑xy = 600
∑x = 40
∑y = 50
∑x² = 480
∑y² = 750
n = 4
Put all the values in the Pearson’s correlation coefficient formula:
R = [n(∑xy) – (∑x)(∑y)] / √{[n∑x² – (∑x)²][n∑y² – (∑y)²]}
R = [4(600) – (40)(50)] / √{[4(480) – (40)²][4(750) – (50)²]}
R = 400 / √{[320][500]}
R = 400 / 400
R = 1
It shows that the relationship between the variables of the data is a perfect positive linear relationship, the strongest case of a very strong positive relationship.
Problem 2: Find the value of the correlation coefficient from the following table:
SUBJECT | AGE X | GLUCOSE LEVEL Y |
1 | 42 | 98 |
2 | 23 | 68 |
3 | 22 | 73 |
4 | 47 | 79 |
5 | 50 | 88 |
6 | 60 | 82 |
Solution:
Make a table from the given data and add three more columns for XY, X², and Y². Then add up the values in each column to get ∑xy, ∑x, ∑y, ∑x², and ∑y², with n = 6.
SUBJECT | AGE X | GLUCOSE LEVEL Y | XY | X² | Y² |
1 | 42 | 98 | 4116 | 1764 | 9604 |
2 | 23 | 68 | 1564 | 529 | 4624 |
3 | 22 | 73 | 1606 | 484 | 5329 |
4 | 47 | 79 | 3713 | 2209 | 6241 |
5 | 50 | 88 | 4400 | 2500 | 7744 |
6 | 60 | 82 | 4920 | 3600 | 6724 |
∑ | 244 | 488 | 20319 | 11086 | 40266 |
∑xy = 20319
∑x = 244
∑y = 488
∑x² = 11086
∑y² = 40266
n = 6
Put all the values in the Pearson’s correlation coefficient formula:
R = [n(∑xy) – (∑x)(∑y)] / √{[n∑x² – (∑x)²][n∑y² – (∑y)²]}
R = [6(20319) – (244)(488)] / √{[6(11086) – (244)²][6(40266) – (488)²]}
R = 2842 / √{[6980][3452]}
R = 2842 / 4908.662
R = 0.5790
It shows that the relationship between the variables of the data is a strong positive relationship.
Problem 3: Calculate the correlation coefficient for the following data:
X = 21,31,25,40,47,38
and
Y = 70,55,60,78,66,80
Solution:
Given variables are,
X = 21,31,25,40,47,38
and
Y = 70,55,60,78,66,80
To find the correlation coefficient, first construct a table with the columns XY, X², and Y², then add up the values in each column to get the sums required by the formula.
X | Y | XY | X² | Y² |
21 | 70 | 1470 | 441 | 4900 |
31 | 55 | 1705 | 961 | 3025 |
25 | 60 | 1500 | 625 | 3600 |
40 | 78 | 3120 | 1600 | 6084 |
47 | 66 | 3102 | 2209 | 4356 |
38 | 80 | 3040 | 1444 | 6400 |
∑202 | ∑409 | ∑13937 | ∑7280 | ∑28365 |
∑xy = 13937
∑x = 202
∑y = 409
∑x² = 7280
∑y² = 28365
n = 6
Put all the values in the Pearson’s correlation coefficient formula:
R = [n(∑xy) – (∑x)(∑y)] / √{[n∑x² – (∑x)²][n∑y² – (∑y)²]}
R = [6(13937) – (202)(409)] / √{[6(7280) – (202)²][6(28365) – (409)²]}
R = 1004 / √{[2876][2909]}
R = 1004 / 2892.453
R = 0.3471
It shows that the relationship between the variables of the data is a moderate positive relationship.
Problem 4: Calculate the correlation coefficient for the following data:
X= 12, 10, 42, 27,35,56
and
Y = 13, 15, 56, 34,65,26
Solution:
Given variables are,
X= 12, 10, 42, 27,35,56
and
Y = 13, 15, 56, 34,65,26
To find the correlation coefficient, first construct a table with the columns XY, X², and Y², then add up the values in each column to get the sums required by the formula.
X | Y | XY | X² | Y² |
12 | 13 | 156 | 144 | 169 |
10 | 15 | 150 | 100 | 225 |
42 | 56 | 2352 | 1764 | 3136 |
27 | 34 | 918 | 729 | 1156 |
35 | 65 | 2275 | 1225 | 4225 |
56 | 26 | 1456 | 3136 | 676 |
∑182 | ∑209 | ∑7307 | ∑7098 | ∑9587 |
∑xy= 7307
∑x=182
∑y=209
∑x² =7098
∑y² =9587
n =6
Put all the values in the Pearson’s correlation coefficient formula:
R = [n(∑xy) – (∑x)(∑y)] / √{[n∑x² – (∑x)²][n∑y² – (∑y)²]}
R = [6(7307) – (182)(209)] / √{[6(7098) – (182)²][6(9587) – (209)²]}
R = 5804 / √{[9464][13841]}
R = 5804 / 11445.139
R = 0.5071
It shows that the relationship between the variables of the data is a strong positive relationship.
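The table-and-sums procedure used in these solutions can be sketched in Python as follows. The snippet uses the Problem 4 data; the variable names are illustrative, and this is only one of several ways to organise the arithmetic.

```python
import math

x = [12, 10, 42, 27, 35, 56]
y = [13, 15, 56, 34, 65, 26]

# One row per data pair: (X, Y, XY, X², Y²)
rows = [(a, b, a * b, a * a, b * b) for a, b in zip(x, y)]

# Column totals give ∑x, ∑y, ∑xy, ∑x², ∑y²
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = (sum(col) for col in zip(*rows))
n = len(rows)

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

for row in rows:
    print(" | ".join(str(v) for v in row))
print("sums:", sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print("r =", round(r, 4))
```

Computing the column totals from the rows, rather than typing them in, removes one common source of slips in hand calculations.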
Problem 5: For each of the following correlation coefficients, state whether the relationship between the variables is positive or negative.
0.69
0.42
-0.23
-0.99
Solution:
The given correlation coefficients are as follows:
0.69
0.42
-0.23
-0.99
For each one, state whether the relationship is negative or positive:
0.69
The relationship between the variables is a strong positive relationship.
0.42
The relationship between the variables is a strong positive relationship.
-0.23
The relationship between the variables is a weak negative relationship.
-0.99
The relationship between the variables is a very strong negative relationship.
Problem 6: Calculate the correlation coefficient for the following data:
X = 10, 13, 15 ,17 ,19
and
Y = 5,10,15,20,25.
Solution:
Given variables are,
X = 10, 13, 15 ,17 ,19
and
Y = 5,10,15,20,25.
To find the correlation coefficient, first construct a table with the columns XY, X², and Y², then add up the values in each column to get the sums required by the formula.
X | Y | XY | X² | Y² |
10 | 5 | 50 | 100 | 25 |
13 | 10 | 130 | 169 | 100 |
15 | 15 | 225 | 225 | 225 |
17 | 20 | 340 | 289 | 400 |
19 | 25 | 475 | 361 | 625 |
∑74 | ∑75 | ∑1220 | ∑1144 | ∑1375 |
∑xy = 1220
∑x = 74
∑y = 75
∑x² = 1144
∑y² = 1375
n = 5
Put all the values in the Pearson’s correlation coefficient formula:
R = [n(∑xy) – (∑x)(∑y)] / √{[n∑x² – (∑x)²][n∑y² – (∑y)²]}
R = [5(1220) – (74)(75)] / √{[5(1144) – (74)²][5(1375) – (75)²]}
R = 550 / √{[244][1250]}
R = 550 / 552.268
R = 0.9959
It shows that the relationship between the variables of the data is a very strong positive relationship.
Problem 7: Find the value of the correlation coefficient from the following table:
SUBJECT | AGE X | Weight Y |
1 | 40 | 99 |
2 | 25 | 79 |
3 | 22 | 69 |
4 | 54 | 89 |
Solution:
SUBJECT | AGE X | Weight Y | XY | X² | Y² |
1 | 40 | 99 | 3960 | 1600 | 9801 |
2 | 25 | 79 | 1975 | 625 | 6241 |
3 | 22 | 69 | 1518 | 484 | 4761 |
4 | 54 | 89 | 4806 | 2916 | 7921 |
∑ | 141 | 336 | 12259 | 5625 | 28724 |
∑xy = 12259
∑x = 141
∑y = 336
∑x² = 5625
∑y² = 28724
n = 4
Put all the values in the Pearson’s correlation coefficient formula:
R = [n(∑xy) – (∑x)(∑y)] / √{[n∑x² – (∑x)²][n∑y² – (∑y)²]}
R = [4(12259) – (141)(336)] / √{[4(5625) – (141)²][4(28724) – (336)²]}
R = 1660 / √{[2619][2000]}
R = 1660 / 2288.668
R = 0.7253
It shows that the relationship between the variables of the data is a very strong positive relationship.
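Hand calculations of this kind are easy to get wrong, so it can help to sanity-check the column totals before trusting the result. The Python sketch below is one possible check, not a standard routine: the function name `r_from_sums` is hypothetical. It relies on two facts about the formula: for real data, neither bracketed term under the square root can be negative, and r must fall between -1 and +1.

```python
import math


def r_from_sums(n, sum_xy, sum_x, sum_y, sum_x2, sum_y2):
    """Compute r from the column totals, rejecting impossible inputs."""
    x_term = n * sum_x2 - sum_x ** 2
    y_term = n * sum_y2 - sum_y ** 2
    if x_term < 0 or y_term < 0:
        raise ValueError("a bracketed term is negative: re-check the column totals")
    if x_term == 0 or y_term == 0:
        raise ValueError("r is undefined when a variable has no spread")
    r = (n * sum_xy - sum_x * sum_y) / math.sqrt(x_term * y_term)
    if abs(r) > 1 + 1e-9:
        raise ValueError("|r| exceeds 1: re-check the column totals")
    return r


# Consistent totals (taken from the Problem 4 table) pass the check.
print(round(r_from_sums(6, 7307, 182, 209, 7098, 9587), 4))

# Illustrative inconsistent totals: 4(5625) - 151² is negative, so these are rejected.
try:
    r_from_sums(4, 12258, 151, 336, 5625, 28724)
except ValueError as err:
    print("rejected:", err)
```

A negative term under the square root or a result outside the range -1 to +1 almost always points to a mis-added column.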