
Linear Correlation Coefficient Formula


Correlation coefficients measure how strong the relationship between two variables is. There are several formulas for computing a correlation coefficient; one of the most popular is Pearson’s correlation (also known as Pearson’s R), which is commonly used alongside linear regression. Pearson’s correlation coefficient is denoted by the symbol “R” (often written as a lowercase r). The correlation coefficient formula returns a value between -1 and 1. Here,

  • 1 indicates a perfect positive linear relationship
  • -1 indicates a perfect negative linear relationship
  • And a result of zero indicates no linear relationship at all

Linear Correlation Coefficient Formula

The linear correlation coefficient, known as Pearson’s r or Pearson’s correlation coefficient, reflects the direction and strength of the linear relationship between two variables x and y. It returns a value between -1 and +1, where -1 indicates a perfect negative correlation and +1 indicates a perfect positive correlation. If it equals 0, there is no linear correlation. This is also known as zero correlation.

The “crude estimates” for interpreting strengths of correlations using Pearson’s Correlation:

r value          Crude estimate
+.70 or higher   Very strong positive relationship
+.40 to +.69     Strong positive relationship
+.30 to +.39     Moderate positive relationship
+.20 to +.29     Weak positive relationship
+.01 to +.19     No or negligible relationship
0                No relationship [zero correlation]
-.01 to -.19     No or negligible relationship
-.20 to -.29     Weak negative relationship
-.30 to -.39     Moderate negative relationship
-.40 to -.69     Strong negative relationship
-.70 or lower    Very strong negative relationship
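The bands in the table above can be turned into a small lookup. This is a minimal sketch, not a standard classification: the function name `interpret_r` is hypothetical, and it assumes that a value falling between two printed bands (for example 0.695) belongs to the lower band.

```python
def interpret_r(r):
    """Map a correlation coefficient to the crude-estimate label from the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no relationship"
    size = abs(r)
    if size < 0.20:
        # Covers the +/-.01 to +/-.19 bands.
        return "no or negligible relationship"
    direction = "positive" if r > 0 else "negative"
    if size >= 0.70:
        strength = "very strong"
    elif size >= 0.40:
        strength = "strong"
    elif size >= 0.30:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} relationship"


for value in (0.85, 0.35, -0.25, 0.05):
    print(value, "->", interpret_r(value))
```

The cut-offs are rules of thumb; different fields draw these lines in different places.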

The formula used to get the linear correlation coefficient of the data is:

R = [n(∑xy) - (∑x)(∑y)] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}
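As an illustration, the formula can be written directly in Python. This is a minimal sketch: the function name `pearson_r` and the sample lists are assumptions made for the example, and the guard for a zero denominator is an added safety check rather than part of the formula itself.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient from the sums used in the formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # Happens when every x (or every y) is identical.
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Hypothetical data: ys is exactly twice xs, so the points lie on a rising line.
print(pearson_r([1, 2, 3], [2, 4, 6]))
```

Note that the whole numerator is divided by the whole square root; dropping the grouping brackets changes the result.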

Types of Linear Correlation

Linear correlation is measured by Pearson’s r, whose value can range between -1 and +1.

There are three types of linear correlation, as follows:

Positive values indicate a Positive Correlation (0 < r ≤ 1)

Negative values indicate a Negative Correlation (-1 ≤ r < 0)

A Value of 0 indicates No Correlation (r = 0)

Positive correlation: In a positive correlation, both variables move in the same direction. If one increases the other also increases, and if one decreases the other also decreases. Whenever r is positive, it shows a positive relationship.

Negative correlation: In a negative correlation, the variables move in opposite directions. If one increases the other decreases, and if one decreases the other increases. Whenever r is negative, it shows a negative relationship.

No correlation: When there is no linear association between the variables, they are said to have no correlation. In this case, their correlation coefficient r is 0.
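The three cases can be seen on tiny made-up data sets. The sketch below uses the deviation-from-mean form of Pearson's r, which is algebraically equivalent to the sums formula; the function name `pearson_r_dev` and all three lists are invented for illustration.

```python
from math import sqrt


def pearson_r_dev(xs, ys):
    """Pearson's r computed from deviations around the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


x = [1, 2, 3, 4]
rising = [2, 4, 6, 8]    # increases with x
falling = [8, 6, 4, 2]   # decreases as x increases
arch = [1, 3, 3, 1]      # rises then falls symmetrically

print(pearson_r_dev(x, rising))   # positive correlation
print(pearson_r_dev(x, falling))  # negative correlation
print(pearson_r_dev(x, arch))     # zero correlation
```

The last data set has an obvious pattern yet gives r = 0, which is why a zero coefficient only rules out a linear relationship.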

Sample Problems

Problem 1: Calculate the correlation coefficient for the following data:

X = 12, 16, 4, 8

and

Y = 15, 20, 5, 10

Solution:

Given variables are,

X = 12, 16, 4, 8

and

Y = 15, 20, 5, 10

To find the correlation coefficient of these variables, first construct a table as follows to get the values required in the formula, then add up each column to get the sums used in the formula.

X Y XY X² Y²
12 15 180 144 225
16 20 320 256 400
4 5 20 16 25
8 10 80 64 100
∑40 ∑50 ∑600 ∑480 ∑750

∑xy = 600

∑x = 40

∑y = 50

∑x² = 480

∑y² = 750

n = 4

Substitute these values into Pearson’s correlation coefficient formula:

R = [n(∑xy) - (∑x)(∑y)] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}

R = [4(600) - (40)(50)] / √{[4(480) - (40)²][4(750) - (50)²]}

R = 400 / √{[320][500]}

R = 400 / 400

R = 1

It shows that the relationship between the variables of the data is a very strong positive relationship.

Problem 2: Find the value of the correlation coefficient from the following table:

SUBJECT AGE X GLUCOSE LEVEL Y
1 42 98
2 23 68
3 22 73
4 47 79
5 50 88
6 60 82

Solution:

Make a table from the given data and add three more columns for XY, X², and Y². Then add up each column to get ∑xy, ∑x, ∑y, ∑x², and ∑y², with n = 6.

SUBJECT AGE X GLUCOSE LEVEL Y XY X² Y²
1 42 98 4116 1764 9604
2 23 68 1564 529 4624
3 22 73 1606 484 5329
4 47 79 3713 2209 6241
5 50 88 4400 2500 7744
6 60 82 4920 3600 6724
Total 244 488 20319 11086 40266

∑xy = 20319

∑x = 244

∑y = 488

∑x² = 11086

∑y² = 40266

n = 6

Substitute these values into Pearson’s correlation coefficient formula:

R = [n(∑xy) - (∑x)(∑y)] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}

R = [6(20319) - (244)(488)] / √{[6(11086) - (244)²][6(40266) - (488)²]}

R = 2842 / √{[6980][3452]}

R = 2842 / 4908.662

R = 0.5790

It shows that the relationship between the variables of the data is a strong positive relationship.
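The tabulation step used in this solution (one row per subject, extra columns for XY, X² and Y², then column totals) can be sketched in Python as a way to double-check the arithmetic. The helper names `build_table` and `format_row` are hypothetical; the data are the age and glucose values from the Problem 2 table.

```python
from math import sqrt

ages = [42, 23, 22, 47, 50, 60]
glucose = [98, 68, 73, 79, 88, 82]

WIDTHS = (6, 6, 8, 8, 8)  # column widths for X, Y, XY, X^2, Y^2


def build_table(xs, ys):
    """Return (rows, totals); each row is (x, y, x*y, x squared, y squared)."""
    rows = [(x, y, x * y, x * x, y * y) for x, y in zip(xs, ys)]
    totals = tuple(sum(column) for column in zip(*rows))
    return rows, totals


def format_row(values):
    """Right-align each value in its column."""
    return "".join(f"{str(v):>{w}}" for v, w in zip(values, WIDTHS))


rows, totals = build_table(ages, glucose)
print(format_row(("X", "Y", "XY", "X^2", "Y^2")))
for row in rows:
    print(format_row(row))
print(format_row(totals))

# Plug the column totals into the correlation formula.
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = totals
n = len(rows)
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(f"r = {r:.4f}")
```

Letting the program produce each product and total makes slips in individual table cells easy to spot.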

Problem 3: Calculate the correlation coefficient for the following data:

X = 21,31,25,40,47,38

and

Y = 70,55,60,78,66,80

Solution:

Given variables are,

X = 21,31,25,40,47,38

and

Y = 70,55,60,78,66,80

To find the correlation coefficient of these variables, first construct a table as follows to get the values required in the formula, then add up each column to get the sums used in the formula.

X Y XY X² Y²
21 70 1470 441 4900
31 55 1705 961 3025
25 60 1500 625 3600
40 78 3120 1600 6084
47 66 3102 2209 4356
38 80 3040 1444 6400
∑202 ∑409 ∑13937 ∑7280 ∑28365

∑xy = 13937

∑x = 202

∑y = 409

∑x² = 7280

∑y² = 28365

n = 6

Substitute these values into Pearson’s correlation coefficient formula:

R = [n(∑xy) - (∑x)(∑y)] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}

R = [6(13937) - (202)(409)] / √{[6(7280) - (202)²][6(28365) - (409)²]}

R = 1004 / √{[2876][2909]}

R = 1004 / 2892.453

R = 0.3471

It shows that the relationship between the variables of the data is a moderate positive relationship.

Problem 4: Calculate the correlation coefficient for the following data:

X= 12, 10, 42, 27,35,56

and

Y = 13, 15, 56, 34,65,26

Solution:

Given variables are,

X= 12, 10, 42, 27,35,56

and

Y = 13, 15, 56, 34,65,26

To find the correlation coefficient of these variables, first construct a table as follows to get the values required in the formula, then add up each column to get the sums used in the formula.

X Y XY X² Y²
12 13 156 144 169
10 15 150 100 225
42 56 2352 1764 3136
27 34 918 729 1156
35 65 2275 1225 4225
56 26 1456 3136 676
∑182 ∑209 ∑7307 ∑7098 ∑9587

∑xy= 7307

∑x=182

∑y=209

∑x² =7098

∑y² =9587

n =6

Substitute these values into Pearson’s correlation coefficient formula:

R = [n(∑xy) - (∑x)(∑y)] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}

R = [6(7307) - (182)(209)] / √{[6(7098) - (182)²][6(9587) - (209)²]}

R = 5804 / √{[9464][13841]}

R = 5804 / 11445.139

R = 0.5071

It shows that the relationship between the variables of the data is a strong positive relationship.

Problem 5: For each of the correlation coefficients given below, state whether the relationship between the variables is positive or negative.

0.64

0.46

-0.29

-0.95

Solution:

The given correlation coefficients are as follows:

0.64

0.46

-0.29

-0.95

State whether each relationship is negative or positive:

0.64

The relationship between the variables is a strong positive relationship

0.46

The relationship between the variables is a strong positive relationship

-0.29

The relationship between the variables is a weak negative relationship

-0.95

The relationship between the variables is a very strong negative relationship.

Problem 6: Calculate the correlation coefficient for the following data:

X = 10, 13, 15 ,17 ,19

and

Y = 5,10,15,20,25.

Solution:

Given variables are,

X = 10, 13, 15 ,17 ,19

and

Y = 5,10,15,20,25.

To find the correlation coefficient of these variables, first construct a table as follows to get the values required in the formula, then add up each column to get the sums used in the formula.

X Y XY  X²  Y²
10 5 50 100 25
13 10 130 169 100
15 15 225 225 225
17 20 340 289 400
19 25 475 361 625
∑74 ∑75 ∑1220 ∑1144 ∑1375

∑xy = 1220

∑x = 74

∑y = 75

∑x² = 1144

∑y² = 1375

n = 5

Substitute these values into Pearson’s correlation coefficient formula:

R = [n(∑xy) - (∑x)(∑y)] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}

R = [5(1220) - (74)(75)] / √{[5(1144) - (74)²][5(1375) - (75)²]}

R = 550 / √{[244][1250]}

R = 550 / 552.268

R = 0.9959

It shows that the relationship between the variables of the data is a very strong positive relationship.

Problem 7: Find the value of the correlation coefficient from the following table:

SUBJECT AGE X Weight Y
1 40 99
2 25 79
3 22 69
4 54 89

Solution:

SUBJECT AGE X Weight Y XY X² Y²
1 40 99 3960 1600 9801
2 25 79 1975 625 6241
3 22 69 1518 484 4761
4 54 89 4806 2916 7921
Total 141 336 12259 5625 28724

∑xy = 12259

∑x = 141

∑y = 336

∑x² = 5625

∑y² = 28724

n = 4

Substitute these values into Pearson’s correlation coefficient formula:

R = [n(∑xy) - (∑x)(∑y)] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}

R = [4(12259) - (141)(336)] / √{[4(5625) - (141)²][4(28724) - (336)²]}

R = 1660 / √{[2619][2000]}

R = 1660 / 2288.668

R = 0.7253

It shows that the relationship between the variables of the data is a very strong positive relationship.



Last Updated : 19 Mar, 2024