Pearson Correlation Coefficient

Correlation coefficients measure how strong the relationship between two variables is. Several formulas exist for computing a correlation coefficient; one of the most popular is Pearson’s correlation (also known as Pearson’s R), which is commonly used alongside linear regression. Pearson’s correlation coefficient is denoted by the symbol “R”. The formula returns a value between -1 and 1. Here,

  • -1 indicates a perfect negative linear relationship
  • 1 indicates a perfect positive linear relationship
  • 0 indicates no linear relationship at all

Pearson’s Correlation Coefficient Formula

Pearson’s formula is the most commonly used way to compute a correlation coefficient. For n pairs of values (x, y), it is:

R = [n(∑xy) - (∑x)(∑y)] / √([n∑x² - (∑x)²][n∑y² - (∑y)²])

The full name of the formula is Pearson’s Product-Moment Correlation (PPMC). It describes the linear relationship between two sets of data.
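The raw-sum formula translates almost term for term into code. The following is a minimal sketch in plain Python; the function name `pearson_r` and the error handling are illustrative choices, not part of the original article.

```python
import math


def pearson_r(xs, ys):
    """Pearson's R computed with the raw-sum formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        # Happens when one variable is constant: R is undefined.
        raise ValueError("R is undefined when a variable has no spread")
    return numerator / denominator


# y is exactly 2 * x, so the relationship is perfectly linear and increasing.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

Note that the whole numerator is divided by the whole square root; the grouping matters when the formula is typed on one line.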

Pearson’s correlation measures both the strength of a linear relationship between two variables (given by the coefficient R, a value between -1 and +1) and the evidence that the relationship exists (given by the p-value). If the result is statistically significant, we conclude that the correlation exists.

Cohen (1988) says that an absolute value of r of 0.5 is classified as large, an absolute value of 0.3 is classified as medium and an absolute value of 0.1 is classified as small.
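Cohen’s thresholds can be written as a small lookup. The sketch below assumes each threshold is a lower bound (so 0.42 counts as medium), and the label "negligible" for values under 0.1 is an assumption added here, not something the article attributes to Cohen.

```python
def cohen_effect_size(r):
    """Classify the magnitude of a correlation using Cohen's (1988) thresholds."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"  # assumed label for |r| < 0.1


# Sample coefficients: the sign gives the direction, the magnitude the size.
for r in (0.69, 0.42, -0.23, -0.99):
    direction = "positive" if r > 0 else "negative"
    print(r, direction, cohen_effect_size(r))
```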

Pearson’s correlation coefficient is interpreted as follows:

  • A correlation coefficient of 1 means that for every increase in one variable, the other increases by a fixed proportion. For example, shoe size goes up in perfect correlation with foot length.
  • A correlation coefficient of 0 indicates that there is no linear relationship between the variables.
  • A correlation coefficient of -1 means that for every increase in one variable, the other decreases by a fixed proportion. For example, the amount of water in a tank decreases in perfect correlation with the flow from the water tap.

Steps to find the correlation coefficient with Pearson’s correlation coefficient formula:

Step 1: Make a table with the given data (subject, x, and y) and add three more columns: xy, x², and y².

Step 2: Multiply the values in the x and y columns to fill the xy column. For example, if x is 24 and y is 65, then xy is 24×65 = 1560.

Step 3: Square the numbers in the x column to fill the x² column.

Step 4: Square the numbers in the y column to fill the y² column.

Step 5: Add up the values in each column and write the totals at the bottom. The Greek letter sigma (Σ) is shorthand for summation.

Step 6: Substitute the totals into the formula for Pearson’s correlation coefficient:

R = [n(∑xy) - (∑x)(∑y)] / √([n∑x² - (∑x)²][n∑y² - (∑y)²])

The sign of R then tells us whether the relationship between the variables is positive or negative.
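The six steps can be sketched as a short script that builds the table, totals each column, and then applies the formula. Only the first pair (24, 65) comes from Step 2; the other data points and the helper name `build_table` are made up for illustration.

```python
import math


def build_table(xs, ys):
    """Steps 1-5: one row of (x, y, xy, x^2, y^2) per pair, plus column totals."""
    rows = [(x, y, x * y, x * x, y * y) for x, y in zip(xs, ys)]
    totals = tuple(sum(column) for column in zip(*rows))
    return rows, totals


xs = [24, 30, 18]  # 24 is from Step 2; 30 and 18 are hypothetical
ys = [65, 70, 50]  # 65 is from Step 2; 70 and 50 are hypothetical

rows, totals = build_table(xs, ys)
print(f"{'x':>6}{'y':>6}{'xy':>8}{'x^2':>8}{'y^2':>8}")
for row in rows:
    print(f"{row[0]:>6}{row[1]:>6}{row[2]:>8}{row[3]:>8}{row[4]:>8}")
print(f"{totals[0]:>6}{totals[1]:>6}{totals[2]:>8}{totals[3]:>8}{totals[4]:>8}")

# Step 6: substitute the totals into the formula.
n = len(rows)
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = totals
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print("R =", round(r, 4))
```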

Sample Problems

Problem 1: State whether each of the following correlation coefficients indicates a positive or a negative relationship between the variables.

0.69, 0.42, -0.23, -0.99

Solution:

The given correlation coefficients are as follows:

0.69, 0.42, -0.23, -0.99

The sign gives the direction of the relationship and the absolute value gives its strength:

0.69: The relationship between the variables is a strong positive relationship

0.42: The relationship between the variables is a moderate positive relationship

-0.23: The relationship between the variables is a weak negative relationship

-0.99: The relationship between the variables is a very strong negative relationship

Problem 2: Calculate the correlation coefficient for the following data with the help of Pearson’s correlation coefficient formula:

X = 10, 13, 15, 17, 19

and

Y = 5, 10, 15, 20, 25.

Solution:

Given variables are,

X = 10, 13, 15, 17, 19

and

Y = 5, 10, 15, 20, 25.

To find the correlation coefficient, first construct a table with the columns XY, X², and Y², then total each column to get the values required by the formula.

X       Y       XY        X²        Y²
10      5       50        100       25
13      10      130       169       100
15      15      225       225       225
17      20      340       289       400
19      25      475       361       625
∑ 74    ∑ 75    ∑ 1220    ∑ 1144    ∑ 1375

∑xy = 1220

∑x = 74

∑y = 75

∑x² = 1144

∑y² = 1375

n = 5

Put all the values in Pearson’s correlation coefficient formula:

R = [n(∑xy) - (∑x)(∑y)] / √([n∑x² - (∑x)²][n∑y² - (∑y)²])

R = [5(1220) - (74)(75)] / √([5(1144) - (74)²][5(1375) - (75)²])

R = 550 / √([244][1250])

R = 550 / 552.268

R = 0.9959

The correlation coefficient is approximately 0.9959
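A hand calculation like this one can be double-checked with the mean-centered form of the same coefficient, R = ∑(x - x̄)(y - ȳ) / √(∑(x - x̄)² · ∑(y - ȳ)²), which is algebraically equivalent to the raw-sum formula. This form is not used in the article; the sketch below introduces it only as an independent check, and the function name is illustrative.

```python
import math


def pearson_r_centered(xs, ys):
    """Pearson's R from deviations about the means (equivalent to the raw-sum form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    co_deviation = sum(a * b for a, b in zip(dx, dy))
    return co_deviation / math.sqrt(
        sum(a * a for a in dx) * sum(b * b for b in dy)
    )


# Data from Problem 2.
x = [10, 13, 15, 17, 19]
y = [5, 10, 15, 20, 25]
r = pearson_r_centered(x, y)
print(round(r, 4))  # 0.9959
```

Since Y rises steadily as X rises, a value close to +1 is what the data should produce.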

Problem 3: Calculate the correlation coefficient for the following table with the help of Pearson’s correlation coefficient formula:

SUBJECT    Age X    Weight Y
1          40       99
2          25       79
3          22       69
4          54       89

Solution:

Make a table from the given data and add three more columns: XY, X², and Y². Then total each column to get ∑x, ∑y, ∑xy, ∑x², and ∑y², with n = 4.

SUBJECT    Age X    Weight Y    XY       X²      Y²
1          40       99          3960     1600    9801
2          25       79          1975     625     6241
3          22       69          1518     484     4761
4          54       89          4806     2916    7921
∑          141      336         12259    5625    28724

∑xy = 12259

∑x = 141

∑y = 336

∑x² = 5625

∑y² = 28724

n = 4

Put all the values in Pearson’s correlation coefficient formula:

R = [n(∑xy) - (∑x)(∑y)] / √([n∑x² - (∑x)²][n∑y² - (∑y)²])

R = [4(12259) - (141)(336)] / √([4(5625) - (141)²][4(28724) - (336)²])

R = 1660 / √([2619][2000])

R = 1660 / 2288.668

R = 0.7253

The correlation coefficient is approximately 0.7253
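Two facts make a useful sanity check on any hand calculation: R always lies between -1 and +1, and each bracketed term under the square root, n∑x² - (∑x)² and n∑y² - (∑y)², can never be negative. If either rule is broken, a column total has been mis-added. The sketch below applies both checks to column totals; the function name is illustrative, and the second call uses a hypothetical mis-added ∑x of 151 (the Age column of Problem 3 actually sums to 141) only to show the check firing.

```python
import math


def pearson_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Compute R from column totals, rejecting totals that cannot be right."""
    spread_x = n * sum_x2 - sum_x ** 2
    spread_y = n * sum_y2 - sum_y ** 2
    if spread_x < 0 or spread_y < 0:
        raise ValueError("negative term under the square root; re-check the totals")
    if spread_x == 0 or spread_y == 0:
        raise ValueError("R is undefined when a variable has no spread")
    r = (n * sum_xy - sum_x * sum_y) / math.sqrt(spread_x * spread_y)
    if abs(r) > 1 + 1e-12:  # small tolerance for floating-point rounding
        raise ValueError("R outside [-1, 1]; re-check the totals")
    return r


# Column totals for the Problem 3 data.
r = pearson_from_sums(4, 141, 336, 12259, 5625, 28724)
print(round(r, 4))

# Hypothetical slip: sum of x mis-added as 151 instead of 141.
try:
    pearson_from_sums(4, 151, 336, 12259, 5625, 28724)
except ValueError as err:
    print("rejected:", err)
```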

Problem 4: Calculate the correlation coefficient for the following data with the help of Pearson’s correlation coefficient formula:

X = 5, 9, 14, 16

and

Y = 6, 10, 16, 20.

Solution:

Given variables are,

X = 5, 9, 14, 16

and

Y = 6, 10, 16, 20.

To find the correlation coefficient, first construct a table with the columns XY, X², and Y², then total each column to get the values required by the formula.

X       Y       XY       X²       Y²
5       6       30       25       36
9       10      90       81       100
14      16      224      196      256
16      20      320      256      400
∑ 44    ∑ 52    ∑ 664    ∑ 558    ∑ 792

∑xy = 664

∑x = 44

∑y = 52

∑x² = 558

∑y² = 792

n = 4

Put all the values in Pearson’s correlation coefficient formula:

R = [n(∑xy) - (∑x)(∑y)] / √([n∑x² - (∑x)²][n∑y² - (∑y)²])

R = [4(664) - (44)(52)] / √([4(558) - (44)²][4(792) - (52)²])

R = 368 / √([296][464])

R = 368 / 370.599

R = 0.993

The correlation coefficient is approximately 0.993

Problem 5: Calculate the correlation coefficient for the following data with the help of Pearson’s correlation coefficient formula:

X = 21, 31, 25, 40, 47, 38

and

Y = 70, 55, 60, 78, 66, 80

Solution:

Given variables are,

X = 21, 31, 25, 40, 47, 38

and

Y = 70, 55, 60, 78, 66, 80

To find the correlation coefficient, first construct a table with the columns XY, X², and Y², then total each column to get the values required by the formula.

X        Y        XY         X²        Y²
21       70       1470       441       4900
31       55       1705       961       3025
25       60       1500       625       3600
40       78       3120       1600      6084
47       66       3102       2209      4356
38       80       3040       1444      6400
∑ 202    ∑ 409    ∑ 13937    ∑ 7280    ∑ 28365

∑xy = 13937

∑x = 202

∑y = 409

∑x² = 7280

∑y² = 28365

n = 6

Put all the values in Pearson’s correlation coefficient formula:

R = [n(∑xy) - (∑x)(∑y)] / √([n∑x² - (∑x)²][n∑y² - (∑y)²])

R = [6(13937) - (202)(409)] / √([6(7280) - (202)²][6(28365) - (409)²])

R = 1004 / √([2876][2909])

R = 1004 / 2892.453

R = 0.3471

The correlation coefficient is approximately 0.3471

Problem 6: Calculate the correlation coefficient for the following data with the help of Pearson’s correlation coefficient formula:

SUBJECT    Height X    Weight Y
1          43          78
2          24          68
3          26          85
4          35          67

Solution:

Make a table from the given data and add three more columns: XY, X², and Y². Then total each column to get ∑x, ∑y, ∑xy, ∑x², and ∑y², with n = 4.

SUBJECT    Height X    Weight Y    XY      X²      Y²
1          43          78          3354    1849    6084
2          24          68          1632    576     4624
3          26          85          2210    676     7225
4          35          67          2345    1225    4489
∑          128         298         9541    4326    22422

∑xy = 9541

∑x = 128

∑y = 298

∑x² = 4326

∑y² = 22422

n = 4

Put all the values in Pearson’s correlation coefficient formula:

R = [n(∑xy) - (∑x)(∑y)] / √([n∑x² - (∑x)²][n∑y² - (∑y)²])

R = [4(9541) - (128)(298)] / √([4(4326) - (128)²][4(22422) - (298)²])

R = 20 / √([920][884])

R = 20 / 901.820

R = 0.02218

The correlation coefficient is approximately 0.02218


Last Updated : 16 Feb, 2022