Karl Pearson’s Coefficient of Correlation | Methods and Examples

Last Updated : 11 Oct, 2023

What is Karl Pearson’s Coefficient of Correlation?

Karl Pearson was the first to give a mathematical formula, in 1890, for measuring the degree of relationship between two variables. Karl Pearson’s Coefficient of Correlation is also known as the Product Moment Correlation or the Simple Correlation Coefficient. It is the most popular and widely used method of measuring correlation. The coefficient is denoted by ‘r’ and is a pure number, which means that it has no unit.

According to Karl Pearson, “Coefficient of Correlation is calculated by dividing the sum of products of deviations from their respective means by their number of pairs and their standard deviations.”

\text{Karl Pearson's Coefficient of Correlation }(r)=\frac{\text{Sum of Products of Deviations from their respective Means}}{\text{Number of Pairs}\times\text{Standard Deviations of both Series}}

Or

r=\frac{\sum{xy}}{N\times{\sigma_x}\times{\sigma_y}}

Where,

N = Number of Pairs of Observations

x = Deviation of X series from Mean (X-\bar{X})

y = Deviation of Y series from Mean (Y-\bar{Y})

\sigma_x = Standard Deviation of X series (\sqrt{\frac{\sum{x^2}}{N}})

\sigma_y = Standard Deviation of Y series (\sqrt{\frac{\sum{y^2}}{N}})

r = Coefficient of Correlation
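The definition above can be sketched directly in Python. This is a minimal illustration, not a reference implementation: the function name `pearson_r_from_definition` and the two small series are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
import math


def pearson_r_from_definition(xs, ys):
    """Compute r literally as sum(xy) / (N * sigma_x * sigma_y)."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # x and y in the formula are deviations from the respective means.
    dev_x = [v - mean_x for v in xs]
    dev_y = [v - mean_y for v in ys]
    # Standard deviations with N (not N - 1) in the denominator, as in the text.
    sigma_x = math.sqrt(sum(d * d for d in dev_x) / n)
    sigma_y = math.sqrt(sum(d * d for d in dev_y) / n)
    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("r is undefined when a series has zero variance")
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    return sum_xy / (n * sigma_x * sigma_y)


# Hypothetical data, for illustration only.
demo_x = [1, 2, 3, 4, 5]
demo_y = [2, 4, 5, 4, 5]
r_demo = pearson_r_from_definition(demo_x, demo_y)
print(round(r_demo, 4))
```

For these values, ∑xy = 6, ∑x² = 10 and ∑y² = 6, so r works out to 6/√60, roughly 0.77.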

Methods of Calculating Karl Pearson’s Coefficient of Correlation

  1. Actual Mean Method
  2. Direct Method
  3. Short-Cut Method/Assumed Mean Method/Indirect Method
  4. Step-Deviation Method

1. Actual Mean Method

The steps involved in the calculation of coefficient of correlation by using Actual Mean Method are:

  1. The first step is to calculate the mean of the given two series (say X and Y).
  2. Now, take the deviation of X series from \bar{X} and denote the deviations by x.
  3. Square the deviations x and obtain the total; i.e., \sum{x^2}
  4. Take the deviation of Y series from \bar{Y} and denote the deviations by y.
  5. Square the deviations y and obtain the total; i.e., \sum{y^2}
  6. Multiply the respective deviations of Series X and Y and obtain the total; i.e., \sum{xy}.
  7. Now, use the following formula to determine the Coefficient of Correlation:

r=\frac{\sum{xy}}{\sqrt{\sum{x^2}\times{\sum{y^2}}}}
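This formula is the same as the definition r = ∑xy / (N × σx × σy) given earlier. Substituting the expressions for the two standard deviations makes the factors of N cancel:

r=\frac{\sum{xy}}{N\times\sqrt{\frac{\sum{x^2}}{N}}\times\sqrt{\frac{\sum{y^2}}{N}}}=\frac{\sum{xy}}{N\times\frac{\sqrt{\sum{x^2}\times\sum{y^2}}}{N}}=\frac{\sum{xy}}{\sqrt{\sum{x^2}\times\sum{y^2}}}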

Example:

Use Actual Mean Method and determine the coefficient of correlation for the following data:

Data Table

Solution:

Coefficient of Correlation

\bar{X}=\frac{\sum{X}}{N}=\frac{168}{7}=24

\bar{Y}=\frac{\sum{Y}}{N}=\frac{105}{7}=15

r=\frac{\sum{xy}}{\sqrt{\sum{x^2}\times{\sum{y^2}}}}

∑xy = 336, ∑x² = 448, ∑y² = 252

r=\frac{336}{\sqrt{448\times252}}=\frac{336}{\sqrt{112,896}}=\frac{336}{336}=1

Coefficient of Correlation = 1

It means that there is a perfect positive correlation between the values of Series X and Series Y.
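The working table behind this solution can be rebuilt with a short Python sketch. The data table itself is not reproduced in this text, so the two series below are an assumption: illustrative values chosen because they give exactly the totals quoted above (∑X = 168, ∑Y = 105, ∑xy = 336, ∑x² = 448, ∑y² = 252). They may not be the original figures.

```python
import math

# Illustrative series (an assumption, see the note above the code).
X = [12, 16, 20, 24, 28, 32, 36]
Y = [6, 9, 12, 15, 18, 21, 24]

n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

# Deviations of each series from its actual mean.
x = [v - mean_x for v in X]
y = [v - mean_y for v in Y]

# Print the working table: X, x, x^2, Y, y, y^2, xy.
print(f"{'X':>4} {'x':>6} {'x^2':>6} {'Y':>4} {'y':>6} {'y^2':>6} {'xy':>6}")
for X_i, x_i, Y_i, y_i in zip(X, x, Y, y):
    print(f"{X_i:>4} {x_i:>6.0f} {x_i ** 2:>6.0f} "
          f"{Y_i:>4} {y_i:>6.0f} {y_i ** 2:>6.0f} {x_i * y_i:>6.0f}")

sum_x2 = sum(d * d for d in x)
sum_y2 = sum(d * d for d in y)
sum_xy = sum(a * b for a, b in zip(x, y))

r = sum_xy / math.sqrt(sum_x2 * sum_y2)
print("sum_xy =", sum_xy, "sum_x2 =", sum_x2, "sum_y2 =", sum_y2, "r =", r)
```

In the illustrative data every Y value is a fixed linear function of the corresponding X value, which is why r comes out as exactly 1.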

2. Direct Method

The steps involved in the calculation of coefficient of correlation by using Direct Method are:

  1. The first step is to calculate the sum of Series X (∑X).
  2. Now, calculate the sum of Series Y (∑Y).
  3. Square the values of X Series and calculate their total; i.e., ∑X².
  4. Square the values of Y Series and calculate their total; i.e., ∑Y².
  5. Multiply the values of Series X and Y and calculate their total; i.e., ∑XY.
  6. Now, use the following formula to determine Coefficient of Correlation:

r=\frac{N\sum{XY}-\sum{X}\cdot\sum{Y}}{\sqrt{N\sum{X^2}-(\sum{X})^2}\times\sqrt{N\sum{Y^2}-(\sum{Y})^2}}

Example:

Use Direct Method and determine the coefficient of correlation for the following data:

Data Table

Solution:

Coefficient of Correlation

r=\frac{N\sum{XY}-\sum{X}\cdot\sum{Y}}{\sqrt{N\sum{X^2}-(\sum{X})^2}\times\sqrt{N\sum{Y^2}-(\sum{Y})^2}}

=\frac{(7\times2,856)-(168\times105)}{\sqrt{(7\times4,480)-(168)^2}\times{\sqrt{(7\times1,827)-(105)^2}}}

=\frac{19,992-17,640}{\sqrt{31,360-28,224}\times{\sqrt{12,789-11,025}}}

=\frac{2,352}{\sqrt{3,136}\times{\sqrt{1,764}}}=\frac{2,352}{56\times42}

=\frac{2,352}{2,352}=1

Coefficient of Correlation = 1

It means that there is a perfect positive correlation between the values of Series X and Series Y.
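Because the Direct Method needs only six totals, the arithmetic above can be checked without the raw data. The sketch below plugs the totals from this solution into the formula; the function name `r_from_totals` and its parameter names are hypothetical.

```python
import math


def r_from_totals(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Direct Method: r from N, sum X, sum Y, sum X^2, sum Y^2 and sum XY."""
    numerator = n * sum_xy - sum_x * sum_y
    x_term = n * sum_x2 - sum_x ** 2
    y_term = n * sum_y2 - sum_y ** 2
    if x_term <= 0 or y_term <= 0:
        raise ValueError("totals imply zero (or inconsistent) variance")
    return numerator / (math.sqrt(x_term) * math.sqrt(y_term))


# Totals taken from the worked solution above.
r_direct = r_from_totals(n=7, sum_x=168, sum_y=105,
                         sum_x2=4480, sum_y2=1827, sum_xy=2856)
print(r_direct)  # 2352 / (56 * 42) = 1.0
```

Working from totals keeps the function independent of how the sums were obtained, so the same check applies to any table for which these six figures are known.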

3. Short-Cut Method/Assumed Mean Method

The actual mean is sometimes a fraction, which makes the deviations, and hence the standard deviations, tedious to calculate. In such cases, the Short-Cut Method simplifies the work by taking deviations from a conveniently chosen assumed mean instead. The steps involved in the calculation of coefficient of correlation by using Assumed Mean Method are:

  1. First of all, take the deviations of X Series from the assumed mean and denote the values by dx. Calculate their total; i.e., ∑dx.
  2. Now, square the deviations of X series and calculate their total; i.e., ∑dx².
  3. Take the deviations of Y Series from the assumed mean and denote the values by dy. Calculate their total; i.e., ∑dy.
  4. Square the deviations of Y series and calculate their total; i.e., ∑dy².
  5. Multiply dx and dy and calculate their total; i.e., ∑dxdy.
  6. Now, use the following formula to determine Coefficient of Correlation:

r=\frac{N\sum{dxdy}-\sum{dx}\cdot\sum{dy}}{\sqrt{N\sum{dx^2}-(\sum{dx})^2}\times\sqrt{N\sum{dy^2}-(\sum{dy})^2}}

Where,

N = Number of pairs of observations

∑dx = Sum of deviations of X values from assumed mean

∑dy = Sum of deviations of Y values from assumed mean

∑dx² = Sum of squared deviations of X values from assumed mean

∑dy² = Sum of squared deviations of Y values from assumed mean

∑dxdy = Sum of the products of deviations dx and dy
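A short derivation shows why the choice of assumed mean does not change r. The symbols A and B, standing for the assumed means of X and Y, are notation introduced here for illustration; x and y are the deviations from the actual means, as in the Actual Mean Method.

dx=X-A=x+(\bar{X}-A),\quad dy=Y-B=y+(\bar{Y}-B)

Since \sum{x}=\sum{y}=0:

\sum{dx}=N(\bar{X}-A),\quad\sum{dy}=N(\bar{Y}-B),\quad\sum{dxdy}=\sum{xy}+N(\bar{X}-A)(\bar{Y}-B)

N\sum{dxdy}-\sum{dx}\cdot\sum{dy}=N\sum{xy}

In the same way, N\sum{dx^2}-(\sum{dx})^2=N\sum{x^2} and N\sum{dy^2}-(\sum{dy})^2=N\sum{y^2}, so:

r=\frac{N\sum{xy}}{\sqrt{N\sum{x^2}}\times\sqrt{N\sum{y^2}}}=\frac{\sum{xy}}{\sqrt{\sum{x^2}\times\sum{y^2}}}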

Example:

Use Assumed Mean Method and determine the coefficient of correlation for the following data:

Data Table

Solution:

Coefficient of Correlation

r=\frac{N\sum{dxdy}-\sum{dx}\cdot\sum{dy}}{\sqrt{N\sum{dx^2}-(\sum{dx})^2}\times\sqrt{N\sum{dy^2}-(\sum{dy})^2}}

=\frac{(7\times420)-(28\times21)}{\sqrt{(7\times560)-(28)^2}\times{\sqrt{(7\times315)-(21)^2}}}

=\frac{2,940-588}{\sqrt{3,920-784}\times{\sqrt{2,205-441}}}

=\frac{2,352}{\sqrt{3,136}\times{\sqrt{1,764}}}=\frac{2,352}{56\times42}

=\frac{2,352}{2,352}=1

Coefficient of Correlation = 1

It means that there is a perfect positive correlation between the values of Series X and Series Y.
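The Assumed Mean Method can be sketched in Python as follows. The data table is not reproduced in this text, so the series and the assumed means (20 for X, 12 for Y) are assumptions: illustrative values that happen to give the totals used in the solution above (∑dx = 28, ∑dy = 21, ∑dx² = 560, ∑dy² = 315, ∑dxdy = 420). The function name is hypothetical.

```python
import math


def pearson_r_assumed_mean(xs, ys, assumed_x, assumed_y):
    """Return (r, totals) using deviations from the given assumed means."""
    n = len(xs)
    dx = [v - assumed_x for v in xs]
    dy = [v - assumed_y for v in ys]
    totals = {
        "sum_dx": sum(dx),
        "sum_dy": sum(dy),
        "sum_dx2": sum(d * d for d in dx),
        "sum_dy2": sum(d * d for d in dy),
        "sum_dxdy": sum(a * b for a, b in zip(dx, dy)),
    }
    numerator = n * totals["sum_dxdy"] - totals["sum_dx"] * totals["sum_dy"]
    denominator = (math.sqrt(n * totals["sum_dx2"] - totals["sum_dx"] ** 2)
                   * math.sqrt(n * totals["sum_dy2"] - totals["sum_dy"] ** 2))
    return numerator / denominator, totals


# Illustrative series and assumed means (assumptions, see the note above).
X = [12, 16, 20, 24, 28, 32, 36]
Y = [6, 9, 12, 15, 18, 21, 24]
r_assumed, totals_assumed = pearson_r_assumed_mean(X, Y, assumed_x=20, assumed_y=12)
print(totals_assumed, r_assumed)

# A different pair of assumed means changes the totals but not r.
r_other, _ = pearson_r_assumed_mean(X, Y, assumed_x=28, assumed_y=18)
print(r_other)
```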

4. Step Deviation Method

This method simplifies the calculation of coefficient of correlation as the deviations are taken from assumed means and are divided by a common factor. The steps involved in the calculation of coefficient of correlation by using Step Deviation Method are:

  1. First of all, take the deviations of Series X from the assumed mean and divide them by Common Factor (C) to determine step deviation (dx^\prime). Calculate the total of step deviations; i.e., \sum{dx^\prime}
  2. Take the deviations of Series Y from the assumed mean and divide them by Common Factor (C) to determine step deviation (dy^\prime). Calculate the total of step deviations; i.e., \sum{dy^\prime}
  3. Square the step deviations of Series X and determine their total; i.e., \sum{dx^{\prime2}}
  4. Square the step deviations of Series Y and determine their total; i.e., \sum{dy^{\prime2}}
  5. Multiply (dx^\prime) and (dy^\prime), and determine their total; i.e., \sum{dx^\prime dy^\prime}
  6. Now, use the following formula to determine Coefficient of Correlation:

r=\frac{N\sum{dx^\prime dy^\prime}-\sum{dx^\prime}\cdot\sum{dy^\prime}}{\sqrt{N\sum{dx^{\prime2}}-(\sum{dx^\prime})^2}\times\sqrt{N\sum{dy^{\prime2}}-(\sum{dy^\prime})^2}}

Where,

N = Number of pairs of observations

\sum{dx^\prime} = Sum of step deviations of X values (deviations from assumed mean divided by the common factor)

\sum{dy^\prime} = Sum of step deviations of Y values (deviations from assumed mean divided by the common factor)

\sum{dx^{\prime2}} = Sum of squared step deviations of X values

\sum{dy^{\prime2}} = Sum of squared step deviations of Y values

\sum{dx^\prime dy^\prime} = Sum of the products of step deviations (dx^\prime) and (dy^\prime)

Example:

Use Step Deviation Method and determine the coefficient of correlation for the following data:

Data Table

Solution:

Coefficient of Correlation under Step Deviation Method

r=\frac{N\sum{dx^\prime dy^\prime}-\sum{dx^\prime}\cdot\sum{dy^\prime}}{\sqrt{N\sum{dx^{\prime2}}-(\sum{dx^\prime})^2}\times\sqrt{N\sum{dy^{\prime2}}-(\sum{dy^\prime})^2}}

=\frac{(7\times35)-(7\times7)}{\sqrt{(7\times35)-(7)^2}\times{\sqrt{(7\times35)-(7)^2}}}

=\frac{245-49}{\sqrt{245-49}\times{\sqrt{245-49}}}

=\frac{196}{\sqrt{196}\times{\sqrt{196}}}=\frac{196}{14\times14}

=\frac{196}{196}=1

Coefficient of Correlation = 1

It means that there is a perfect positive correlation between the values of Series X and Series Y.
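A possible Python sketch of the Step Deviation Method is shown below. The series, the assumed means (20 and 12) and the common factors (4 for X, 3 for Y) are all assumptions, since the data table is not reproduced in this text; they were chosen because they give the totals used in the solution above (each of ∑dx′, ∑dy′ equal to 7 and each of ∑dx′², ∑dy′², ∑dx′dy′ equal to 35). The function name is hypothetical.

```python
import math


def pearson_r_step_deviation(xs, ys, assumed_x, assumed_y, factor_x, factor_y):
    """Return (r, totals) using step deviations (deviation / common factor)."""
    n = len(xs)
    # Step deviations: deviation from the assumed mean divided by the factor.
    dxs = [(v - assumed_x) / factor_x for v in xs]
    dys = [(v - assumed_y) / factor_y for v in ys]
    sum_dx, sum_dy = sum(dxs), sum(dys)
    sum_dx2 = sum(d * d for d in dxs)
    sum_dy2 = sum(d * d for d in dys)
    sum_dxdy = sum(a * b for a, b in zip(dxs, dys))
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = (math.sqrt(n * sum_dx2 - sum_dx ** 2)
                   * math.sqrt(n * sum_dy2 - sum_dy ** 2))
    return numerator / denominator, (sum_dx, sum_dy, sum_dx2, sum_dy2, sum_dxdy)


# Illustrative inputs (assumptions, see the note above).
X = [12, 16, 20, 24, 28, 32, 36]
Y = [6, 9, 12, 15, 18, 21, 24]
r_step, totals_step = pearson_r_step_deviation(
    X, Y, assumed_x=20, assumed_y=12, factor_x=4, factor_y=3)
print(totals_step, r_step)
```

The sketch allows a separate common factor for each series. Dividing a series by a positive factor leaves r unchanged, so the two factors do not need to be equal.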

Change of Scale and Origin

The Coefficient of Correlation is independent of change of origin and change of scale.

  • Change of Origin: If a constant is added to or subtracted from every value of a series, the value of the correlation coefficient does not change.
  • Change of Scale: Similarly, if every value of a series is multiplied or divided by a positive constant, the value of the correlation coefficient does not change. (Scaling one series by a negative constant reverses the sign of r but leaves its magnitude unchanged.)
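These two properties can be checked numerically. The sketch below uses small hypothetical series and a helper function written for this illustration; the constants added and multiplied are arbitrary.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    x = [v - mean_x for v in xs]
    y = [v - mean_y for v in ys]
    sum_xy = sum(a * b for a, b in zip(x, y))
    return sum_xy / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_original = pearson_r(xs, ys)
# Change of origin: add or subtract a constant.
r_shifted = pearson_r([v + 100 for v in xs], [v - 7 for v in ys])
# Change of scale: multiply or divide by a positive constant.
r_scaled = pearson_r([v * 10 for v in xs], [v / 100 for v in ys])
# A negative multiplier on one series flips the sign of r.
r_flipped = pearson_r([v * -10 for v in xs], ys)

print(r_original, r_shifted, r_scaled, r_flipped)
```

The first three values agree up to floating-point rounding, and the last is their negative.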

Example:

Find the coefficient of correlation from the following figures:

Data Table

Solution:

As the coefficient of correlation is not affected by the change in scale and origin of the variables, we will multiply the X Series by 10 and divide the Y series by 100.

Coefficient of Correlation

r=\frac{N\sum{dxdy}-\sum{dx}\cdot\sum{dy}}{\sqrt{N\sum{dx^2}-(\sum{dx})^2}\times\sqrt{N\sum{dy^2}-(\sum{dy})^2}}

=\frac{(8\times156)-[(-24)\times(-4)]}{\sqrt{(8\times1,584)-(-24)^2}\times{\sqrt{(8\times44)-(-4)^2}}}

=\frac{1,248-96}{\sqrt{12,672-576}\times{\sqrt{352-16}}}

=\frac{1,152}{\sqrt{12,096}\times{\sqrt{336}}}\approx\frac{1,152}{110\times18.3}

=\frac{1,152}{2,013}\approx0.57

Coefficient of Correlation = 0.57

It means that there is a moderate degree of positive correlation between variables X and Y.
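The rounded square roots used above can be avoided by combining the two radicals, which gives an exact value for this example:

\sqrt{12,096}\times\sqrt{336}=\sqrt{12,096\times336}=\sqrt{4,064,256}=2,016

r=\frac{1,152}{2,016}=\frac{4}{7}\approx0.571

This agrees with the value of 0.57 obtained above to two decimal places.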


