
Pearson Product Moment Correlation

The Pearson product-moment correlation coefficient (or Pearson correlation coefficient) is a measure of the strength and direction of the linear association between two variables and is denoted by r. Conceptually, it relates to a line of best fit drawn through the data of the two variables: r indicates how closely the data points cluster around that line (i.e., how well a straight line describes the data).

Coefficient of Correlation: 

The correlation coefficient is calculated as the covariance of the two variables divided by the product of their standard deviations. The following formula is used to calculate the Pearson correlation (r):

[Tex]r= \frac{\sum (x_i - \bar{x}) (y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^{2}\sum(y_i - \bar{y})^{2}}}[/Tex]
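The formula can be checked numerically. The following is a minimal sketch, not a reference implementation: the helper name `pearson_r` is hypothetical, the sample data is the same as in the Implementation section, and NumPy's `np.corrcoef` is used only as a cross-check.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation computed term by term from the formula (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    # numerator: sum of products of deviations
    # denominator: square root of the product of the sums of squared deviations
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

x = np.arange(1, 11)
y = np.array([2, 1, 4, 5, 8, 12, 18, 25, 30, 27])

r_manual = pearson_r(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2x2 correlation matrix
print(r_manual, r_numpy)
```

Both values should agree up to floating-point rounding, since `np.corrcoef` evaluates the same quantity.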

The correlation coefficient always lies between -1 and 1. A value close to 1 indicates a strong positive linear association between the two variables, and a value close to -1 indicates a strong negative one; a value of exactly 1 or -1 means that the points lie exactly on a straight line. A value close to 0 indicates a weak linear correlation or none at all.

Strong positive correlation (r = 1)

Strong negative correlation (r = -1)

No correlation (r ≈ 0)
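The three cases can be reproduced with synthetic data. This is an illustrative sketch under assumed inputs: the straight lines and the parabola below are made-up examples, not data from this article. The parabola shows that r measures only linear association, so it can be 0 even when y depends entirely on x.

```python
import numpy as np

x = np.arange(1, 11, dtype=float)

# points exactly on an increasing line: perfect positive correlation
r_pos = np.corrcoef(x, 2 * x + 3)[0, 1]

# points exactly on a decreasing line: perfect negative correlation
r_neg = np.corrcoef(x, -2 * x + 3)[0, 1]

# symmetric parabola: y is fully determined by x, but there is no linear trend
x_sym = np.arange(-5, 6, dtype=float)
r_zero = np.corrcoef(x_sym, x_sym**2)[0, 1]

print(r_pos, r_neg, r_zero)
```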

Test of Significance: 

A test of significance for Pearson's correlation coefficient is used to decide whether the correlation r computed from a sample could plausibly have arisen by chance when the two variables are actually uncorrelated in the population. Under the null hypothesis of no correlation, the test statistic follows a t-distribution with N-2 degrees of freedom, where N is the number of paired observations. The test statistic is computed using the following formula:

[Tex]t_{calc} = r\frac{\sqrt{N-2}}{\sqrt{1- r^{2}}}[/Tex]
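As a rough sketch of how this statistic might be evaluated, the snippet below computes t_calc and a two-sided p-value from the t-distribution with N-2 degrees of freedom. The helper name `pearson_t_test` is hypothetical, and the inputs r and n are taken from the example in the Implementation section; `scipy.stats.t.sf` is the survival function (upper-tail probability) of the t-distribution.

```python
import numpy as np
from scipy import stats

def pearson_t_test(r, n):
    """Return (t_calc, two-sided p-value) for a sample correlation r from n pairs.

    Hypothetical helper; assumes the null hypothesis of zero population correlation.
    """
    t_calc = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
    # two-sided p-value: probability of a |t| at least this large under the null
    p_value = 2 * stats.t.sf(abs(t_calc), df=n - 2)
    return t_calc, p_value

r, n = 0.960576518918945, 10
t_calc, p_value = pearson_t_test(r, n)
print(t_calc, p_value)
```

The p-value obtained this way should match the one reported by `scipy.stats.pearsonr` for the same data, up to rounding.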

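While performing the test, we assume the following hypotheses:

H0: the population correlation is zero (the two variables are not linearly correlated).

H1: the population correlation is not zero (the two variables are linearly correlated).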

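Steps for performing the test:

1. Compute the sample correlation coefficient r from the N paired observations.

2. Compute t_calc from r and N using the formula above.

3. Look up the critical value of t for the chosen significance level with N-2 degrees of freedom, or compute the corresponding p-value.

4. Reject the null hypothesis of no correlation if the absolute value of t_calc exceeds the critical value (equivalently, if the p-value is below the significance level); otherwise, do not reject it.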

Implementation

# imports
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

# define the two variables
x = np.arange(1, 11)
y = np.array([2, 1, 4, 5, 8, 12, 18, 25, 30, 27])

# plot the variables: scatter points joined by a line
plt.scatter(x, y)
plt.plot(x, y)
plt.show()

# the plot suggests a strong positive correlation; confirm it numerically
correlation_coeff, p_value = stats.pearsonr(x, y)
# print the Pearson correlation coefficient
print(correlation_coeff)
# print the p-value: the smallest significance level at which H0 (no correlation) can be rejected
print(p_value)

 
 

Line plot 

# correlation coefficient
0.960576518918945
# p-value
1.0076332440506521e-05


 
