Pearson Correlation Coefficient

Last Updated : 08 Apr, 2024

Pearson Correlation Coefficient: Correlation coefficients measure how strong the relationship between two variables is. There are several types of correlation coefficient; one of the most popular is Pearson’s correlation (also known as Pearson’s r), which is commonly used alongside linear regression.

The Pearson correlation coefficient, often symbolized as r, is a widely used metric for assessing the linear relationship between two variables. It yields a value ranging from -1 to 1, indicating both the magnitude and the direction of the correlation. A positive value means the two variables tend to change in the same direction, while a negative value means they tend to change in opposite directions.

This article provides detailed information on the Pearson Correlation Coefficient, its meaning, formula, interpretation, examples, and FAQs.

What is the Pearson Correlation Coefficient?

The Pearson Correlation Coefficient, denoted as r, is a statistical measure that calculates the strength and direction of the linear relationship between two variables on a scatterplot. The value of r ranges between -1 and 1, where:

  • 1 indicates a perfect positive linear relationship,
  • -1 indicates a perfect negative linear relationship, and
  • 0 indicates no linear relationship between the variables.

Pearson’s Correlation Coefficient Formula

Karl Pearson’s correlation coefficient formula is the most commonly used and the most popular formula to get the statistical correlation coefficient. It is denoted with the lowercase “r”. The formula for Pearson’s correlation coefficient is shown below:

r = [n(∑xy) - (∑x)(∑y)] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}
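As a minimal sketch, the formula above can be written directly in Python. The function name `pearson_r` and the sample data are illustrative assumptions and do not come from any library.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from the raw-sum form of the formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

# Illustrative data: y is an exact increasing linear function of x
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # prints 1.0
```

The function raises an error for constant input because the denominator is then zero and r is undefined.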

The full name of Pearson’s correlation coefficient is the Pearson Product Moment Correlation (PPMC). It measures the linear relationship between two sets of data.

Pearson’s correlation measures both the strength of a linear relationship (given by the coefficient r, between -1 and +1) and the evidence that the relationship exists (given by the p-value). If the p-value is below the chosen significance level, we conclude that a linear correlation exists between the two variables.

Cohen (1988) says that an absolute value of r of 0.5 is classified as large, an absolute value of 0.3 is classified as medium and an absolute value of 0.1 is classified as small.

The interpretation of the Pearson’s correlation coefficient is as follows:

  • A correlation coefficient of 1 means that for every increase in one variable, the other increases by a fixed proportion. For example, shoe size goes up in near-perfect correlation with foot length.
  • A correlation coefficient of 0 indicates that there is no linear relationship between the variables.
  • A correlation coefficient of -1 means that for every increase in one variable, the other decreases by a fixed proportion. For example, the amount of water left in a tank decreases in perfect correlation with the amount that has flowed out through the tap.

The Pearson correlation coefficient essentially captures how closely the data points tend to follow a straight line when plotted together. It’s important to remember that correlation doesn’t imply causation – just because two variables are related, it doesn’t mean one causes the change in the other.

Pearson Correlation Coefficient Table

| Pearson Correlation Coefficient (r) Range | Type of Correlation | Description of Relationship | Illustrative Example |
|---|---|---|---|
| 0 < r ≤ 1 | Positive | An increase in one variable is associated with an increase in the other. | Study Time vs. Test Scores: More hours spent studying tends to go with higher test scores. |
| r = 0 | None | No discernible linear relationship between the changes in both variables. | Shoe Size vs. Reading Skill: A person’s shoe size doesn’t predict their ability to read. |
| -1 ≤ r < 0 | Negative | An increase in one variable is associated with a decrease in the other. | Outdoor Temperature vs. Home Heating Cost: As the outdoor temperature decreases, heating costs in the home increase. |

Pearson Correlation Coefficient Origin

The Pearson correlation coefficient, although named after statistician Karl Pearson, has a more interesting backstory. The concept of correlation itself can be traced back to Francis Galton, a 19th-century scientist and explorer. Galton was fascinated by inheritance and explored relationships between traits in families.

While Galton planted the seed for the idea, the mathematical formula behind the coefficient actually came from French physicist Auguste Bravais in 1844. However, it was Karl Pearson who truly championed the concept in the late 1800s. He refined the mathematical treatment, explored its properties, and popularized its use in statistical analysis. For this reason, the coefficient bears his name, even though earlier contributions played a crucial role in its development.

Types of Pearson Correlation Coefficient

Each type of Pearson correlation coefficient offers unique insights and analytical tools for various research fields, from statistics and psychology to economics and engineering. Understanding these variations enhances the accuracy and depth of correlation analyses, enabling more informed decision-making and hypothesis testing.

Adjusted Correlation Coefficient

Adjusted correlation coefficient modifies the standard Pearson correlation coefficient to account for sample size and bias, especially when dealing with small sample sizes. It adjusts the correlation coefficient to provide a more accurate estimation of the population correlation.

Weighted Correlation Coefficient

Weighted correlation coefficient assigns different weights to individual data points based on their importance or reliability. This approach is useful when certain observations carry more significance or have different levels of precision.
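One common way to compute a weighted correlation is to replace the ordinary means and covariances with weighted ones. The sketch below assumes that definition; the function name `weighted_pearson_r` and the data are hypothetical.

```python
import math

def weighted_pearson_r(x, y, w):
    """Weighted Pearson correlation built from weighted means and covariances."""
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov_xy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y)) / total
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x)) / total
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y)) / total
    return cov_xy / math.sqrt(var_x * var_y)

# Illustrative data: the last observation is treated as unreliable (weight 0)
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(weighted_pearson_r(x, y, [1, 1, 1, 1]))  # equal weights: ordinary Pearson's r
print(weighted_pearson_r(x, y, [1, 1, 1, 0]))  # same as dropping the last point
```

With equal weights the result matches the ordinary Pearson coefficient, and a weight of zero removes an observation from the calculation entirely.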

Reflective Correlation Coefficient

Reflective correlation coefficient is a variant of Pearson’s correlation in which the data are not centered on their mean values. Because the means are not subtracted, it is computed from the raw products of the observations and is not invariant to adding a constant to either variable.

Scaled Correlation Coefficient

Scaled correlation coefficient is a variant of Pearson’s correlation in which the range of the data is deliberately restricted, typically by computing the correlation over short segments of a time series and averaging the results. This reveals correlations between fast-changing components that slower trends would otherwise mask.

Pearson’s Distance

Pearson’s distance turns the correlation coefficient into a dissimilarity measure between two variables, defined as d = 1 - r. It quantifies the deviation from perfect positive correlation and ranges from 0 (r = 1) to 2 (r = -1).

Circular Correlation Coefficient

Circular correlation coefficient assesses the relationship between circular variables, such as angles or directions. It accounts for the cyclical nature of data and measures the degree of association between circular datasets.

Partial Correlation

Partial correlation evaluates the relationship between two variables while controlling for the effects of one or more additional variables. It measures the unique association between variables after accounting for the influence of other factors, allowing researchers to isolate specific statistical relationships.
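For a single control variable z, the partial correlation of x and y can be computed from the three pairwise Pearson coefficients. The sketch below uses that first-order formula; the function name and the input values are hypothetical.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denominator == 0:
        raise ValueError("undefined when x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denominator

# Hypothetical pairwise correlations
print(round(partial_correlation(r_xy=0.6, r_xz=0.5, r_yz=0.4), 4))
```

When z is uncorrelated with both x and y, the partial correlation reduces to the ordinary r between x and y.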

Pearson Correlation Coefficient Interpretation

| Pearson correlation coefficient (r) value | Strength | Direction |
|---|---|---|
| Greater than 0.5 | Strong | Positive |
| Between 0.3 and 0.5 | Moderate | Positive |
| Between 0 and 0.3 | Weak | Positive |
| 0 | None | None |
| Between 0 and -0.3 | Weak | Negative |
| Between -0.3 and -0.5 | Moderate | Negative |
| Less than -0.5 | Strong | Negative |
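The table above can be turned into a small lookup function. This is only a sketch: the name `describe_r` is hypothetical, and because the table does not say which band the exact boundary values belong to, the code assumes that |r| = 0.3 counts as weak and |r| = 0.5 counts as moderate.

```python
def describe_r(r):
    """Label the strength and direction of r using the table's thresholds."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "None"
    size = abs(r)
    if size > 0.5:
        strength = "Strong"
    elif size > 0.3:   # assumption: exactly 0.5 is labelled Moderate
        strength = "Moderate"
    else:              # assumption: exactly 0.3 is labelled Weak
        strength = "Weak"
    direction = "Positive" if r > 0 else "Negative"
    return f"{strength} {direction}"

for value in (0.69, 0.42, -0.23, -0.99, 0):
    print(value, "->", describe_r(value))
```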

Finding the Correlation Coefficient with Pearson Correlation Coefficient Formula

Steps to find the correlation coefficient with Pearson’s correlation coefficient formula:

Step 1: First, make a table with the given data (subject, x, and y) and add three more columns to it: xy, x² and y².

Step 2: Now multiply the x and y columns to fill the xy column. For example, if x is 24 and y is 65, then xy is 24 × 65 = 1560.

Step 3: Now, take the square of the numbers in the x column and fill the x² column.

Step 4: Now, take the square of the numbers in the y column and fill the y² column.

Step 5: Now, add up all the values in the columns and put the result at the bottom. Greek letter sigma (Σ) is the short way of saying summation.

Step 6: Now, use the formula for Pearson’s correlation coefficient:

r = [n(∑xy) - (∑x)(∑y)] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}

The sign of the result tells us whether the correlation between the variables is positive or negative.
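The six steps above can be sketched in Python as follows. The X and Y values are the ones used in Example 4 later in this article; the variable names are illustrative.

```python
import math

# X and Y values taken from Example 4 of this article
x = [5, 9, 14, 16]
y = [6, 10, 16, 20]

# Steps 1-4: build one row per observation with the xy, x^2 and y^2 columns
rows = [(a, b, a * b, a * a, b * b) for a, b in zip(x, y)]

# Step 5: add up every column
n = len(rows)
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = (sum(column) for column in zip(*rows))

print("x  y  xy  x^2  y^2")
for row in rows:
    print(*row)
print("sums:", sum_x, sum_y, sum_xy, sum_x2, sum_y2)

# Step 6: substitute the sums into the formula
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print("r =", round(r, 4))
```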

Assumptions of Pearson Correlation Coefficient

  1. Linear Relationship: Karl Pearson’s correlation coefficient assumes a linear relationship between the two variables under consideration. It implies that as one variable changes, the other changes proportionally.
  2. Normality: The variables should follow a normal distribution. While Pearson’s correlation coefficient is robust to deviations from normality, extreme departures may affect the validity of the correlation analysis.
  3. Homoscedasticity: This assumption suggests that the variability in one variable should be consistent across all levels of the other variable. In other words, the spread of data points around the regression line should remain constant.
  4. Interval or Ratio Scale: Pearson’s correlation coefficient is appropriate for variables measured on an interval or ratio scale. These scales ensure meaningful numerical distances between observations.
  5. Independence: The observations used to compute the correlation coefficient should be independent of each other. Independence ensures that each data point contributes uniquely to the analysis without being influenced by other observations.

Correlation Coefficient Properties

  1. Correlation Coefficient Range: The correlation coefficient r ranges from -1 to +1, inclusive. A value of -1 indicates a perfect negative linear relationship, +1 denotes a perfect positive linear relationship, and 0 represents no linear relationship.
  2. Directionality: The sign of the correlation coefficient indicates the direction of the relationship between variables. A positive r indicates a positive association (both variables increase or decrease together), while a negative r suggests a negative association (one variable increases as the other decreases).
  3. Magnitude: The magnitude of the correlation coefficient represents the strength of the relationship between variables. Values closer to -1 or +1 indicate a stronger linear relationship, while values closer to 0 suggest a weaker relationship.
  4. No Causation: Pearson’s correlation coefficient does not imply causation between variables. It only measures the degree of linear association and does not establish a cause-and-effect relationship.
  5. Symmetry: The correlation coefficient is symmetric, meaning the correlation between variables X and Y is the same as the correlation between Y and X.
  6. Invariance: The correlation coefficient remains unchanged when either variable is shifted by a constant or multiplied by a positive constant, making it invariant to changes in scale and location. Multiplying one variable by a negative constant reverses the sign of r but leaves its magnitude unchanged.
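Several of these properties can be checked numerically. The sketch below uses a hypothetical helper `pearson_r` and made-up data to check the range, symmetry, and the effect of rescaling one variable.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from deviations about the means."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data
x = [2, 4, 5, 7, 9]
y = [1, 3, 2, 6, 8]
r = pearson_r(x, y)

# Range: r always lies between -1 and +1
assert -1 <= r <= 1

# Symmetry: swapping the variables does not change r
assert math.isclose(r, pearson_r(y, x))

# Shifting and scaling x by a positive constant leaves r unchanged
assert math.isclose(r, pearson_r([3 * a + 10 for a in x], y))

# Scaling x by a negative constant flips the sign of r
assert math.isclose(-r, pearson_r([-2 * a for a in x], y))

print("all property checks passed")
```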

Pearson Correlation Coefficient Interpretation

Interpreting the Pearson correlation coefficient (r) involves assessing the correlation strength, direction, and correlation significance of the relationship between two variables. Here’s a guide to interpreting r:

  1. Strength of Relationship:
    • Close to +1: Indicates a strong positive linear relationship. As one variable increases, the other tends to increase proportionally.
    • Close to -1: Suggests a strong negative linear relationship. As one variable increases, the other tends to decrease proportionally.
    • Close to 0: Implies a weak or no linear relationship. Changes in one variable do not consistently predict changes in the other.
  2. Direction of Relationship:
    • Positive r: Both variables tend to increase or decrease together.
    • Negative r: One variable tends to increase as the other decreases, and vice versa.
  3. Significance:
    • Statistical significance indicates whether a correlation as large as the one observed would be unlikely to arise by chance alone if there were no correlation in the population.
    • Significance is typically assessed using a hypothesis test, such as the t-test for correlation coefficient, with the null hypothesis stating that the true correlation coefficient in the population is zero.
    • If the p-value is less than the chosen significance level (e.g., 0.05), the correlation is considered statistically significant.
  4. Scatterplot Examination:
    • Visual inspection of a scatterplot can provide additional insights into the relationship between variables.
    • A scatterplot allows you to assess the linearity, directionality, and presence of outliers, complementing the numerical interpretation of r.
  5. Caution:
    • Correlation does not imply causation. Even if a strong correlation is observed between two variables, it does not necessarily mean that changes in one variable cause changes in the other.
    • Other factors, such as confounding variables or omitted variables, may influence the observed correlation.
  6. Sample Size:
    • Larger sample sizes tend to provide more reliable estimates of correlation coefficients, reducing the likelihood of obtaining spurious correlations.
  7. Context Dependence:
    • The interpretation of r should consider the specific context and subject matter of the study. What is considered a strong or weak correlation may vary depending on the field of research and the variables under investigation.
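The t-test mentioned under Significance uses the statistic t = r√(n - 2) / √(1 - r²) with n - 2 degrees of freedom. A minimal sketch of that calculation follows; the function name and the sample values of r and n are hypothetical.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing the null hypothesis that the population correlation is zero."""
    if n < 3:
        raise ValueError("at least 3 paired observations are needed")
    if abs(r) >= 1:
        raise ValueError("t is unbounded when |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical result: r = 0.6 observed in a sample of n = 27 pairs
n = 27
t = correlation_t_statistic(0.6, n)
print("t =", round(t, 3), "with", n - 2, "degrees of freedom")
```

The resulting t value is then compared with the t-distribution for n - 2 degrees of freedom, using a table or a statistics library, to obtain the p-value.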

Bivariate Correlation

Pearson’s correlation coefficient is a statistical tool used to measure bivariate correlation. This refers to the strength and direction of the linear relationship between two variables. It assesses how much one variable tends to change along with the other. A positive correlation indicates that as one variable increases, the other tends to increase as well. Conversely, a negative correlation suggests that as one variable goes up, the other tends to go down. A value of zero indicates no linear relationship between the variables.

Correlation Matrix

The Pearson correlation coefficient is particularly useful when analyzing datasets with multiple variables. In such cases, a correlation matrix can be constructed. This is a square table that summarizes the correlation coefficients between all possible pairs of variables within the data set. By looking at the correlation matrix, researchers can quickly identify which variables have strong positive, negative, or no linear relationship with each other. This helps them understand the overall structure of the data and identify potential relationships for further investigation.
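A correlation matrix can be sketched in plain Python by applying the coefficient to every pair of columns. The helper names and the three-variable dataset below are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from deviations about the means."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return a nested dict {name_a: {name_b: r}} covering every pair of columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical dataset with three variables
data = {
    "height": [150, 160, 165, 172, 180],
    "weight": [52, 58, 63, 70, 79],
    "commute": [30, 12, 45, 20, 25],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name.ljust(8), "  ".join(f"{row[other]:6.3f}" for other in data))
```

The diagonal entries are 1 because every variable is perfectly correlated with itself, and the matrix is symmetric because r(X, Y) equals r(Y, X).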

Pearson Correlation Coefficient Examples

Example 1: For each of the following correlation coefficients, state whether the relationship between the variables is positive or negative, and how strong it is.

0.69, 0.42, -0.23, -0.99

Solution:

The given correlation coefficients are interpreted as follows:

0.69: The relationship between the variables is a strong positive relationship

0.42: The relationship between the variables is a moderate positive relationship

-0.23: The relationship between the variables is a weak negative relationship

-0.99: The relationship between the variables is a very strong negative relationship

Example 2: Calculate the correlation coefficient for the following data by the help of Pearson’s correlation coefficient formula:

X = 10, 13, 15 ,17 ,19

and

Y = 5,10,15,20,25.

Solution:

Given variables are,

X = 10, 13, 15 ,17 ,19

and

Y = 5,10,15,20,25.

To find the correlation coefficient, first construct a table with the columns XY, X² and Y², then add up each column to get the sums used in the formula.

| X | Y | XY | X² | Y² |
|---|---|---|---|---|
| 10 | 5 | 50 | 100 | 25 |
| 13 | 10 | 130 | 169 | 100 |
| 15 | 15 | 225 | 225 | 225 |
| 17 | 20 | 340 | 289 | 400 |
| 19 | 25 | 475 | 361 | 625 |
| ∑ 74 | ∑ 75 | ∑ 1220 | ∑ 1144 | ∑ 1375 |

∑xy = 1220

∑x = 74

∑y = 75

∑x² = 1144

∑y² = 1375

n = 5

Put all the values in Pearson’s correlation coefficient formula:

r = [n(∑xy) - (∑x)(∑y)] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}

r = [5(1220) - (74)(75)] / √{[5(1144) - (74)²][5(1375) - (75)²]}

r = 550 / √{[244][1250]}

r = 550 / 552.268

r = 0.9959

The correlation coefficient is approximately 0.996

Example 3: Calculate the correlation coefficient for the following table with the help of Pearson’s correlation coefficient formula:

| SUBJECT | AGE X | Weight Y |
|---|---|---|
| 1 | 40 | 99 |
| 2 | 25 | 79 |
| 3 | 22 | 69 |
| 4 | 54 | 89 |

Solution:

Make a table from the given data and add three more columns: XY, X², and Y². Then add up each column to get ∑xy, ∑x, ∑y, ∑x², and ∑y², with n = 4.

| SUBJECT | AGE X | Weight Y | XY | X² | Y² |
|---|---|---|---|---|---|
| 1 | 40 | 99 | 3960 | 1600 | 9801 |
| 2 | 25 | 79 | 1975 | 625 | 6241 |
| 3 | 22 | 69 | 1518 | 484 | 4761 |
| 4 | 54 | 89 | 4806 | 2916 | 7921 |
| ∑ | 141 | 336 | 12259 | 5625 | 28724 |

∑xy = 12259

∑x = 141

∑y = 336

∑x² = 5625

∑y² = 28724

n = 4

Put all the values in Pearson’s correlation coefficient formula:

r = [n(∑xy) - (∑x)(∑y)] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}

r = [4(12259) - (141)(336)] / √{[4(5625) - (141)²][4(28724) - (336)²]}

r = 1660 / √{[2619][2000]}

r = 1660 / 2288.668

r = 0.7253

The correlation coefficient is approximately 0.725

Example 4: Calculate the correlation coefficient for the following data with the help of Pearson’s correlation coefficient formula:

X = 5 ,9 ,14, 16

and

Y = 6, 10, 16, 20 .

Solution:

Given variables are,

X = 5 ,9 ,14, 16

and

Y = 6, 10, 16, 20 .

To find the correlation coefficient, first construct a table with the columns XY, X² and Y², then add up each column to get the sums used in the formula.

| X | Y | XY | X² | Y² |
|---|---|---|---|---|
| 5 | 6 | 30 | 25 | 36 |
| 9 | 10 | 90 | 81 | 100 |
| 14 | 16 | 224 | 196 | 256 |
| 16 | 20 | 320 | 256 | 400 |
| ∑ 44 | ∑ 52 | ∑ 664 | ∑ 558 | ∑ 792 |

∑xy = 664

∑x = 44

∑y = 52

∑x² = 558

∑y² = 792

n = 4

Put all the values in Pearson’s correlation coefficient formula:

r = [n(∑xy) - (∑x)(∑y)] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}

r = [4(664) - (44)(52)] / √{[4(558) - (44)²][4(792) - (52)²]}

r = 368 / √{[296][464]}

r = 368 / 370.599

r = 0.993

The correlation coefficient is approximately 0.993

Example 5: Calculate the correlation coefficient for the following data by the help of Pearson’s correlation coefficient formula:

X = 21,31,25,40,47,38

and

Y = 70,55,60,78,66,80

Solution:

Given variables are,

X = 21,31,25,40,47,38

and

Y = 70,55,60,78,66,80

To find the correlation coefficient, first construct a table with the columns XY, X² and Y², then add up each column to get the sums used in the formula.

| X | Y | XY | X² | Y² |
|---|---|---|---|---|
| 21 | 70 | 1470 | 441 | 4900 |
| 31 | 55 | 1705 | 961 | 3025 |
| 25 | 60 | 1500 | 625 | 3600 |
| 40 | 78 | 3120 | 1600 | 6084 |
| 47 | 66 | 3102 | 2209 | 4356 |
| 38 | 80 | 3040 | 1444 | 6400 |
| ∑ 202 | ∑ 409 | ∑ 13937 | ∑ 7280 | ∑ 28365 |

∑xy = 13937

∑x = 202

∑y = 409

∑x² = 7280

∑y² = 28365

n = 6

Put all the values in Pearson’s correlation coefficient formula:

r = [n(∑xy) - (∑x)(∑y)] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}

r = [6(13937) - (202)(409)] / √{[6(7280) - (202)²][6(28365) - (409)²]}

r = 1004 / √{[2876][2909]}

r = 1004 / 2892.453

r = 0.3471

The correlation coefficient is approximately 0.3471

Example 6: Calculate the correlation coefficient for the following data by the help of Pearson’s correlation coefficient formula:

| SUBJECT | Height X | Weight Y |
|---|---|---|
| 1 | 43 | 78 |
| 2 | 24 | 68 |
| 3 | 26 | 85 |
| 4 | 35 | 67 |

Solution:

Make a table from the given data and add three more columns: XY, X² and Y². Then add up each column to get ∑xy, ∑x, ∑y, ∑x² and ∑y², with n = 4.

| SUBJECT | Height X | Weight Y | XY | X² | Y² |
|---|---|---|---|---|---|
| 1 | 43 | 78 | 3354 | 1849 | 6084 |
| 2 | 24 | 68 | 1632 | 576 | 4624 |
| 3 | 26 | 85 | 2210 | 676 | 7225 |
| 4 | 35 | 67 | 2345 | 1225 | 4489 |
| ∑ | 128 | 298 | 9541 | 4326 | 22422 |

∑xy = 9541

∑x = 128

∑y = 298

∑x² = 4326

∑y² = 22422

n = 4

Put all the values in Pearson’s correlation coefficient formula:

r = [n(∑xy) - (∑x)(∑y)] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}

r = [4(9541) - (128)(298)] / √{[4(4326) - (128)²][4(22422) - (298)²]}

r = 20 / √{[920][884]}

r = 20 / 901.820

r = 0.02218

The correlation coefficient is approximately 0.0222

Pearson Correlation Coefficient Practice Problems

1. Given a Pearson correlation coefficient of r = 0.85 between the amount of time students spent studying and their score on a math test, interpret the strength and direction of the relationship.

2. You have data on the number of ice creams sold and the outdoor temperature. After calculating, you find r = −0.62. What does this say about the relationship between the temperature and ice cream sales?

3. Consider the following small dataset representing hours studied (X) and test scores (Y):

| Hours Studied (X) | Test Score (Y) |
|---|---|
| 1 | 50 |
| 2 | 55 |
| 3 | 65 |
| 4 | 70 |
| 5 | 80 |

Calculate the Pearson correlation coefficient (r) for the data.

Summary – Pearson Correlation Coefficient

The Pearson Correlation Coefficient, symbolized as r, is a statistical tool used to measure the strength and direction of a linear relationship between two variables on a scatterplot. Its values range from -1 to 1, where 1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 suggests no linear relationship at all. The formula divides n∑xy - (∑x)(∑y) by the square root of [n∑x² - (∑x)²][n∑y² - (∑y)²], which is equivalent to dividing the covariance of the two variables by the product of their standard deviations. The magnitude (absolute value) of r quantifies the degree of linear association, and its sign gives the direction. However, it’s crucial to remember that a high or low Pearson coefficient doesn’t imply causation; it only indicates how strongly two variables are related in a linear manner.

The origins of this coefficient trace back to Francis Galton’s work on inheritance, with significant contributions from French physicist Auguste Bravais and statistician Karl Pearson, who popularized its use. The Pearson Correlation Coefficient is foundational in fields ranging from psychology to economics. Its use assumes that the relationship is linear, that the data are normally distributed, homoscedastic, and measured on an interval or ratio scale, and that each observation is independent.

FAQs on Pearson Correlation Coefficient

What is Karl Pearson’s coefficient of correlation?

Karl Pearson’s coefficient of correlation, commonly known as the Pearson correlation coefficient (r), is a statistical measure that quantifies the strength and direction of the linear relationship between two continuous variables. Correlation coefficient ranges from -1 to +1, where -1 indicates a perfect negative linear relationship, +1 indicates a perfect positive linear relationship, and 0 suggests no linear relationship.

What is the Pearson correlation coefficient?

The Pearson correlation coefficient (r) is a measure of the strength and direction of the linear relationship between two variables. It is calculated by dividing the covariance of the two variables by the product of their standard deviations.
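The covariance definition can be sketched with Python’s standard `statistics` module. The function name `pearson_from_covariance` and the data are illustrative; sample (n - 1) versions of the covariance and standard deviation are assumed, and the n - 1 factors cancel in the ratio.

```python
import statistics

def pearson_from_covariance(x, y):
    """Pearson's r as sample covariance divided by the product of sample standard deviations."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)
    return covariance / (statistics.stdev(x) * statistics.stdev(y))

# Illustrative data
print(round(pearson_from_covariance([1, 2, 3, 4], [1, 3, 2, 4]), 4))
```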

What is the formula for Pearson Correlation Coefficient?

r = [n(∑xy) - (∑x)(∑y)] / √{[n∑x² - (∑x)²][n∑y² - (∑y)²]}

Why do we use the Pearson correlation coefficient?

The Pearson correlation coefficient is used to assess the strength and direction of the linear relationship between two variables. It helps researchers and analysts understand how changes in one variable correspond to changes in another, aiding in hypothesis testing, model building, and making predictions in various fields such as psychology, economics, biology, and social sciences.

What does Pearson’s correlation coefficient tell you?

Pearson’s correlation coefficient quantifies the strength and direction of the linear relationship between two variables. It tells us whether the variables move together (positive correlation), move in opposite directions (negative correlation), or have no discernible pattern of movement (zero correlation).

What is the difference between r² and Pearson correlation?

The Pearson correlation coefficient (r) measures the strength and direction of the linear relationship between two variables, while r² (the coefficient of determination) represents the proportion of variance in one variable that is predictable from the other variable in a linear regression model. In essence, r² is the square of the Pearson correlation coefficient and provides a measure of the goodness of fit of a simple linear regression model.
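For a simple least-squares line fitted to one predictor, the coefficient of determination equals the square of Pearson’s r. The sketch below checks this numerically; the function name and the data are hypothetical.

```python
import math

def r_and_r_squared(x, y):
    """Return Pearson's r and the R-squared of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)

    # Least-squares fit and its residual sum of squares
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r_squared = 1 - ss_res / syy   # syy is the total sum of squares
    return r, r_squared

# Illustrative data
r, r_squared = r_and_r_squared([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(r, 4), round(r_squared, 4), round(r * r, 4))
```

The last two printed values agree, which is the relationship described in the answer above.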

What is a good correlation coefficient?

A good correlation coefficient depends on the context and the specific field of study. Generally, a correlation coefficient close to +1 or -1 indicates a strong linear relationship between variables, while a coefficient close to 0 suggests a weak or no linear relationship. However, what constitutes a “good” correlation varies depending on the research question, field of study, and practical implications.

What does a correlation coefficient of 0.5 mean?

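A correlation coefficient of 0.5 indicates a moderate to strong positive linear relationship between two variables; it sits at the boundary between the two bands, and Cohen (1988) classifies an absolute value of 0.5 as large. It suggests that as one variable increases, the other tends to increase as well, but the relationship is not perfect.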

What does a 0.2 correlation mean?

A correlation coefficient of 0.2 suggests a weak positive linear relationship between two variables. While there is some tendency for the variables to move together, the relationship is relatively weak and may not be practically significant without further context.

Is a correlation coefficient of 0.4 strong?

A correlation coefficient of 0.4 indicates a moderate positive linear relationship between two variables. While not as strong as coefficients closer to +1, a value of 0.4 still suggests a discernible pattern of association between the variables, which may be meaningful depending on the context of the study.


