Pearson Correlation Coefficient

Last Updated : 08 Apr, 2024

Pearson Correlation Coefficient: Correlation coefficients measure how strong the relationship between two variables is. There are several formulas for computing a correlation coefficient. One of the most popular is Pearson’s correlation coefficient (also known as Pearson’s r), which is closely tied to simple linear regression.

The Pearson correlation coefficient, usually written r, is a widely used measure of the linear relationship between two variables. It takes values from -1 to 1 and indicates both the strength (magnitude) and the direction (sign) of the correlation. A positive r means the two variables tend to change in the same direction, while a negative r means they tend to change in opposite directions.

This article provides detailed information on the Pearson Correlation Coefficient, its meaning, formula, interpretation, examples, and FAQs.

Table of Contents

What is the Pearson Correlation Coefficient?
Pearson’s Correlation Coefficient Formula
Pearson Correlation Coefficient Table
Pearson Correlation Coefficient Origin
Types of Pearson Correlation Coefficient
Pearson Correlation Coefficient Interpretation
Finding the Correlation Coefficient with Pearson Correlation Coefficient Formula
Assumptions of Pearson Correlation Coefficient
Correlation Coefficient Properties
Bivariate Correlation
Correlation Matrix
Pearson Correlation Coefficient Examples
Pearson Correlation Coefficient Practice Problems

What is the Pearson Correlation Coefficient?

The Pearson Correlation Coefficient, denoted as r, is a statistical measure that calculates the strength and direction of the linear relationship between two variables on a scatterplot. The value of r ranges between -1 and 1, where:

1 indicates a perfect positive linear relationship,
-1 indicates a perfect negative linear relationship, and
0 indicates no linear relationship between the variables.

Pearson’s Correlation Coefficient Formula

Karl Pearson’s correlation coefficient is the most widely used measure of correlation. It is denoted by a lowercase “r” and is computed with the following formula:

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$

The full name of this coefficient is the Pearson product-moment correlation (PPMC). It measures the linear relationship between two sets of data.

Pearson’s correlation measures the strength and direction of a linear relationship through the r value, which lies between -1 and +1. Whether a linear correlation actually exists in the population is assessed separately with a p-value: if the result is statistically significant, we conclude that the correlation exists.

Cohen (1988) proposed the following rules of thumb for the absolute value of r: 0.1 is small, 0.3 is medium, and 0.5 is large.

The extreme values of Pearson’s correlation coefficient are interpreted as follows:

A correlation coefficient of 1 means that every increase in one variable is matched by an increase of a fixed amount in the other, so all points lie exactly on a line with positive slope. For example, shoe size goes up in perfect correlation with foot length.
A correlation coefficient of 0 means that there is no linear relationship between the variables (a nonlinear relationship may still exist).
A correlation coefficient of -1 means that every increase in one variable is matched by a decrease of a fixed amount in the other, so all points lie exactly on a line with negative slope. For example, the amount of water left in a tank decreases in perfect correlation with the amount that has flowed out through the tap.

The Pearson correlation coefficient essentially captures how closely the data points tend to follow a straight line when plotted together. It’s important to remember that correlation doesn’t imply causation – just because two variables are related, it doesn’t mean one causes the change in the other.

Pearson Correlation Coefficient Table

Pearson Correlation Coefficient (r) Range	Type of Correlation	Description of Relationship	Example
0 < r ≤ 1	Positive	An increase in one variable is associated with an increase in the other.	Study Time vs. Test Scores: More hours spent studying tends to lead to higher test scores.
r = 0	None	No linear relationship between the changes in the two variables.	Shoe Size vs. Reading Skill: A person’s shoe size doesn’t predict their ability to read.
-1 ≤ r < 0	Negative	An increase in one variable is associated with a decrease in the other.	Outdoor Temperature vs. Home Heating Cost: As the outdoor temperature decreases, heating costs in the home increase.

Pearson Correlation Coefficient Origin

The Pearson correlation coefficient, although named after statistician Karl Pearson, has a more interesting backstory. The concept of correlation itself can be traced back to Francis Galton, a 19th-century scientist and explorer. Galton was fascinated by inheritance and explored relationships between traits in families.

While Galton planted the seed for the idea, the mathematical formula behind the coefficient actually came from French physicist Auguste Bravais in 1844. However, it was Karl Pearson who truly championed the concept in the late 1800s. He refined the mathematical treatment, explored its properties, and popularized its use in statistical analysis. For this reason, the coefficient bears his name, even though earlier contributions played a crucial role in its development.

Types of Pearson Correlation Coefficient

Each type of Pearson correlation coefficient offers unique insights and analytical tools for various research fields, from statistics and psychology to economics and engineering. Understanding these variations enhances the accuracy and depth of correlation analyses, enabling more informed decision-making and hypothesis testing.

Adjusted Correlation Coefficient

Adjusted correlation coefficient modifies the standard Pearson correlation coefficient to account for sample size and bias, especially when dealing with small sample sizes. It adjusts the correlation coefficient to provide a more accurate estimation of the population correlation.

Weighted Correlation Coefficient

Weighted correlation coefficient assigns different weights to individual data points based on their importance or reliability. This approach is useful when certain observations carry more significance or have different levels of precision.
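A minimal sketch of one common definition of the weighted correlation, assuming non-negative weights that are not all zero: compute weighted means, then weighted covariances, and normalize exactly as in the ordinary Pearson r. The function name `weighted_pearson` is hypothetical, not a library API.

```python
import math


def weighted_pearson(xs, ys, ws):
    """Weighted Pearson correlation of xs and ys with non-negative weights ws."""
    total_w = sum(ws)
    # Weighted means of each variable
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total_w
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total_w
    # Weighted covariance and variances (the 1/total_w factors cancel in the ratio)
    cov_xy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys)) / total_w
    var_x = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs)) / total_w
    var_y = sum(w * (y - mean_y) ** 2 for w, y in zip(ws, ys)) / total_w
    return cov_xy / math.sqrt(var_x * var_y)


# Equal weights reproduce the ordinary Pearson r
print(weighted_pearson([5, 9, 14, 16], [6, 10, 16, 20], [1, 1, 1, 1]))
# A zero weight effectively removes the last (outlying) point
print(weighted_pearson([1, 2, 3, 10], [1, 2, 3, -10], [1, 1, 1, 0]))
```

With equal weights the result is the usual r; lowering a point’s weight reduces its influence, and a weight of zero drops it entirely.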

Reflective Correlation Coefficient

The reflective correlation coefficient is a variant of Pearson’s r in which the data are not centered around their means: the sums of products and squares are taken about zero instead. It is suited to data where zero is a meaningful reference point for both variables.
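Under the usual definition, the reflective correlation is Pearson’s formula applied without subtracting the means, r = Σxy / √(Σx² · Σy²). The sketch below (function name hypothetical) shows how far this can differ from the ordinary r on the same data.

```python
import math


def reflective_correlation(xs, ys):
    """Pearson-style correlation computed about zero instead of about the means."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


# Pearson's r for these data is -1, but without centering the result is positive
print(reflective_correlation([1, 2, 3], [3, 2, 1]))  # 10 / 14, about 0.714
```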

Scaled Correlation Coefficient

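Scaled correlation is a variant of Pearson’s r used mainly for time series: the signals are divided into short segments of equal length, r is computed within each segment, and the segment correlations are averaged. Because slow trends are nearly constant within a short segment, this reveals the correlation between the fast components of the signals while suppressing the contribution of the slow ones.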
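A minimal sketch of scaled correlation, assuming non-overlapping segments of a fixed length (at least 2 points, and no segment constant in either series). The helper names and the toy series are hypothetical. The example shows two series whose overall r is high because of a shared slow step, while their fast components move in exactly opposite directions.

```python
import math


def pearson_r(xs, ys):
    """Ordinary Pearson r in deviation form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def scaled_correlation(xs, ys, segment_length):
    """Average of Pearson r over consecutive, non-overlapping segments."""
    k = len(xs) // segment_length  # leftover points at the end are ignored
    rs = []
    for i in range(k):
        start, stop = i * segment_length, (i + 1) * segment_length
        rs.append(pearson_r(xs[start:stop], ys[start:stop]))
    return sum(rs) / len(rs)


# A shared slow step plus fast components that move in opposite directions
x = [0, 1, 0, 1, 10, 11, 10, 11]
y = [1, 0, 1, 0, 11, 10, 11, 10]
print(round(pearson_r(x, y), 3))    # 0.98: dominated by the slow step
print(scaled_correlation(x, y, 4))  # -1.0: the fast components are anti-correlated
```

The segment length sets the time scale: components slower than a segment are largely removed, so it should be chosen from the phenomenon being studied.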

Pearson’s Distance

Pearson’s distance is a dissimilarity measure between two variables (or two vectors of measurements) derived from their correlation coefficient: d = 1 - r. Because r lies between -1 and 1, the distance lies between 0 and 2: it is 0 for perfectly positively correlated variables, 1 for uncorrelated ones, and 2 for perfectly negatively correlated ones.

Circular Correlation Coefficient

Circular correlation coefficient assesses the relationship between circular variables, such as angles or directions. It accounts for the cyclical nature of data and measures the degree of association between circular datasets.
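One widely used circular correlation coefficient replaces deviations from the arithmetic mean with sines of angular deviations from the circular mean direction. The sketch below assumes angles in radians; the function names and data are hypothetical. The sample angles straddle 0 = 2π, where the arithmetic mean (about 2.6 rad here) points in the wrong direction and the circular mean is the appropriate center.

```python
import math


def circular_mean(angles):
    """Mean direction of angles given in radians."""
    return math.atan2(sum(math.sin(a) for a in angles),
                      sum(math.cos(a) for a in angles))


def circular_correlation(alphas, betas):
    """Circular correlation based on sines of deviations from each circular mean."""
    mean_a = circular_mean(alphas)
    mean_b = circular_mean(betas)
    sa = [math.sin(a - mean_a) for a in alphas]
    sb = [math.sin(b - mean_b) for b in betas]
    num = sum(p * q for p, q in zip(sa, sb))
    den = math.sqrt(sum(p * p for p in sa) * sum(q * q for q in sb))
    return num / den


# Angles clustered around 0 rad (0 and 2*pi are the same direction)
alphas = [0.1, 6.2, 0.3, 5.9, 0.5]
betas = [a + 1.0 for a in alphas]  # every angle rotated by the same amount
print(circular_correlation(alphas, betas))  # close to 1
```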

Partial Correlation

Partial correlation evaluates the relationship between two variables while controlling for the effects of one or more additional variables. It measures the unique association between variables after accounting for the influence of other factors, allowing researchers to isolate specific statistical relationships.
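For a single control variable z, the partial correlation can be computed from the three pairwise correlations with the standard first-order formula r_xy·z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)). The sketch below uses hypothetical helper names and data; controlling for several variables at once needs a more general method (for example, correlating regression residuals), which is not shown here.

```python
import math


def pearson_r(xs, ys):
    """Ordinary Pearson r in deviation form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(xs, ys, zs):
    """First-order partial correlation of x and y, controlling for one variable z."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical data in which z is related to both x and y
z = [1, 2, 3, 4, 5, 6]
x = [2, 4, 5, 4, 5, 7]
y = [1, 3, 3, 5, 4, 6]
print(round(pearson_r(x, y), 3), round(partial_correlation(x, y, z), 3))
```

Comparing the two printed values shows how much of the x-y association remains once the shared dependence on z is removed.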

Pearson Correlation Coefficient Interpretation

Pearson correlation coefficient (r) value	Strength	Direction
Greater than .5	Strong	Positive
Between .3 and .5	Moderate	Positive
Between 0 and .3	Weak	Positive
0	None	None
Between 0 and –.3	Weak	Negative
Between –.3 and –.5	Moderate	Negative
Less than –.5	Strong	Negative

Finding the Correlation Coefficient with Pearson Correlation Coefficient Formula

Steps to find the correlation coefficient with Pearson’s correlation coefficient formula:

Step 1: Make a table of the given data (for example, subject, x, and y) and add three more columns: xy, x², and y².

Step 2: Multiply each x value by the corresponding y value to fill the xy column. For example, if x = 24 and y = 65, then xy = 24 × 65 = 1560.

Step 3: Now, take the square of the numbers in the x column and fill the x² column.

Step 4: Now, take the square of the numbers in the y column and fill the y² column.

Step 5: Add up the values in each column and write the totals at the bottom. The Greek letter sigma (Σ) is shorthand for summation.

Step 6: Substitute the totals into the formula for Pearson’s correlation coefficient:

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$

The sign of r tells you whether the relationship is positive or negative, and its magnitude tells you how strong the relationship is.
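The six steps above can be sketched in Python as follows. This is a minimal illustration, not a library routine: the function name `pearson_r_from_sums` is hypothetical, and it assumes two equal-length lists of numbers with at least two pairs, neither of which is constant.

```python
import math


def pearson_r_from_sums(xs, ys):
    """Pearson's r using the raw-sum formula from the steps above."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    n = len(xs)
    # Steps 1-5: build the xy, x^2 and y^2 columns and total every column
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    # Step 6: substitute the totals into Pearson's formula
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Data from Example 4 of this article
print(round(pearson_r_from_sums([5, 9, 14, 16], [6, 10, 16, 20]), 3))  # 0.993
```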

Assumptions of Pearson Correlation Coefficient

Linear Relationship: Karl Pearson’s correlation coefficient assumes a linear relationship between the two variables under consideration. That is, the relationship is well described by a straight line: a given change in one variable is associated, on average, with a constant change in the other.
Normality: For significance tests and confidence intervals, the variables should be approximately normally distributed (ideally jointly, as a bivariate normal distribution). Computing r itself does not require normality, but extreme departures such as heavy tails or outliers can distort r and undermine the validity of the analysis.
Homoscedasticity: This assumption suggests that the variability in one variable should be consistent across all levels of the other variable. In other words, the spread of data points around the regression line should remain constant.
Interval or Ratio Scale: Pearson’s correlation coefficient is appropriate for variables measured on an interval or ratio scale. These scales ensure meaningful numerical distances between observations.
Independence: The observations used to compute the correlation coefficient should be independent of each other. Independence ensures that each data point contributes uniquely to the analysis without being influenced by other observations.

Correlation Coefficient Properties

Correlation Coefficient Range: The correlation coefficient r ranges from -1 to +1, inclusive. A value of -1 indicates a perfect negative linear relationship, +1 denotes a perfect positive linear relationship, and 0 represents no linear relationship.
Directionality: The sign of the correlation coefficient indicates the direction of the relationship between variables. A positive r indicates a positive association (both variables increase or decrease together), while a negative r suggests a negative association (one variable increases as the other decreases).
Magnitude: The magnitude of the correlation coefficient represents the strength of the relationship between variables. Values closer to -1 or +1 indicate a stronger linear relationship, while values closer to 0 suggest a weaker relationship.
No Causation: Pearson’s correlation coefficient does not imply causation between variables. It only measures the degree of linear association and does not establish a cause-and-effect relationship.
Symmetry: The correlation coefficient is symmetric, meaning the correlation between variables X and Y is the same as the correlation between Y and X.
Invariance: The correlation coefficient is unchanged when a variable is multiplied by a positive constant or has a constant added to it (x → ax + b with a > 0), so it does not depend on the units or origin of measurement. Multiplying one variable by a negative constant flips the sign of r but leaves its magnitude unchanged.
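The symmetry and invariance properties can be checked numerically. The sketch below assumes a hypothetical helper `pearson_r` written in deviation form, and reuses the data of Example 2 purely for illustration.

```python
import math


def pearson_r(xs, ys):
    """Ordinary Pearson r in deviation form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [10, 13, 15, 17, 19]
y = [5, 10, 15, 20, 25]
r = pearson_r(x, y)

# Symmetry: corr(x, y) == corr(y, x)
assert math.isclose(pearson_r(y, x), r)
# Invariance: a positive multiplier plus a shift (e.g. a change of units) leaves r unchanged
x_rescaled = [2.54 * v + 100 for v in x]
assert math.isclose(pearson_r(x_rescaled, y), r)
# A negative multiplier keeps the magnitude but flips the sign
x_negated = [-3 * v for v in x]
assert math.isclose(pearson_r(x_negated, y), -r)
print("all property checks passed")
```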

Interpreting the Pearson Correlation Coefficient in Practice

Interpreting the Pearson correlation coefficient (r) involves assessing the strength, direction, and statistical significance of the relationship between two variables. Here’s a guide to interpreting r:

Strength of Relationship:
- Close to +1: Indicates a strong positive linear relationship. As one variable increases, the other tends to increase proportionally.
- Close to -1: Suggests a strong negative linear relationship. As one variable increases, the other tends to decrease proportionally.
- Close to 0: Implies a weak or no linear relationship. Changes in one variable do not consistently predict changes in the other.
Direction of Relationship:
- Positive r: Both variables tend to increase or decrease together.
- Negative r: One variable tends to increase as the other decreases, and vice versa.
Significance:
- Statistical significance indicates whether a correlation as large as the observed one would be unlikely to arise by chance if the true population correlation were zero.
- Significance is typically assessed with a t-test for the correlation coefficient, t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom, where the null hypothesis states that the true correlation coefficient in the population is zero.
- If the p-value is less than the chosen significance level (e.g., 0.05), the correlation is considered statistically significant.
Scatterplot Examination:
- Visual inspection of a scatterplot can provide additional insights into the relationship between variables.
- A scatterplot allows you to assess the linearity, directionality, and presence of outliers, complementing the numerical interpretation of r.
Caution:
- Correlation does not imply causation. Even if a strong correlation is observed between two variables, it does not necessarily mean that changes in one variable cause changes in the other.
- Other factors, such as confounding variables or omitted variables, may influence the observed correlation.
Sample Size:
- Larger sample sizes tend to provide more reliable estimates of correlation coefficients, reducing the likelihood of obtaining spurious correlations.
Context Dependence:
- The interpretation of r should consider the specific context and subject matter of the study. What is considered a strong or weak correlation may vary depending on the field of research and the variables under investigation.
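The t-test mentioned under Significance can be sketched as below, using the statistic t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The helper name and the input values are hypothetical, and the sketch stops at the statistic: turning it into a p-value needs a t-distribution table or a statistics library, which is left out to keep the example dependency-free.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing H0: population correlation = 0 (df = n - 2)."""
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if abs(r) >= 1:
        raise ValueError("t is unbounded when |r| = 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical result: r = 0.5 from n = 30 pairs
t = correlation_t_statistic(0.5, 30)
print(round(t, 2), "on", 30 - 2, "degrees of freedom")  # 3.06 on 28 degrees of freedom
```

For 28 degrees of freedom the two-sided 5% critical value is about 2.048, so a t of about 3.06 would lead to rejecting the null hypothesis of zero correlation at that level.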

Bivariate Correlation

Pearson’s correlation coefficient is a statistical tool used to measure bivariate correlation. This refers to the strength and direction of the linear relationship between two variables. It assesses how much one variable tends to change along with the other. A positive correlation indicates that as one variable increases, the other tends to increase as well. Conversely, a negative correlation suggests that as one variable goes up, the other tends to go down. A value of zero indicates no linear relationship between the variables.

Correlation Matrix

The Pearson correlation coefficient is particularly useful when analyzing datasets with multiple variables. In such cases, a correlation matrix can be constructed. This is a square table that summarizes the correlation coefficients between all possible pairs of variables within the data set. By looking at the correlation matrix, researchers can quickly identify which variables have strong positive, negative, or no linear relationship with each other. This helps them understand the overall structure of the data and identify potential relationships for further investigation.
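A correlation matrix is Pearson’s r evaluated for every pair of columns. The sketch below assumes the data are held in a dictionary of equal-length lists; the function names and the three-variable dataset are hypothetical. For non-constant variables the diagonal is always 1 (each variable with itself), and the matrix is symmetric.

```python
import math


def pearson_r(xs, ys):
    """Ordinary Pearson r in deviation form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return variable names and the square matrix of pairwise Pearson correlations."""
    names = list(columns)
    matrix = [[pearson_r(columns[a], columns[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical dataset with three variables
data = {
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 5, 4, 5],
    "z": [5, 4, 3, 2, 1],
}
names, matrix = correlation_matrix(data)
for name, row in zip(names, matrix):
    print(name, [round(v, 2) for v in row])
```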

Pearson Correlation Coefficient Examples

Example 1: For each of the following correlation coefficients, state whether the relationship between the variables is positive or negative.

0.69, 0.42, -0.23, -0.99

Solution:

The given correlation coefficient is as follows:

0.69, 0.42, -0.23, -0.99

Direction and strength of each relationship:

0.69: The relationship between the variables is a strong positive relationship

0.42: The relationship between the variables is a moderate positive relationship

-0.23: The relationship between the variables is a weak negative relationship

-0.99: The relationship between the variables is a very strong negative relationship

Example 2: Calculate the correlation coefficient for the following data with the help of Pearson’s correlation coefficient formula:

X = 10, 13, 15 ,17 ,19

and

Y = 5,10,15,20,25.

Solution:

Given variables are,

X = 10, 13, 15 ,17 ,19

and

Y = 5,10,15,20,25.

To find the correlation coefficient, first construct the following table of the values required by the formula, and add up each column to get the sums used in the formula.

X	Y	XY	X²	Y²
10	5	50	100	25
13	10	130	169	100
15	15	225	225	225
17	20	340	289	400
19	25	475	361	625
∑74	∑75	∑1220	∑1144	∑1375

∑xy = 1220

∑x = 74

∑y = 75

∑x² = 1144

∑y² = 1375

n = 5

Put all the values in Pearson’s correlation coefficient formula:

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$

$$r = \frac{5(1220) - (74)(75)}{\sqrt{\left[5(1144) - 74^2\right]\left[5(1375) - 75^2\right]}}$$

$$r = \frac{6100 - 5550}{\sqrt{(244)(1250)}} = \frac{550}{552.27} \approx 0.9959$$

The correlation coefficient is approximately 0.996, a very strong positive linear relationship.

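As a check on the hand calculation, the sketch below recomputes r for the Example 2 data in two ways: with the raw-sum formula and with the deviation form r = Sxy / √(Sxx · Syy), where Sxy, Sxx and Syy are sums of products and squares of deviations from the means. The two forms are algebraically identical; the deviation form avoids the large intermediate sums that make hand arithmetic error-prone.

```python
import math

x = [10, 13, 15, 17, 19]
y = [5, 10, 15, 20, 25]
n = len(x)

# Raw-sum formula, as in the worked solution
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)
r_sums = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

# Deviation form: r = Sxy / sqrt(Sxx * Syy), using distances from the means
mean_x, mean_y = sum_x / n, sum_y / n
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)
s_yy = sum((b - mean_y) ** 2 for b in y)
r_dev = s_xy / math.sqrt(s_xx * s_yy)

print(sum_xy, round(r_sums, 4), round(r_dev, 4))  # 1220 0.9959 0.9959
```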

Example 3: Calculate the correlation coefficient for the following table with the help of Pearson’s correlation coefficient formula:

SUBJECT	AGE X	Weight Y
1	40	99
2	25	79
3	22	69
4	54	89

Solution:

Make a table from the given data and add three more columns for XY, X², and Y². Then add up each column to get ∑xy, ∑x, ∑y, ∑x², and ∑y²; here n = 4.

SUBJECT	AGE X	Weight Y	XY	X²	Y²
1	40	99	3960	1600	9801
2	25	79	1975	625	6241
3	22	69	1518	484	4761
4	54	89	4806	2916	7921
∑	141	336	12259	5625	28724

∑xy = 12259

∑x = 141

∑y = 336

∑x² = 5625

∑y² = 28724

n = 4

Put all the values in Pearson’s correlation coefficient formula:

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$

$$r = \frac{4(12259) - (141)(336)}{\sqrt{\left[4(5625) - 141^2\right]\left[4(28724) - 336^2\right]}}$$

$$r = \frac{49036 - 47376}{\sqrt{(2619)(2000)}} = \frac{1660}{2288.67} \approx 0.7253$$

The correlation coefficient is approximately 0.7253, a strong positive linear relationship between age and weight in this sample.


Example 4: Calculate the correlation coefficient for the following data with the help of Pearson’s correlation coefficient formula:

X = 5 ,9 ,14, 16

and

Y = 6, 10, 16, 20 .

Solution:

Given variables are,

X = 5 ,9 ,14, 16

and

Y = 6, 10, 16, 20 .

To find the correlation coefficient, first construct the following table of the values required by the formula, and add up each column to get the sums used in the formula.

X	Y	XY	X²	Y²
5	6	30	25	36
9	10	90	81	100
14	16	224	196	256
16	20	320	256	400
∑ 44	∑ 52	∑ 664	∑ 558	∑ 792

∑xy= 664

∑x=44

∑y=52

∑x² =558

∑y² =792

n =4

Put all the values in the Pearson’s correlation coefficient formula:-

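$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$

$$r = \frac{4(664) - (44)(52)}{\sqrt{\left[4(558) - 44^2\right]\left[4(792) - 52^2\right]}}$$

$$r = \frac{2656 - 2288}{\sqrt{(296)(464)}} = \frac{368}{370.60} \approx 0.9930$$

The correlation coefficient is approximately 0.993.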


Example 5: Calculate the correlation coefficient for the following data with the help of Pearson’s correlation coefficient formula:

X = 21,31,25,40,47,38

and

Y = 70,55,60,78,66,80

Solution:

Given variables are,

X = 21,31,25,40,47,38

and

Y = 70,55,60,78,66,80

To find the correlation coefficient, first construct the following table of the values required by the formula, and add up each column to get the sums used in the formula.

X	Y	XY	X²	Y²
21	70	1470	441	4900
31	55	1705	961	3025
25	60	1500	625	3600
40	78	3120	1600	6084
47	66	3102	2209	4356
38	80	3040	1444	6400
∑202	∑409	∑13937	∑7280	∑28365

∑xy = 13937

∑x = 202

∑y = 409

∑x² = 7280

∑y² = 28365

n = 6

Put all the values in Pearson’s correlation coefficient formula:

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$

$$r = \frac{6(13937) - (202)(409)}{\sqrt{\left[6(7280) - 202^2\right]\left[6(28365) - 409^2\right]}}$$

$$r = \frac{83622 - 82618}{\sqrt{(2876)(2909)}} = \frac{1004}{2892.45} \approx 0.3471$$

The correlation coefficient is approximately 0.3471, a moderate positive linear relationship.


Example 6: Calculate the correlation coefficient for the following data with the help of Pearson’s correlation coefficient formula:

SUBJECT	Height X	Weight Y
1	43	78
2	24	68
3	26	85
4	35	67

Solution:

Make a table from the given data and add three more columns for XY, X², and Y². Then add up each column to get ∑xy, ∑x, ∑y, ∑x², and ∑y²; here n = 4.

SUBJECT	Height X	Weight Y	XY	X²	Y²
1	43	78	3354	1849	6084
2	24	68	1632	576	4624
3	26	85	2210	676	7225
4	35	67	2345	1225	4489
∑	128	298	9541	4326	22422

∑xy = 9541

∑x = 128

∑y = 298

∑x² = 4326

∑y² = 22422

n = 4

Put all the values in Pearson’s correlation coefficient formula:

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$

$$r = \frac{4(9541) - (128)(298)}{\sqrt{\left[4(4326) - 128^2\right]\left[4(22422) - 298^2\right]}}$$

$$r = \frac{38164 - 38144}{\sqrt{(920)(884)}} = \frac{20}{901.82} \approx 0.0222$$

The correlation coefficient is approximately 0.0222, which indicates virtually no linear relationship between height and weight in these data.

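The sketch below recomputes r directly from the raw data of Examples 3 to 6, which is a quick way to catch arithmetic slips in the column sums. It assumes a hypothetical helper `pearson_r` written in deviation form; the printed values are rounded to four decimals.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Raw data from Examples 3 to 6 of this article
examples = {
    "Example 3": ([40, 25, 22, 54], [99, 79, 69, 89]),
    "Example 4": ([5, 9, 14, 16], [6, 10, 16, 20]),
    "Example 5": ([21, 31, 25, 40, 47, 38], [70, 55, 60, 78, 66, 80]),
    "Example 6": ([43, 24, 26, 35], [78, 68, 85, 67]),
}
for name, (xs, ys) in examples.items():
    print(name, round(pearson_r(xs, ys), 4))
# Example 3 0.7253
# Example 4 0.993
# Example 5 0.3471
# Example 6 0.0222
```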

Pearson Correlation Coefficient Practice Problems

1. Given a Pearson correlation coefficient of r = 0.85 between the amount of time students spent studying and their score on a math test, interpret the strength and direction of the relationship.

2. You have data on the number of ice creams sold and the outdoor temperature. After calculating, you find r = −0.62. What does this say about the relationship between the temperature and ice cream sales?

3. Consider the following small dataset representing hours studied (X) and test scores (Y):

Hours Studied (X)	Test Score (Y)
1	50
2	55
3	65
4	70
5	80

Calculate the Pearson correlation coefficient (r) for the data.

Summary – Pearson Correlation Coefficient

The Pearson Correlation Coefficient, symbolized as r, is a statistical tool used to measure the strength and direction of a linear relationship between two variables on a scatterplot. Its values range from -1 to 1, where 1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 suggests no linear relationship at all. To calculate r, the sum of the products of the paired deviations from the means is divided by the square root of the product of the sums of squared deviations; equivalently, r is the covariance of the two variables divided by the product of their standard deviations. The coefficient quantifies the degree of linear association through its magnitude (absolute value) and shows the direction of the association through its sign (positive or negative). However, it’s crucial to remember that a high or low Pearson coefficient doesn’t imply causation; it merely indicates how strongly two variables are related in a linear manner. The origins of this coefficient trace back to Francis Galton’s work on inheritance, with significant contributions from French physicist Auguste Bravais and statistician Karl Pearson, who popularized its use. The Pearson Correlation Coefficient is foundational in fields ranging from psychology to economics. Its standard use assumes that the relationship is linear, the data are approximately normally distributed and homoscedastic, the variables are measured on an interval or ratio scale, and the observations are independent.

FAQs on Pearson Correlation Coefficient

What is Karl Pearson’s coefficient of correlation?

Karl Pearson’s coefficient of correlation, commonly known as the Pearson correlation coefficient (r), is a statistical measure that quantifies the strength and direction of the linear relationship between two continuous variables. Correlation coefficient ranges from -1 to +1, where -1 indicates a perfect negative linear relationship, +1 indicates a perfect positive linear relationship, and 0 suggests no linear relationship.

What is the Pearson correlation coefficient?

The Pearson correlation coefficient (r) is a measure of the strength and direction of the linear relationship between two variables. It is calculated by dividing the covariance of the two variables by the product of their standard deviations.
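The covariance definition translates directly into Python. This sketch uses `statistics.stdev` from the standard library for the sample standard deviations and a hypothetical `sample_covariance` helper with the same n − 1 denominator, so the n − 1 factors cancel. On Python 3.10 and later, `statistics.correlation` should return the same value.

```python
from statistics import stdev


def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 denominator, matching statistics.stdev."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pearson_r(xs, ys):
    # r = cov(x, y) / (s_x * s_y); the n - 1 factors cancel
    return sample_covariance(xs, ys) / (stdev(xs) * stdev(ys))


print(round(pearson_r([5, 9, 14, 16], [6, 10, 16, 20]), 3))  # 0.993
```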

What is the formula for Pearson Correlation Coefficient?

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$

Why do we use the Pearson correlation coefficient?

The Pearson correlation coefficient is used to assess the strength and direction of the linear relationship between two variables. It helps researchers and analysts understand how changes in one variable correspond to changes in another, aiding in hypothesis testing, model building, and making predictions in various fields such as psychology, economics, biology, and social sciences.

What does Pearson’s correlation coefficient tell you?

Pearson’s correlation coefficient quantifies the strength and direction of the linear relationship between two variables. It tells us whether the variables move together (positive correlation), move in opposite directions (negative correlation), or have no discernible pattern of movement (zero correlation).

What is the difference between r² and Pearson correlation?

The Pearson correlation coefficient (r) measures the strength and direction of the linear relationship between two variables, while r² (the coefficient of determination) represents the proportion of variance in one variable that is predictable from the other in a linear regression model. In simple linear regression with an intercept, r² is exactly the square of the Pearson correlation coefficient and measures how well the fitted line describes the data.
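The link between r and r² can be checked numerically for a least-squares line. In the sketch below the helper names are hypothetical, and R² is computed from its definition, 1 − (residual sum of squares) / (total sum of squares); for a straight-line fit with an intercept it should match the square of Pearson’s r. The data are the age and weight values from Example 3.

```python
import math


def pearson_r(xs, ys):
    """Ordinary Pearson r in deviation form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def regression_r_squared(xs, ys):
    """R^2 of the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


x = [40, 25, 22, 54]  # ages from Example 3
y = [99, 79, 69, 89]  # weights from Example 3
print(round(pearson_r(x, y) ** 2, 4), round(regression_r_squared(x, y), 4))
```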

What is a good correlation coefficient?

A good correlation coefficient depends on the context and the specific field of study. Generally, a correlation coefficient close to +1 or -1 indicates a strong linear relationship between variables, while a coefficient close to 0 suggests a weak or no linear relationship. However, what constitutes a “good” correlation varies depending on the research question, field of study, and practical implications.

What does a correlation coefficient of 0.5 mean?

A correlation coefficient of 0.5 indicates a moderate positive linear relationship between two variables. It suggests that as one variable increases, the other tends to increase as well, but the relationship is not perfect.

What does a 0.2 correlation mean?

A correlation coefficient of 0.2 suggests a weak positive linear relationship between two variables. While there is some tendency for the variables to move together, the relationship is relatively weak and may not be practically significant without further context.

Is a correlation coefficient of 0.4 strong?

A correlation coefficient of 0.4 indicates a moderate positive linear relationship between two variables. While not as strong as coefficients closer to +1, a value of 0.4 still suggests a discernible pattern of association between the variables, which may be meaningful depending on the context of the study.
