
Covariance and Correlation in R Programming

Last Updated : 05 Jul, 2023

Covariance and correlation are statistical measures of the relationship between two random variables. Both measure linear dependence between a pair of random variables or bivariate data, but each captures a different aspect of that relationship: covariance shows the direction in which the two variables vary together, while correlation also expresses the strength of the association on a standardised scale.

In this article, we discuss the cov(), cor() and cov2cor() functions in R, which implement the covariance and correlation measures of statistics and probability theory.

Covariance in R Programming Language

In R programming, covariance can be computed using the cov() function. Covariance is a statistical term used to measure the direction of the linear relationship between two data vectors: a positive value means the vectors tend to increase together, and a negative value means one tends to decrease as the other increases. cov() returns the sample covariance, which divides by N - 1 rather than N. Mathematically, 
\operatorname{Cov}(x, y)=\frac{\sum\left(x_{i}-\bar{x}\right)\left(y_{i}-\bar{y}\right)}{N-1}

where, 

x represents the x data vector 
y represents the y data vector 
\bar{x} represents the mean of the x data vector 
\bar{y} represents the mean of the y data vector 
N represents the number of observations
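R's cov() returns the sample covariance, which divides the sum of the products of deviations by N - 1. The following minimal sketch computes that quantity by hand and compares it with cov(). The names dx, dy and manual_cov are illustrative only.

```r
# Data vectors (same values as used in this article's examples)
x <- c(1, 3, 5, 10)
y <- c(2, 4, 6, 20)

# Deviations of each observation from its vector's mean
dx <- x - mean(x)
dy <- y - mean(y)

# Sample covariance: sum of products of deviations divided by N - 1
manual_cov <- sum(dx * dy) / (length(x) - 1)

print(manual_cov)

# Compare with the built-in function (TRUE if they agree)
print(all.equal(manual_cov, cov(x, y)))
```

Dividing by N instead of N - 1 would give the population covariance, which is smaller in magnitude for the same data.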

Covariance Syntax in R

Syntax: cov(x, y, method)

where, 

  • x and y represent the data vectors
  • method defines the type of method used to compute the covariance: “pearson”, “kendall” or “spearman”. Default is “pearson”.

Example: 

R

# Data vectors
x <- c(1, 3, 5, 10)
 
y <- c(2, 4, 6, 20)
 
# Print covariance using different methods
print(cov(x, y))
print(cov(x, y, method = "pearson"))
print(cov(x, y, method = "kendall"))
print(cov(x, y, method = "spearman"))

                    

Output: 

[1] 30.66667
[1] 30.66667
[1] 12
[1] 1.666667
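The "spearman" and "kendall" results differ from the Pearson value because they work on ranks and orderings rather than on the raw values. The sketch below reproduces them by hand, assuming that the Spearman covariance is the Pearson covariance of the ranks and that the Kendall covariance is the sum of sign(x_i - x_j) * sign(y_i - y_j) over all ordered pairs. The all.equal() calls check these assumptions for this data set only; spearman_cov and kendall_cov are illustrative names.

```r
# Data vectors from the example above
x <- c(1, 3, 5, 10)
y <- c(2, 4, 6, 20)

# Spearman: ordinary (Pearson) covariance computed on the ranks
spearman_cov <- cov(rank(x), rank(y))

# Kendall: sum of sign products over all ordered pairs (i, j)
kendall_cov <- sum(sign(outer(x, x, "-")) * sign(outer(y, y, "-")))

# Compare with the built-in methods (TRUE if they agree)
print(all.equal(spearman_cov, cov(x, y, method = "spearman")))
print(all.equal(kendall_cov, cov(x, y, method = "kendall")))
```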

Correlation in R Programming Language

The cor() function in R programming computes the correlation coefficient. Correlation standardises the covariance to measure how strongly two vectors are linearly related, and its value always lies between -1 and 1. Mathematically,
\operatorname{Corr}(x, y)=\frac{\sum\left(x_{i}-\bar{x}\right)\left(y_{i}-\bar{y}\right)}{\sqrt{\sum\left(x_{i}-\bar{x}\right)^{2} \sum\left(y_{i}-\bar{y}\right)^{2}}}

where, 

x represents the x data vector 
y represents the y data vector 
\bar{x} represents the mean of the x data vector 
\bar{y} represents the mean of the y data vector
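As a rough check of the Pearson formula, the sketch below evaluates it directly and also rescales the covariance by the product of the two standard deviations. Both results should agree with cor(). The names manual_cor and scaled_cov are illustrative only.

```r
# Data vectors (same values as used in this article's examples)
x <- c(1, 3, 5, 10)
y <- c(2, 4, 6, 20)

# Deviations from the means
dx <- x - mean(x)
dy <- y - mean(y)

# Pearson correlation written out from the formula
manual_cor <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Equivalent form: covariance divided by the product of standard deviations
scaled_cov <- cov(x, y) / (sd(x) * sd(y))

# Compare with the built-in function (TRUE if they agree)
print(all.equal(manual_cor, cor(x, y)))
print(all.equal(scaled_cov, cor(x, y)))
```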

Correlation Syntax in R

Syntax: cor(x, y, method)

where, 

  • x and y represent the data vectors
  • method defines the type of method used to compute the correlation: “pearson”, “kendall” or “spearman”. Default is “pearson”.

Example: 

R

# Data vectors
x <- c(1, 3, 5, 10)
 
y <- c(2, 4, 6, 20)
 
# Print correlation using different methods
print(cor(x, y))
 
print(cor(x, y, method = "pearson"))
print(cor(x, y, method = "kendall"))
print(cor(x, y, method = "spearman"))

                    

Output: 

[1] 0.9724702
[1] 0.9724702
[1] 1
[1] 1

Covariance and Correlation for a Data Frame

We can calculate the covariance and correlation for all numeric columns in a data frame. The result is a matrix with one row and one column per variable.

R

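# Load the built-in iris data set and dplyr for select()
data(iris)
library(dplyr)

# Remove the non-numeric Species column
data <- select(iris, -Species)

# Calculate the correlation matrix
cor(data)

# Calculate the covariance matrix
cov(data)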

                    

Output:

> cor(data)
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000

> cov(data)
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    0.6856935  -0.0424340    1.2743154   0.5162707
Sepal.Width    -0.0424340   0.1899794   -0.3296564  -0.1216394
Petal.Length    1.2743154  -0.3296564    3.1162779   1.2956094
Petal.Width     0.5162707  -0.1216394    1.2956094   0.5810063
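The same matrices can be obtained without dplyr. The sketch below is one possible base R alternative that keeps only the numeric columns, and it checks two properties of the results: the diagonal of the covariance matrix holds each column's variance, and the diagonal of the correlation matrix is all ones. The names num_cols, cov_mat and cor_mat are illustrative.

```r
# Load the built-in iris data set
data(iris)

# Keep only the numeric columns using base R
num_cols <- iris[sapply(iris, is.numeric)]

# Covariance and correlation matrices of all numeric columns
cov_mat <- cov(num_cols)
cor_mat <- cor(num_cols)

# Diagonal of the covariance matrix equals the per-column variances
print(all.equal(diag(cov_mat), sapply(num_cols, var)))

# Every variable is perfectly correlated with itself
print(all(abs(diag(cor_mat) - 1) < 1e-12))

# Both matrices are symmetric
print(isSymmetric(cov_mat) && isSymmetric(cor_mat))
```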

Conversion of Covariance to Correlation in R

cov2cor() function in R programming converts a covariance matrix into a corresponding correlation matrix.

Syntax: cov2cor(X)

where, 

  • X represents a square covariance matrix

Example: 

R

# Data vectors (random draws, so the values differ between runs)
x <- rnorm(2)
y <- rnorm(2)
 
# Binding into a two-column data matrix
mat <- cbind(x, y)
 
# Defining X as the covariance matrix
X <- cov(mat)
 
# Print covariance matrix
print(X)
 
# Print correlation matrix of data
# vector
print(cor(mat))
 
# Using function cov2cor()
# To convert covariance matrix to
# correlation matrix
print(cov2cor(X))

                    

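Output (the numbers differ between runs because rnorm() draws random values; with only two observations per vector, the correlation is always exactly 1 or -1): 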

           x          y
x  0.0742700 -0.1268199
y -0.1268199  0.2165516

   x  y
x  1 -1
y -1  1

   x  y
x  1 -1
y -1  1
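The conversion amounts to dividing each covariance by the product of the two corresponding standard deviations, which are the square roots of the diagonal entries. The sketch below shows this on a larger simulated sample so that the correlations are not forced to 1 or -1. The seed, the sample size of 50, the variable z and the names V, s and manual_cor are arbitrary choices made for illustration.

```r
# Simulated data; the seed only makes the run reproducible
set.seed(42)
x <- rnorm(50)
y <- rnorm(50)
z <- x + rnorm(50)   # constructed to be correlated with x

# Covariance matrix of the three variables
mat <- cbind(x, y, z)
V <- cov(mat)

# Standard deviations are the square roots of the variances on the diagonal
s <- sqrt(diag(V))

# Divide each entry V[i, j] by s[i] * s[j]
manual_cor <- V / outer(s, s)

# Compare with cov2cor() and with cor() on the raw data (TRUE if they agree)
print(all.equal(manual_cor, cov2cor(V)))
print(all.equal(manual_cor, cor(mat)))
```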

Difference between Covariance and Correlation

The main differences between them are summarised below:

Covariance: Quantifies how two variables vary together. Its sign shows whether changes in one variable tend to go with changes in the same or the opposite direction in the other.
Correlation: Standardises the covariance by dividing it by the product of the standard deviations of the two variables.

Covariance: Is not scaled, so its value depends on the units in which the variables are measured. Comparing covariances across different datasets or variables is therefore difficult.
Correlation: Is a standardised measure that ranges from -1 to 1. It does not depend on the magnitude of the variables, which allows meaningful comparisons between datasets or variables.

Covariance: Its magnitude is hard to interpret on its own, because it reflects the scales of the variables as well as the strength of the relationship.
Correlation: Because of the standardisation, its value can be read directly as the strength and direction of the linear association.

Covariance: Helps in understanding the joint variability of two variables and their potential link. It is frequently used in statistical models, risk analysis, and portfolio analysis.
Correlation: Is frequently used to assess how strongly two variables are linearly related. It often appears in data analysis, regression models, prediction models, and multicollinearity assessments.

Correlation describes the intensity and direction of the linear link between two variables, whereas covariance shows how much two variables vary together.
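The effect of scale can be illustrated with a small sketch. Multiplying one vector by 100 (for example, converting metres to centimetres) multiplies the covariance by 100 but leaves the correlation unchanged. The factor 100 and the name x_scaled are illustrative choices.

```r
# Data vectors (same values as used in this article's examples)
x <- c(1, 3, 5, 10)
y <- c(2, 4, 6, 20)

# Same measurements expressed in a unit 100 times smaller
x_scaled <- x * 100

# Covariance changes with the unit of measurement
print(cov(x, y))
print(cov(x_scaled, y))
print(all.equal(cov(x_scaled, y), 100 * cov(x, y)))

# Correlation is unaffected by the change of unit
print(all.equal(cor(x_scaled, y), cor(x, y)))
```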
 


