# Covariance and Correlation in R Programming

Covariance and correlation are statistical measures of the relationship between two random variables. Both quantify linear dependence in bivariate data, but each captures a different aspect of the relationship: covariance indicates the direction in which two variables vary together, while correlation also indicates the strength of that relationship.

In this article, we discuss the cov(), cor() and cov2cor() functions in R, which implement the covariance and correlation methods of statistics and probability theory.

## Covariance in R Programming Language

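In R, covariance is computed with the cov() function. Covariance is a statistical measure of the direction of the linear relationship between two data vectors: a positive value means the vectors tend to increase together, and a negative value means one tends to decrease as the other increases. Mathematically, the sample covariance that cov() computes is

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$

where,

• $x$ represents the x data vector
• $y$ represents the y data vector
• $\bar{x}$ represents the mean of the x data vector
• $\bar{y}$ represents the mean of the y data vector
• $N$ represents the total number of observations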
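As a quick check of the definition, the sketch below computes the covariance by hand and compares it with cov(). It assumes the sample form of the formula (denominator N - 1), which is what cov() uses; the variable name manual_cov is only illustrative.

```r
# Two small data vectors
x <- c(1, 3, 5, 10)
y <- c(2, 4, 6, 20)

n <- length(x)

# Sum of products of deviations from the means, divided by N - 1
manual_cov <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

print(manual_cov)                         # 30.66667
print(all.equal(manual_cov, cov(x, y)))   # TRUE
```

Dividing by N instead of N - 1 would give the population covariance, which is smaller by a factor of (N - 1) / N.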

### Covariance Syntax in R

Syntax: cov(x, y, method)

where,

• x and y represent the data vectors
• method defines the method used to compute the covariance: “pearson”, “kendall” or “spearman”. Default is “pearson”.

Example:

```r
# Data vectors
x <- c(1, 3, 5, 10)
y <- c(2, 4, 6, 20)

# Print covariance using different methods
print(cov(x, y))
print(cov(x, y, method = "pearson"))
print(cov(x, y, method = "kendall"))
print(cov(x, y, method = "spearman"))
```

Output:

```
[1] 30.66667
[1] 30.66667
[1] 12
[1] 1.666667
```
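The Kendall and Spearman values can look surprising next to the Pearson one. The sketch below reproduces them by hand under two assumptions: that the "spearman" method is the Pearson covariance of the ranks, and that the "kendall" method sums the products of the signs of all pairwise differences without normalising. The names spearman_manual and kendall_manual are illustrative.

```r
x <- c(1, 3, 5, 10)
y <- c(2, 4, 6, 20)

# Spearman: ordinary (Pearson) covariance applied to the ranks
spearman_manual <- cov(rank(x), rank(y))

# Kendall: for every ordered pair (i, j), multiply the signs of
# x_i - x_j and y_i - y_j, then add everything up
kendall_manual <- sum(sign(outer(x, x, "-")) * sign(outer(y, y, "-")))

print(spearman_manual)   # 1.666667
print(kendall_manual)    # 12
```

Because both vectors are strictly increasing, all 12 ordered pairs are concordant, which is why the Kendall value equals 4 * 3 = 12.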

## Correlation in R Programming Language

The cor() function in R computes the correlation coefficient. Correlation builds on covariance: it rescales the covariance by the spread of each vector to measure how strongly the vectors are linearly related. Mathematically, the Pearson correlation coefficient is

$$r_{xy} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \, \sum_{i=1}^{N} (y_i - \bar{y})^2}}$$

where,

• $x$ represents the x data vector
• $y$ represents the y data vector
• $\bar{x}$ represents the mean of the x data vector
• $\bar{y}$ represents the mean of the y data vector
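The link between the two quantities can be sketched directly: the Pearson correlation is the covariance divided by the product of the two standard deviations. The variable name manual_cor below is illustrative.

```r
x <- c(1, 3, 5, 10)
y <- c(2, 4, 6, 20)

# Covariance rescaled by the standard deviations of x and y
manual_cor <- cov(x, y) / (sd(x) * sd(y))

print(manual_cor)
print(all.equal(manual_cor, cor(x, y)))   # TRUE
```

Since cov() and sd() both use the N - 1 denominator, those factors cancel and the result does not depend on which denominator is chosen.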

### Correlation Syntax in R

Syntax: cor(x, y, method)

where,

• x and y represent the data vectors
• method defines the method used to compute the correlation: “pearson”, “kendall” or “spearman”. Default is “pearson”.

Example:

```r
# Data vectors
x <- c(1, 3, 5, 10)
y <- c(2, 4, 6, 20)

# Print correlation using different methods
print(cor(x, y))
print(cor(x, y, method = "pearson"))
print(cor(x, y, method = "kendall"))
print(cor(x, y, method = "spearman"))
```

Output:

```
[1] 0.9724702
[1] 0.9724702
[1] 1
[1] 1
```
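The Kendall and Spearman results are exactly 1 because both methods work on the ordering of the values rather than on the values themselves, and here y always increases with x. A minimal sketch of the same effect, using a hypothetical vector y_curved that increases with x but not linearly:

```r
x <- c(1, 3, 5, 10)

# Hypothetical vector: strictly increasing in x, but far from linear
y_curved <- exp(x)

print(cor(x, y_curved, method = "pearson"))    # below 1: not a linear relationship
print(cor(x, y_curved, method = "spearman"))   # 1: the ranks agree perfectly
print(cor(x, y_curved, method = "kendall"))    # 1: every pair is concordant
```

This is one reason to prefer a rank-based method when a relationship is expected to be monotonic but not necessarily linear.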

## Covariance and Correlation for a Data Frame

We can calculate the covariance and correlation between all numeric columns of a data frame. Both functions then return a matrix with one row and one column per variable.

```r
# Load the built-in iris data set and the dplyr package
data(iris)
library(dplyr)

# Remove the non-numeric Species column
data <- select(iris, -Species)

# Calculate correlation
cor(data)

# Calculate covariance
cov(data)
```

Output:

```
> cor(data)
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000

> cov(data)
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    0.6856935  -0.0424340    1.2743154   0.5162707
Sepal.Width    -0.0424340   0.1899794   -0.3296564  -0.1216394
Petal.Length    1.2743154  -0.3296564    3.1162779   1.2956094
Petal.Width     0.5162707  -0.1216394    1.2956094   0.5810063
```
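If dplyr is not installed, the same matrices can be obtained with base R alone. The sketch below is one possible approach, selecting the numeric columns with sapply() and is.numeric(); the names numeric_cols, cor_mat and cov_mat are illustrative.

```r
data(iris)

# Keep only the numeric columns (this drops the Species factor)
numeric_cols <- iris[, sapply(iris, is.numeric)]

cor_mat <- cor(numeric_cols)
cov_mat <- cov(numeric_cols)

# Rounded for easier reading
print(round(cor_mat, 3))
print(round(cov_mat, 3))
```

Selecting by type rather than by name also works unchanged on other data frames that mix numeric and non-numeric columns.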

## Conversion of Covariance to Correlation in R

The cov2cor() function in R converts a covariance matrix into the corresponding correlation matrix.

Syntax: cov2cor(X)

where,

• X represents a square covariance matrix

Example:

```r
# Data vectors (random draws, so the values differ on every run)
x <- rnorm(2)
y <- rnorm(2)

# Bind into a 2 x 2 matrix
mat <- cbind(x, y)

# Define X as the covariance matrix
X <- cov(mat)

# Print covariance matrix
print(X)

# Print correlation matrix of the data vectors
# (with only two observations per vector it is always 1 or -1)
print(cor(mat))

# Use cov2cor() to convert the covariance matrix
# to a correlation matrix
print(cov2cor(X))
```

Output:

```
           x          y
x  0.0742700 -0.1268199
y -0.1268199  0.2165516

   x  y
x  1 -1
y -1  1

   x  y
x  1 -1
y -1  1
```
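The conversion itself is simple: each covariance is divided by the product of the two corresponding standard deviations, which are the square roots of the diagonal entries. The sketch below does this by hand on a larger sample; the seed value, the sample size of 50 and the names V, d and manual are arbitrary choices for illustration.

```r
set.seed(42)   # arbitrary seed, only to make the run reproducible

# Two vectors of 50 random draws, bound into a 50 x 2 matrix
mat <- cbind(x = rnorm(50), y = rnorm(50))

V <- cov(mat)        # covariance matrix
d <- sqrt(diag(V))   # standard deviations of x and y

# Divide entry (i, j) by sd_i * sd_j
manual <- V / outer(d, d)

print(all.equal(manual, cov2cor(V)))   # TRUE
print(all.equal(manual, cor(mat)))     # TRUE
```

With more than two observations per vector, the off-diagonal correlation is no longer forced to be 1 or -1.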

## Difference between Covariance and Correlation

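The main differences between them are as follows:

• Covariance shows how much two variables vary together. Its sign gives the direction of the linear relationship, but its magnitude depends on the units of the variables, so it is hard to interpret on its own.
• Correlation describes both the strength and the direction of the linear relationship. It is the covariance divided by the product of the two standard deviations, so it is unitless and always lies between -1 and 1.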
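The contrast can be seen by changing the units of one variable. In the sketch below, x_scaled is a hypothetical rescaled copy of x (for example, metres converted to centimetres).

```r
x <- c(1, 3, 5, 10)
y <- c(2, 4, 6, 20)

# Hypothetical change of units: multiply x by 100
x_scaled <- x * 100

print(cov(x, y))          # covariance in the original units
print(cov(x_scaled, y))   # 100 times larger
print(cor(x, y))          # correlation in the original units
print(cor(x_scaled, y))   # unchanged
```

The covariance scales with the units, while the correlation stays the same, which is why correlation is usually the better choice for comparing relationships across different variables.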
