Correlation Matrix in R Programming

Last Updated: 12 Jan, 2022

Correlation describes the relationship between two variables. The Pearson correlation coefficient, the most common measure, quantifies the degree of linear association between two random variables and always lies in the interval [-1, 1]. A value of -1 indicates a perfect negative linear relationship, 1 indicates a perfect positive linear relationship, and 0 indicates no linear relationship at all. However, a value of 0 does not mean the variables are independent: they may still be related in a non-linear way. A correlation matrix collects the correlation coefficients for every pair of variables in a data set, so entry (i, j) holds the correlation between the i-th and j-th variables.
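The extremes of the range, and the caveat about a value of 0, can be checked with a few toy vectors. This is a minimal sketch; the vectors x, y1, y2 and y3 are made up purely for illustration:

```r
# hypothetical toy vectors chosen only to illustrate the extremes of r
x <- -5:5

# y1 is an exact decreasing linear function of x, so r = -1
y1 <- 10 - 2 * x
cor(x, y1)

# y2 is an exact increasing linear function of x, so r = 1
y2 <- 3 + 0.5 * x
cor(x, y2)

# y3 is completely determined by x, but not linearly; because x is
# symmetric around 0, the linear correlation comes out as 0
y3 <- x^2
cor(x, y3)
```

The last case is the important one: y3 depends entirely on x, yet the coefficient is 0 because the relationship is not linear.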

Properties of Correlation Matrices

  • All diagonal elements of a correlation matrix are 1, because a variable is always perfectly correlated with itself: c_ii = 1.
  • It is symmetric: c_ij = c_ji, because the correlation between variables i and j is the same as the correlation between j and i.
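Both properties, along with the [-1, 1] bound, can be verified directly on a computed matrix. A minimal sketch, assuming the built-in mtcars data set as input:

```r
# mtcars is a data set that ships with base R
M <- cor(mtcars)

# property 1: every diagonal element is 1 (up to floating-point rounding)
all.equal(unname(diag(M)), rep(1, ncol(M)))

# property 2: the matrix equals its own transpose
isSymmetric(M)

# every coefficient lies within [-1, 1]
range(M)
```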

Computing Correlation Matrix in R

In R, a correlation matrix is computed with the cor() function, which has the following syntax:

 Syntax: cor(x, y = NULL, use = "everything", method = c("pearson", "kendall", "spearman"))

Parameters:

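x: a numeric vector, matrix, or data frame.
y: an optional second vector, matrix, or data frame; when it is NULL (the default), correlations are computed among the columns of x.
use: specifies how missing values (NA) are handled. Accepted values include:

  • everything (the default): missing values propagate, so any coefficient computed from a pair containing an NA is itself NA.
  • all.obs: assumes there are no missing values and throws an error if any are present.
  • complete.obs: listwise deletion; every row containing an NA is dropped before any coefficient is computed.
  • pairwise.complete.obs: pairwise deletion; each coefficient uses all rows in which both variables of that pair are present.

method: the type of correlation coefficient to compute: "pearson" (the default), "spearman", or "kendall". Spearman and Kendall are rank-based and measure monotonic rather than strictly linear association.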
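To see how use and method change the result, here is a minimal sketch on a small data frame with one missing value in column c. The data are invented for illustration only:

```r
# made-up data frame for illustration; column c has one missing value
df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = c(2, 4, 5, 4, 5),
                 c = c(5, NA, 3, 2, 1))

# default use = "everything": every pair involving c comes out as NA
cor(df)

# listwise deletion: row 2 is dropped for every pair, including a-b
cor(df, use = "complete.obs")

# pairwise deletion: a-b uses all five rows, pairs with c use four
cor(df, use = "pairwise.complete.obs")

# rank-based alternatives to the default Pearson coefficient
cor(df, use = "complete.obs", method = "spearman")
cor(df, use = "complete.obs", method = "kendall")

# all.obs refuses to run because df contains an NA
try(cor(df, use = "all.obs"))
```

Comparing the a-b entry across the complete.obs and pairwise.complete.obs results shows the practical difference: listwise deletion discards row 2 even for pairs that have no missing value.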

Correlation in R Programming Language

Once the data has been loaded, the correlation matrix is computed with a single call to cor(). The following code snippet demonstrates this:

R




# reading the Ford Escort dataset from the specified URL
# into a data frame
data <- read.csv("https://people.sc.fsu.edu/~jburkardt/data/csv/ford_escort.csv",
                 header = TRUE, fileEncoding = "latin1")
 
# printing the first six rows of the data
print("Original Data")
print(head(data))
 
# computing the correlation matrix (all three columns are numeric)
cor_data <- cor(data)
 
print("Correlation matrix")
print(cor_data)

 Output:

[1] "Original Data"
  Year Mileage..thousands. Price
1 1998                  27  9991
2 1997                  17  9925
3 1998                  28 10491
4 1998                   5 10990
5 1997                  38  9493
6 1997                  36  9991

[1] "Correlation matrix"
                         Year Mileage..thousands.      Price
Year                 1.0000000          -0.7480982  0.9343679
Mileage..thousands. -0.7480982           1.0000000 -0.8113807
Price                0.9343679          -0.8113807  1.0000000
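This works because every column of the Ford Escort data is numeric. cor() accepts only numeric (or logical) input, so calling it on a data frame that contains a factor or character column stops with an error. A minimal sketch of one way around this, using the built-in iris data set (whose Species column is a factor), is to keep only the numeric columns first:

```r
# iris ships with base R; its fifth column, Species, is a factor,
# so cor(iris) would stop with an error
num_cols <- sapply(iris, is.numeric)

# keep only the four numeric measurement columns
iris_cor <- cor(iris[, num_cols])
round(iris_cor, 2)
```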

Computing Correlation Coefficients and Their p-values

The rcorr() function from the Hmisc package computes the matrix of correlation coefficients together with a matrix of p-values for all possible pairs of columns in a numeric matrix. It supports Pearson and Spearman correlations, so it can be used to check whether each correlation is statistically significant.

Syntax: rcorr(x, type = c("pearson", "spearman"))

Because rcorr() is not part of base R, the Hmisc package must first be installed and then loaded into the session:

install.packages("Hmisc")

library("Hmisc")

The following code snippet computes the correlation coefficients and their p-values for the Ford Escort data:

R




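# reading the Ford Escort dataset into a data frame
data <- read.csv("https://people.sc.fsu.edu/~jburkardt/data/csv/ford_escort.csv",
                 header = TRUE, fileEncoding = "latin1")
 
# printing the head of the data
print("Original Data")
print(head(data))
 
# installing Hmisc only if it is missing, then loading it
if (!requireNamespace("Hmisc", quietly = TRUE)) install.packages("Hmisc")
library("Hmisc")
 
# rcorr() expects a matrix; it returns the coefficients (r),
# the number of observations per pair (n) and the p-values (P)
result <- rcorr(as.matrix(data))
print(result)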

 Output:

[1] "Original Data"
  Year Mileage..thousands. Price
1 1998                  27  9991
2 1997                  17  9925
3 1998                  28 10491
4 1998                   5 10990
5 1997                  38  9493
6 1997                  36  9991

                     Year Mileage..thousands. Price
Year                 1.00               -0.75  0.93
Mileage..thousands. -0.75                1.00 -0.81
Price                0.93               -0.81  1.00

n= 23 


P
                    Year Mileage..thousands. Price
Year                      0                   0   
Mileage..thousands.  0                        0   
Price                0    0                       
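rcorr() returns a list whose r, n and P components hold the coefficient matrix, the number of observations used for each pair, and the p-values, so each piece can be extracted with $r, $n or $P. A printed 0 in the P matrix means the p-value is below the displayed precision, not exactly zero. If installing Hmisc is not an option, a comparable p-value matrix can be sketched in base R by calling cor.test() on every pair of columns. The helper cor_pvalues() below is hypothetical, written only for this illustration, and its output is not guaranteed to match rcorr() in every case:

```r
# cor_pvalues() is a hypothetical helper, not a function from base R or Hmisc.
# It runs cor.test() on every pair of columns and stores the p-values.
cor_pvalues <- function(df, method = "pearson") {
  m <- as.matrix(df)
  k <- ncol(m)
  p <- matrix(NA_real_, nrow = k, ncol = k,
              dimnames = list(colnames(m), colnames(m)))
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      # the test is symmetric in its two inputs, so fill both triangles
      p[i, j] <- p[j, i] <- cor.test(m[, i], m[, j], method = method)$p.value
    }
  }
  p  # diagonal left as NA: a variable is not tested against itself
}

# usage on four columns of the built-in mtcars data set
p_mat <- cor_pvalues(mtcars[, c("mpg", "wt", "hp", "qsec")])
signif(p_mat, 3)
```

Leaving the diagonal empty mirrors the blank diagonal in the P matrix printed above.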

Visualize a Correlation Matrix

A correlation matrix can be visualized as a correlogram using the "corrplot" package. To install the package, run the following command from the R console:

install.packages("corrplot")

Once the package is installed, load it in the R script with the library() function:  

library("corrplot")

The corrplot() function draws the correlogram; its method argument sets the glyph drawn in each cell, such as "circle", "square", "ellipse", "number", "shade", "color", or "pie".

R




# Correlogram in R
# required packages
library(corrplot)
 
# first rows of the built-in mtcars data set
head(mtcars)

# correlation matrix, shown rounded to two decimals
M <- cor(mtcars)
head(round(M, 2))
 
# visualizing correlogram
# as circle
corrplot(M, method="circle")
 
# as pie
corrplot(M, method="pie")
 
# as colour
corrplot(M, method="color")
 
# as number
corrplot(M, method="number")
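corrplot is a contributed package. If it is not available, base R's heatmap() function gives a rougher view of the same matrix; this is a swapped-in technique, not part of corrplot. With symm = TRUE the matrix is treated as symmetric, no row scaling is applied, and rows and columns are reordered together by hierarchical clustering, so strongly correlated variables tend to sit next to each other. A minimal sketch, assuming a diverging hcl.colors() palette is an acceptable colour choice:

```r
# base R only: heatmap() lives in the stats package, attached by default
M <- cor(mtcars)

# symm = TRUE: treat M as symmetric, skip scaling, and use one dendrogram
# for both rows and columns; zlim fixes the colour scale to [-1, 1]
hm <- heatmap(M, symm = TRUE, zlim = c(-1, 1),
              col = hcl.colors(21, "Blue-Red"),
              main = "mtcars correlation heatmap")

# the invisibly returned list records the order chosen by the clustering
colnames(M)[hm$rowInd]
```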

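Output: four correlograms of the mtcars correlation matrix, one for each method (circle, pie, color, number).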

