How to Calculate Correlation Between Multiple Variables in R?

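In this article, we will discuss how to calculate the correlation between multiple variables in the R programming language. The correlation coefficient measures the strength and direction of the relationship between two variables and ranges from -1 to 1; when there are more than two variables, it is computed for each pair.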

Let’s create an initial dataframe:






# create the dataframe with 4 columns
data <- data.frame(col1 = 1:10, col2 = 11:20,
                   col3 = 21:30, col4 = 1:10)

# display
data

Output:

   col1 col2 col3 col4
1     1   11   21    1
2     2   12   22    2
3     3   13   23    3
4     4   14   24    4
5     5   15   25    5
6     6   16   26    6
7     7   17   27    7
8     8   18   28    8
9     9   19   29    9
10   10   20   30   10

Method 1: Correlation Between Two Variables

To calculate the correlation between two variables, call the cor() function, which is available in a default R session, and pass the two columns as arguments. It returns a single number: the correlation coefficient between the two variables (the Pearson coefficient by default).



Syntax:

cor(dataframe$column1, dataframe$column2)

where,

  • dataframe is the input dataframe
  • column1 and column2 are the two columns to be correlated

Example:

In this example, we create a dataframe with 4 columns and 10 rows and use the cor() function to find the correlation between col1 and col2, col1 and col3, col1 and col4, and col3 and col4.




# create the dataframe with 4 columns
data <- data.frame(col1 = 1:10, col2 = 11:20,
                   col3 = 21:30, col4 = 1:10)

# correlation between col1 and col2
print(cor(data$col1, data$col2))

# correlation between col1 and col3
print(cor(data$col1, data$col3))

# correlation between col1 and col4
print(cor(data$col1, data$col4))

# correlation between col3 and col4
print(cor(data$col3, data$col4))

Output:

[1] 1
[1] 1
[1] 1
[1] 1
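
Every result is 1 because each column is an exact increasing linear function of the others (for example, col2 is col1 plus 10). As a minimal sketch of what cor() computes by default, the code below reproduces the Pearson coefficient by hand and compares it with the built-in result. The helper name pearson_manual is hypothetical and is used only for illustration.

```r
# Hypothetical helper (not part of base R): Pearson correlation by hand
pearson_manual <- function(x, y) {
  dx <- x - mean(x)   # deviations of x from its mean
  dy <- y - mean(y)   # deviations of y from its mean
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

data <- data.frame(col1 = 1:10, col2 = 11:20,
                   col3 = 21:30, col4 = 1:10)

manual  <- pearson_manual(data$col1, data$col2)
builtin <- cor(data$col1, data$col2)

# The two values should agree up to floating-point error
print(all.equal(manual, builtin))

# cor() also accepts method = "spearman" or "kendall" for rank-based correlation
spearman <- cor(data$col1, data$col2, method = "spearman")
print(spearman)
```

The method argument is worth knowing about when the relationship between two variables is monotonic but not linear, since the default Pearson coefficient only captures linear association.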

Method 2: Correlation Between Multiple Variables

In this method, pass the cor() function a subset of the data frame, selected with a character vector of column names. The result is a correlation matrix that holds the coefficient for every pair of the selected columns.

Syntax:

cor(dataframe[, c('column1', 'column2', ..., 'columnN')])

Example:

In this example, we redefine col4 so that it is no longer perfectly correlated with the other columns, and then use the cor() function to find the correlation matrix for three subsets of columns: col1, col3 and col2; col1, col4 and col2; and col2, col3 and col4.




# create the dataframe with 4 columns
data <- data.frame(col1 = 1:10, col2 = 11:20,
                   col3 = 21:30,
                   col4 = c(1:5, 34, 56, 32, 23, 45))

# correlation between col1, col3 and col2
print(cor(data[, c('col1', 'col3', 'col2')]))

# correlation between col1, col4 and col2
print(cor(data[, c('col1', 'col4', 'col2')]))

# correlation between col2, col3 and col4
print(cor(data[, c('col2', 'col3', 'col4')]))

Output:

     col1 col3 col2
col1    1    1    1
col3    1    1    1
col2    1    1    1

         col1     col4     col2
col1 1.000000 0.787662 1.000000
col4 0.787662 1.000000 0.787662
col2 1.000000 0.787662 1.000000

         col2     col3     col4
col2 1.000000 1.000000 0.787662
col3 1.000000 1.000000 0.787662
col4 0.787662 0.787662 1.000000
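
The value returned by cor() here is an ordinary numeric matrix with row and column names, so a single coefficient can be looked up by name. The following is a small sketch of that idea using the same data; the variable names cor_subset and r_col1_col4 are chosen only for this illustration.

```r
data <- data.frame(col1 = 1:10, col2 = 11:20,
                   col3 = 21:30,
                   col4 = c(1:5, 34, 56, 32, 23, 45))

# Correlation matrix for a chosen subset of columns
cor_subset <- cor(data[, c("col1", "col4", "col2")])

# Look up one coefficient by row and column name
r_col1_col4 <- cor_subset["col1", "col4"]
print(r_col1_col4)

# round() shortens the display without changing the stored matrix
print(round(cor_subset, 3))
```

Because the matrix is symmetric, cor_subset["col4", "col1"] holds the same value as cor_subset["col1", "col4"].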

Method 3: Correlation between all variables

To compute the correlation between all the variables in a data frame, pass the entire data frame to the cor() function. All columns must be numeric. The result is a matrix with the correlation coefficient for every pair of columns.

Syntax:

cor(dataframe)

Example:

In this example, we are going to find the correlation between all the columns of the given data frame in the R programming language.




# create the dataframe with 4 columns
data <- data.frame(col1 = 1:10, col2 = 11:20,
                   col3 = 21:30,
                   col4 = c(1:5, 34, 56, 32, 23, 45))

# correlation in entire dataframe
print(cor(data))

Output:

         col1     col2     col3     col4
col1 1.000000 1.000000 1.000000 0.787662
col2 1.000000 1.000000 1.000000 0.787662
col3 1.000000 1.000000 1.000000 0.787662
col4 0.787662 0.787662 0.787662 1.000000
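
Real data frames often contain non-numeric columns or missing values, and cor() stops with an error if a non-numeric column is passed to it. The sketch below shows one way to handle both cases. The data here is an assumption made up for illustration: a character column named label and a single NA in col4.

```r
# Assumed example data: one non-numeric column and one missing value
data <- data.frame(col1  = 1:10,
                   col4  = c(1:5, 34, 56, 32, 23, NA),
                   label = letters[1:10])

# Keep only the numeric columns before calling cor()
numeric_data <- data[, sapply(data, is.numeric), drop = FALSE]

# Default behaviour: any pair involving a column with an NA yields NA
cor_default <- cor(numeric_data)
print(cor_default)

# use = "complete.obs" drops rows that contain an NA before computing
cor_complete <- cor(numeric_data, use = "complete.obs")
print(cor_complete)
```

Another option is use = "pairwise.complete.obs", which drops missing values separately for each pair of columns rather than removing whole rows up front.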
