
How to Calculate Correlation Between Multiple Variables in R?

Last Updated : 19 Dec, 2021

In this article, we will discuss how to calculate the correlation between multiple variables in the R programming language. A correlation coefficient measures the strength and direction of the linear relationship between two variables and ranges from -1 to 1:

  • The result is 0 if there is no linear correlation between the two variables
  • The result is 1 if there is a perfect positive correlation between the two variables
  • The result is -1 if there is a perfect negative correlation between the two variables
  • Values in between indicate a weaker positive or negative relationship
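
As a small illustration of these cases, the sketch below uses made-up vectors (the names `x`, `up`, `down`, and `sym` are hypothetical and are not part of the article's dataframe). It assumes the default Pearson method of cor().

```r
# Illustrative vectors; names and values are made up for this sketch
x    <- 1:10
up   <- 2 * x + 3       # increases exactly linearly with x
down <- -2 * x + 3      # decreases exactly linearly with x
sym  <- (x - 5.5)^2     # symmetric around the mean of x, so no linear trend

r_up   <- cor(x, up)    # perfect positive linear relationship
r_down <- cor(x, down)  # perfect negative linear relationship
r_sym  <- cor(x, sym)   # no linear relationship, although sym depends on x

print(c(r_up, r_down, r_sym))
```

The last case shows that a correlation of 0 only rules out a linear relationship; the two variables can still be related in another way.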

Let’s create an initial dataframe:

R




# create the dataframe with 4 columns
data <- data.frame(col1 = 1:10, col2 = 11:20,
                   col3 = 21:30, col4 = 1:10)

# display
data


Output:

   col1 col2 col3 col4
1     1   11   21    1
2     2   12   22    2
3     3   13   23    3
4     4   14   24    4
5     5   15   25    5
6     6   16   26    6
7     7   17   27    7
8     8   18   28    8
9     9   19   29    9
10   10   20   30   10

Method 1: Correlation Between Two Variables

To calculate the correlation between two variables, call the cor() function from base R and pass it the two columns to compare. It returns a single number, the correlation coefficient between the two columns (Pearson's coefficient by default).

Syntax:

cor(dataframe$column1, dataframe$column2)

where,

  • dataframe is the input dataframe
  • column1 and column2 are the two columns to correlate

Example:

In this example, we create a dataframe with 4 columns and 10 rows and use the cor() function to find the correlation between col1 and col2, col1 and col3, col1 and col4, and col3 and col4.

R




# create the dataframe with 4 columns
data <- data.frame(col1 = 1:10, col2 = 11:20,
                   col3 = 21:30, col4 = 1:10)

# correlation between col1 and col2
print(cor(data$col1, data$col2))

# correlation between col1 and col3
print(cor(data$col1, data$col3))

# correlation between col1 and col4
print(cor(data$col1, data$col4))

# correlation between col3 and col4
print(cor(data$col3, data$col4))


Output:

[1] 1
[1] 1
[1] 1
[1] 1
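
All four results are 1 because col2 and col3 are col1 shifted by a constant and col4 is identical to col1, so every pair lies on a straight line. As a rough sanity check of what cor() computes, the sketch below compares it with the textbook Pearson formula, covariance divided by the product of the standard deviations. It assumes the default method = "pearson"; the vector names `x` and `y` are hypothetical, and `y` is deliberately not a straight line.

```r
# Hypothetical vectors for this sketch; y is not perfectly linear in x
x <- 1:10
y <- c(1:5, 34, 56, 32, 23, 45)

# Built-in Pearson correlation (the default method of cor())
r_builtin <- cor(x, y)

# Same quantity from its definition: cov(x, y) / (sd(x) * sd(y))
r_manual <- cov(x, y) / (sd(x) * sd(y))

print(r_builtin)
print(all.equal(r_builtin, r_manual))
```

For rank-based alternatives, cor() also accepts method = "spearman" or method = "kendall".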

Method 2: Correlation Between Multiple Variables

To get the correlation among several variables at once, select the columns of interest by passing a vector of column names to the dataframe, and pass the result to cor(). The function then returns a correlation matrix with one row and one column per selected variable.

Syntax:

cor(dataframe[, c('column1', 'column2', ..., 'column n')])

Example:

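In this example, we use the cor() function to compute correlation matrices for three sets of columns: col1, col3, and col2; col1, col4, and col2; and col2, col3, and col4. Note that col4 now holds different values, so it is no longer perfectly correlated with the other columns.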

R




# create the dataframe with 4 columns
data <- data.frame(col1 = 1:10, col2 = 11:20,
                   col3 = 21:30,
                   col4 = c(1:5, 34, 56, 32, 23, 45))

# correlation between col1, col3 and col2
print(cor(data[, c('col1', 'col3', 'col2')]))

# correlation between col1, col4 and col2
print(cor(data[, c('col1', 'col4', 'col2')]))

# correlation between col2, col3 and col4
print(cor(data[, c('col2', 'col3', 'col4')]))


Output:

     col1 col3 col2
col1    1    1    1
col3    1    1    1
col2    1    1    1

         col1     col4     col2
col1 1.000000 0.787662 1.000000
col4 0.787662 1.000000 0.787662
col2 1.000000 0.787662 1.000000

         col2     col3     col4
col2 1.000000 1.000000 0.787662
col3 1.000000 1.000000 0.787662
col4 0.787662 0.787662 1.000000
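
Because the result is an ordinary named matrix, a single pairwise value can be read back out by row and column name. The sketch below rebuilds the same dataframe as the example above and stores the matrix in a variable named `m`, which is a name chosen here for illustration.

```r
# Same dataframe as in the example above
data <- data.frame(col1 = 1:10, col2 = 11:20,
                   col3 = 21:30,
                   col4 = c(1:5, 34, 56, 32, 23, 45))

# Correlation matrix for a subset of columns
m <- cor(data[, c("col1", "col4", "col2")])

# Look up one pair by row and column name
r_col1_col4 <- m["col1", "col4"]
print(r_col1_col4)

# The matrix is symmetric, so the order of the two names does not matter
print(isSymmetric(m))

# Rounding makes a larger matrix easier to scan
print(round(m, 2))
```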

Method 3: Correlation between all variables

To compute the correlation between all the variables in a data frame, pass the entire data frame to cor(). The result is a correlation matrix covering every pair of columns. All columns must be numeric for this to work.

Syntax:

cor(dataframe)

Example:

In this example, we are going to find the correlation between all the columns of the given data frame in the R programming language.

R




# create the dataframe with 4 columns
data <- data.frame(col1 = 1:10, col2 = 11:20,
                   col3 = 21:30,
                   col4 = c(1:5, 34, 56, 32, 23, 45))

# correlation in entire dataframe
print(cor(data))


Output:

         col1     col2     col3     col4
col1 1.000000 1.000000 1.000000 0.787662
col2 1.000000 1.000000 1.000000 0.787662
col3 1.000000 1.000000 1.000000 0.787662
col4 0.787662 0.787662 0.787662 1.000000
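
Real datasets often contain non-numeric columns or missing values, and cor() on a whole data frame stops with an error if any column is not numeric. The sketch below shows one possible way to handle both cases. The dataframe `df`, its `id` column, and the NA value are made up for illustration; keeping only numeric columns and using use = "complete.obs" (which drops rows containing NA) is one choice among several.

```r
# Hypothetical dataframe with a text column and one missing value
df <- data.frame(
  id   = letters[1:10],
  col1 = 1:10,
  col4 = c(1:5, 34, NA, 32, 23, 45)
)

# Keep only the numeric columns
numeric_cols <- df[sapply(df, is.numeric)]

# Drop rows with missing values before computing the correlations
res <- cor(numeric_cols, use = "complete.obs")
print(res)
```

Without the use argument, any pair of columns involving an NA would come back as NA in the matrix.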

