How to Calculate Correlation Between Multiple Variables in R?
Last Updated :
19 Dec, 2021
In this article, we will discuss how to calculate the correlation between multiple variables in the R programming language. Correlation measures the strength and direction of the linear relationship between two variables. The Pearson coefficient, which cor() computes by default, ranges from -1 to 1:
- The result is close to 0 if there is no linear correlation between the two variables
- The result is 1 if there is a perfect positive correlation between the two variables
- The result is -1 if there is a perfect negative correlation between the two variables
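As a quick illustration of these three cases, the sketch below uses made-up vectors (x, y_pos, y_neg and y_none are hypothetical names, not part of this article's data) together with base R's cor():

```r
# Hypothetical vectors chosen to illustrate the three cases
x      <- -5:5
y_pos  <- 2 * x + 3   # increases linearly with x
y_neg  <- -x          # decreases linearly with x
y_none <- x^2         # symmetric around 0, so it has no linear trend

r_pos  <- cor(x, y_pos)
r_neg  <- cor(x, y_neg)
r_none <- cor(x, y_none)

print(r_pos)
print(r_neg)
print(r_none)
```

Note that y_none depends entirely on x, yet its coefficient is zero up to rounding. A coefficient near 0 only rules out a linear relationship, not a relationship of any kind.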
Let’s create an initial dataframe:
R
data <- data.frame(col1 = 1:10, col2 = 11:20,
                   col3 = 21:30, col4 = 1:10)
data
Output:
   col1 col2 col3 col4
1     1   11   21    1
2     2   12   22    2
3     3   13   23    3
4     4   14   24    4
5     5   15   25    5
6     6   16   26    6
7     7   17   27    7
8     8   18   28    8
9     9   19   29    9
10   10   20   30   10
Method 1: Correlation Between Two Variables
In this method, we call the cor() function from base R and pass it the two variables whose correlation is required. The function returns the correlation coefficient between those two variables.
Syntax:
cor(dataframe$column1, dataframe$column2)
where,
- dataframe is the input dataframe
- column1 and column2 are the two columns to be correlated
Example:
In this example, we create a dataframe with 4 columns and 10 rows and use the cor() function to find the correlation between col1 and col2, col1 and col3, col1 and col4, and col3 and col4. Every column is the same sequence shifted by a constant, so each pair is perfectly correlated and every result is 1.
R
data <- data.frame(col1 = 1:10, col2 = 11:20,
                   col3 = 21:30, col4 = 1:10)
print(cor(data$col1, data$col2))
print(cor(data$col1, data$col3))
print(cor(data$col1, data$col4))
print(cor(data$col3, data$col4))
Output:
[1] 1
[1] 1
[1] 1
[1] 1
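To see where these values come from, here is a minimal sketch that computes the Pearson coefficient directly from its definition and compares it with cor(). The helper pearson_manual is a hypothetical name introduced only for this illustration; it is not part of base R.

```r
# Hypothetical helper: Pearson correlation computed from its definition,
# i.e. the sum of products of deviations divided by the square root of
# the product of the sums of squared deviations
pearson_manual <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

col1 <- 1:10
col2 <- 11:20   # col1 shifted by 10

manual  <- pearson_manual(col1, col2)
builtin <- cor(col1, col2)

print(manual)
print(builtin)
```

Because col2 is just col1 plus a constant, the deviations from the mean are identical in both columns, which is why the coefficient comes out as 1.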
Method 2: Correlation Between Multiple Variables
In this method, we pass cor() a subset of the dataframe, selected with a vector of column names. The function returns a correlation matrix containing the coefficient for every pair of the selected columns.
Syntax:
cor(dataframe[, c('column1', 'column2', ..., 'column n')])
Example:
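In this example, we use the cor() function to find the correlation matrix for three subsets of columns: col1, col3 and col2; col1, col4 and col2; and col2, col3 and col4. Note that col4 now holds non-sequential values, so it is no longer perfectly correlated with the other columns.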
R
data <- data.frame(col1 = 1:10, col2 = 11:20,
                   col3 = 21:30,
                   col4 = c(1:5, 34, 56, 32, 23, 45))
print(cor(data[, c('col1', 'col3', 'col2')]))
print(cor(data[, c('col1', 'col4', 'col2')]))
print(cor(data[, c('col2', 'col3', 'col4')]))
Output:
     col1 col3 col2
col1    1    1    1
col3    1    1    1
col2    1    1    1
         col1     col4     col2
col1 1.000000 0.787662 1.000000
col4 0.787662 1.000000 0.787662
col2 1.000000 0.787662 1.000000
         col2     col3     col4
col2 1.000000 1.000000 0.787662
col3 1.000000 1.000000 0.787662
col4 0.787662 0.787662 1.000000
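The value returned for several columns is an ordinary named matrix, so a single coefficient can be read out by row and column name. The sketch below assumes the same data frame as in the example; the variable names m and r14 are chosen here only for illustration.

```r
data <- data.frame(col1 = 1:10, col2 = 11:20,
                   col3 = 21:30,
                   col4 = c(1:5, 34, 56, 32, 23, 45))

# Correlation matrix for a subset of columns
m <- cor(data[, c("col1", "col4", "col2")])

# Pick one coefficient by row and column name
r14 <- m["col1", "col4"]
print(r14)

# The matrix is symmetric: cor(a, b) equals cor(b, a)
print(isSymmetric(m))
```

The entry m["col1", "col4"] is the same number that cor(data$col1, data$col4) would return, so the matrix form is mainly a convenience when many pairs are needed at once.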
Method 3: Correlation between all variables
In this method, we pass the entire data frame to the cor() function to get the correlation between every pair of columns. All columns of the data frame must be numeric.
Syntax:
cor(dataframe)
Example:
In this example, we are going to find the correlation between all the columns of the given data frame in the R programming language.
R
data <- data.frame(col1 = 1:10, col2 = 11:20,
                   col3 = 21:30,
                   col4 = c(1:5, 34, 56, 32, 23, 45))
print(cor(data))
Output:
         col1     col2     col3     col4
col1 1.000000 1.000000 1.000000 0.787662
col2 1.000000 1.000000 1.000000 0.787662
col3 1.000000 1.000000 1.000000 0.787662
col4 0.787662 0.787662 0.787662 1.000000
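Real data frames often contain text columns or missing values, which the example above does not cover. The following is a minimal sketch under those assumptions: the data frame df and its columns a, b and label are hypothetical. It keeps only the numeric columns and uses the use argument of cor() to drop rows that contain NA.

```r
# Hypothetical data frame with a text column and one missing value
df <- data.frame(a = c(1, 2, 3, 4, NA),
                 b = c(2, 4, 5, 4, 5),
                 label = c("p", "q", "r", "s", "t"))

# Keep only the numeric columns; cor() needs numeric input
num <- df[, sapply(df, is.numeric), drop = FALSE]

# use = "complete.obs" drops every row that contains an NA
m <- cor(num, use = "complete.obs")
print(m)
```

Without the use argument, any pair involving a column with a missing value would come back as NA. Other options such as "pairwise.complete.obs" exist, and which one is appropriate depends on the data.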