In this article, we will discuss how to calculate the correlation between multiple variables in the R programming language. Correlation measures the strength and direction of the linear relationship between two variables. The correlation coefficient always lies between -1 and 1, and values closer to either end indicate a stronger relationship:
- The result is 0 if there is no linear correlation between the two variables
- The result is 1 if there is a perfect positive correlation between the two variables
- The result is -1 if there is a perfect negative correlation between the two variables
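As a quick illustration of these three cases, here is a minimal sketch using small made-up vectors. The names x, y_up, y_down, and y_curve are illustrative assumptions and are not part of the dataframe used in the rest of the article.

```r
# Illustrative vectors (hypothetical, not from the article's dataframe)
x <- c(-2, -1, 0, 1, 2)

y_up    <- 2 * x + 3   # increases linearly with x
y_down  <- -3 * x + 1  # decreases linearly with x
y_curve <- x^2         # depends on x, but not linearly

r_up    <- cor(x, y_up)     # perfect positive linear relationship
r_down  <- cor(x, y_down)   # perfect negative linear relationship
r_curve <- cor(x, y_curve)  # no linear relationship

print(c(r_up, r_down, r_curve))
```

Note that y_curve is completely determined by x, yet its coefficient is 0. A result of 0 therefore rules out only a linear relationship, not every kind of dependence.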
Let’s create an initial dataframe:
# create the dataframe with 4 columns
data = data.frame(col1 = c(1:10), col2 = c(11:20),
                  col3 = c(21:30), col4 = c(1:10))
# display data
print(data)
Output:
   col1 col2 col3 col4
1     1   11   21    1
2     2   12   22    2
3     3   13   23    3
4     4   14   24    4
5     5   15   25    5
6     6   16   26    6
7     7   17   27    7
8     8   18   28    8
9     9   19   29    9
10   10   20   30   10
Method 1: Correlation Between Two Variables
To calculate the correlation between two variables, call the cor() function from base R and pass the two columns as arguments. The function returns the correlation coefficient between the two variables.
Syntax:
cor(dataframe$column1, dataframe$column2)
where,
- dataframe is the input dataframe
- column1 and column2 are the two columns whose correlation is calculated
Example:
In this example, we create a dataframe with 4 columns and 10 rows, then use the cor() function to find the correlation between col1 and col2, between col1 and col3, between col1 and col4, and between col3 and col4.
# create the dataframe with 4 columns
data = data.frame(col1 = c(1:10), col2 = c(11:20),
                  col3 = c(21:30), col4 = c(1:10))
# correlation between col1 and col2
print(cor(data$col1, data$col2))
# correlation between col1 and col3
print(cor(data$col1, data$col3))
# correlation between col1 and col4
print(cor(data$col1, data$col4))
# correlation between col3 and col4
print(cor(data$col3, data$col4))
Output:
[1] 1
[1] 1
[1] 1
[1] 1
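Every pair above gives 1 because col2 and col3 are just col1 shifted by a constant, and col4 is identical to col1. To make it clearer what cor() computes by default, the sketch below reproduces the Pearson coefficient by hand. The helper name pearson_manual is a hypothetical name introduced here for illustration only.

```r
# Same values as col1 and col2 in the example dataframe
col1 <- 1:10
col2 <- 11:20

# Pearson coefficient written out: the sum of products of deviations
# divided by the square root of the product of the summed squared deviations
# (pearson_manual is a hypothetical helper, not a base R function)
pearson_manual <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

manual  <- pearson_manual(col1, col2)
builtin <- cor(col1, col2)  # method = "pearson" is the default

# Compare the two results up to floating-point tolerance
print(all.equal(manual, builtin))
```

cor() also accepts method = "spearman" or method = "kendall" for rank-based coefficients. This sketch covers only the default Pearson case.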
Method 2: Correlation Between Multiple Variables
In this method, select the columns of interest by subsetting the dataframe with a vector of column names, and pass the result to cor(). The function returns a correlation matrix with one row and one column for each selected variable.
Syntax:
cor(dataframe[, c('column1', 'column2', ..., 'column n')])
Example:
In this example, we use the cor() function to find the correlations among col1, col3, and col2; among col1, col4, and col2; and among col2, col3, and col4.
# create the dataframe with 4 columns
data = data.frame(col1 = c(1:10), col2 = c(11:20),
                  col3 = c(21:30),
                  col4 = c(1:5, 34, 56, 32, 23, 45))
# correlation between col1, col3 and col2
print(cor(data[, c('col1', 'col3', 'col2')]))
# correlation between col1, col4 and col2
print(cor(data[, c('col1', 'col4', 'col2')]))
# correlation between col2, col3 and col4
print(cor(data[, c('col2', 'col3', 'col4')]))
Output:
     col1 col3 col2
col1    1    1    1
col3    1    1    1
col2    1    1    1
         col1     col4     col2
col1 1.000000 0.787662 1.000000
col4 0.787662 1.000000 0.787662
col2 1.000000 0.787662 1.000000
         col2     col3     col4
col2 1.000000 1.000000 0.787662
col3 1.000000 1.000000 0.787662
col4 0.787662 0.787662 1.000000
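Because cor() returns an ordinary named matrix here, a single coefficient can be looked up by row and column name. The following is a minimal sketch using the same data as the example. The variable names m and r_14 are illustrative.

```r
# Same data as the example above
data <- data.frame(col1 = 1:10, col2 = 11:20, col3 = 21:30,
                   col4 = c(1:5, 34, 56, 32, 23, 45))

# Correlation matrix for a subset of columns
m <- cor(data[, c("col1", "col4", "col2")])

# Look up one pair by row name and column name
r_14 <- m["col1", "col4"]
print(round(r_14, 6))

# The matrix is symmetric: m["col1", "col4"] equals m["col4", "col1"]
print(isSymmetric(m))
```

The diagonal is always 1, since each variable is perfectly correlated with itself, so only the entries above (or below) the diagonal carry information.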
Method 3: Correlation Between All Variables
To compute the correlation between all the variables in a data frame, pass the entire data frame to the cor() function. The result is a correlation matrix covering every pair of columns.
Syntax:
cor(dataframe)
Example:
In this example, we find the correlation between all the columns of the data frame.
# create the dataframe with 4 columns
data = data.frame(col1 = c(1:10), col2 = c(11:20),
                  col3 = c(21:30),
                  col4 = c(1:5, 34, 56, 32, 23, 45))
# correlation in entire dataframe
print(cor(data))
Output:
         col1     col2     col3     col4
col1 1.000000 1.000000 1.000000 0.787662
col2 1.000000 1.000000 1.000000 0.787662
col3 1.000000 1.000000 1.000000 0.787662
col4 0.787662 0.787662 0.787662 1.000000
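Passing a whole data frame works only when every column is numeric, and with the default settings a missing value makes the affected coefficients NA. The sketch below shows one possible way to handle both situations. The dataframe df and its columns are hypothetical and were made up for this illustration.

```r
# Hypothetical dataframe with a text column and a missing value
df <- data.frame(col1  = c(1, 2, 3, 4, 5),
                 col2  = c(2, 4, NA, 8, 10),
                 label = c("a", "b", "c", "d", "e"))

# Keep only the numeric columns before calling cor()
numeric_cols <- df[, sapply(df, is.numeric), drop = FALSE]

# use = "complete.obs" drops every row that contains an NA
m <- cor(numeric_cols, use = "complete.obs")
print(m)
```

Here the row with the NA is dropped for all columns. The use = "pairwise.complete.obs" option is an alternative that drops rows separately for each pair of columns, which keeps more data but means different coefficients may be based on different rows.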