Calculate Correlation Matrix Only for Numeric Columns in R
A correlation matrix is a table showing the pairwise relationships between the numeric attributes of a data frame. Each cell holds the correlation coefficient between two attributes.
Dataset used: bestsellers
To create a correlation matrix, the cor() function is called with the data frame as an argument. If the data frame contains non-numeric columns, the call stops with an error:
Error in cor(df) : 'x' must be numeric
This function fails when the data frame contains non-numeric values, such as character columns. In that situation, a correlation matrix can be created by either of the methods given below.
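The failure can be reproduced with a minimal sketch. The bestsellers file is not available here, so the small data frame below is a hypothetical stand-in with invented column names and values; tryCatch() is used only so the script keeps running after the error.

```r
# Hypothetical stand-in for the bestsellers data:
# one text column and two numeric columns (values are made up).
df <- data.frame(
  Name    = c("A", "B", "C", "D"),
  Rating  = c(4.7, 4.6, 4.8, 4.1),
  Reviews = c(17350, 2052, 18979, 21424)
)

# cor() rejects the whole data frame because Name is not numeric.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(
  {
    cor(df)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Here msg holds the text of the error raised by cor(), which confirms that the non-numeric column is the cause.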
Method 1: Using sapply()
Here cor() is called on the data frame as above, but this time only the numeric columns are passed to it. To select the numeric columns, a check such as is.numeric() is applied to each column of the data frame using sapply().
The sapply() function in R takes a list, vector, or data frame as input and returns a vector or matrix where possible. It applies a function to each element and simplifies the result, so checking every column of a data frame yields one value per column.
Syntax: sapply(X, FUN)
- X: a vector, list, or data frame
- FUN: function applied to each element of X
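Method 1 can be sketched as follows. The data frame is a hypothetical stand-in for the bestsellers data (column names and values are invented), and is.numeric() is assumed as the check applied to each column.

```r
# Hypothetical stand-in for the bestsellers data (values are made up).
df <- data.frame(
  Name    = c("A", "B", "C", "D"),
  Rating  = c(4.7, 4.6, 4.8, 4.1),
  Reviews = c(17350, 2052, 18979, 21424),
  Price   = c(8, 22, 15, 6)
)

# sapply() returns a named logical vector with one entry per column.
numeric_cols <- sapply(df, is.numeric)
print(numeric_cols)

# Keep only the columns flagged TRUE, then compute the correlation matrix.
# drop = FALSE keeps a data frame even if only one column is numeric.
cor_matrix <- cor(df[, numeric_cols, drop = FALSE])
print(cor_matrix)
```

The result is a square matrix with one row and one column per numeric attribute; the Name column is left out.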
Method 2: Using lapply()
Similarly, lapply() can be used to pick out the numeric columns. The lapply() function in R applies a function to each element of a list and returns a list of the same length.
Syntax: lapply(list, func)
- list: list of elements
- func: operation to be applied
lapply() returns a list with one entry per column marking whether that column is numeric. This list is converted to a vector using unlist(), the vector is used to select the numeric columns, and the resulting data frame is passed to cor() to produce a correlation matrix.
The unlist() function in R converts a list to a vector. It flattens the list into a single vector that keeps all of its components.
Syntax: unlist(list, use.names)
- list: a list or vector
- use.names: Boolean value indicating whether to preserve the element names
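Method 2 can be sketched in the same way. As before, the data frame is a hypothetical stand-in with invented values, and is.numeric() is assumed as the per-column check; the intermediate variable names are illustrative only.

```r
# Hypothetical stand-in for the bestsellers data (values are made up).
df <- data.frame(
  Name    = c("A", "B", "C", "D"),
  Rating  = c(4.7, 4.6, 4.8, 4.1),
  Reviews = c(17350, 2052, 18979, 21424),
  Price   = c(8, 22, 15, 6)
)

# lapply() returns a list: one TRUE/FALSE element per column.
is_num_list <- lapply(df, is.numeric)

# unlist() flattens that list into a named logical vector.
is_num <- unlist(is_num_list)
print(is_num)

# Select the numeric columns and compute the correlation matrix.
cor_matrix <- cor(df[, is_num, drop = FALSE])
print(cor_matrix)
```

Both methods give the same matrix. The only difference is that sapply() simplifies its result to a vector directly, while lapply() needs the extra unlist() step.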