# Convert covariance matrix to correlation matrix using Python

In this article, we discuss the relationship between covariance and correlation, and write our own functions for calculating both using Python.

**Covariance:**

Covariance tells us how two quantities vary together. For *n* paired observations of *x* and *y*, the covariance is

$$\operatorname{cov}(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

where $\bar{x}$ and $\bar{y}$ are the means of *x* and *y* respectively.
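As a quick check of the definition, covariance can be computed by hand as the average product of the deviations from the two means. The sketch below uses made-up toy data and a hypothetical helper named `covariance`; it divides by *n* (the population form), which is what `np.cov` returns when `bias=True` is passed.

```python
import numpy as np

# Toy data, made up purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])


def covariance(x, y):
    """Population covariance: mean of the products of deviations."""
    return np.mean((x - x.mean()) * (y - y.mean()))


manual = covariance(x, y)
reference = np.cov(x, y, bias=True)[0, 1]  # off-diagonal entry is cov(x, y)

print(manual)     # 4.0
print(np.isclose(manual, reference))
```

The value is positive because *y* rises whenever *x* rises.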

Interpreting the output:

The covariance between x and y is one of the following:

Covariance(x, y) > 0: x and y are positively related; they tend to increase together.

Covariance(x, y) < 0: x and y are negatively related; one tends to decrease as the other increases.

Covariance(x, y) = 0: x and y have no linear relationship. This does not by itself mean they are independent of each other.
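One caveat about the zero case: a covariance of zero only rules out a linear relationship. The small sketch below (toy data, chosen for illustration) builds a `y` that is completely determined by `x` and still has zero covariance with it.

```python
import numpy as np

# Symmetric toy values so that the linear trend cancels out
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2  # y depends entirely on x, but not linearly

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
print(cov_xy)  # zero, even though y is a function of x
```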

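**Covariance matrix:**

A covariance matrix collects the covariance between every pair of variables in a dataset: the entry at position (i, j) is the covariance between the i-th and j-th variables, and the diagonal holds the variance of each variable. To calculate the covariance matrix, the *cov()* function in *numpy* is used.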

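**Syntax:**

numpy.cov(m, y=None, rowvar=True, bias=False, ddof=None, fweights=None, aweights=None)

m : [array_like] A 1-D or 2-D array containing multiple variables and observations. By default, each row is a variable and each column is an observation (see rowvar).

y : [array_like, optional] An additional set of variables and observations, in the same form as m.

rowvar : [bool, optional] If rowvar is True (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.

bias : [bool, optional] If False (default), the normalization is by N - 1, where N is the number of observations, which gives the unbiased estimate. If True, the normalization is by N.

ddof : [int, optional] If not None, the default value implied by bias is overridden. Note that ddof=1 will return the unbiased estimate, even if both fweights and aweights are specified.

fweights : [array_like, optional] 1-D array of integer frequency weights, that is, the number of times each observation should be repeated.

aweights : [array_like, optional] 1-D array of observation vector weights.

Returns: [ndarray] The covariance matrix of the variables.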
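To make the `rowvar`, `bias`, and `ddof` parameters concrete, here is a minimal sketch on a made-up array in which rows are observations and columns are variables (the usual layout for tabular data). The variable names are illustrative only.

```python
import numpy as np

# Made-up data: 4 observations (rows) of 2 variables (columns)
data = np.array([[1.0, 2.0],
                 [2.0, 4.0],
                 [3.0, 7.0],
                 [4.0, 8.0]])

# Columns are variables, so rowvar must be False
cov_sample = np.cov(data, rowvar=False)                 # divides by N - 1 (default)
cov_population = np.cov(data, rowvar=False, bias=True)  # divides by N
cov_ddof0 = np.cov(data, rowvar=False, ddof=0)          # same normalization as bias=True

# With the default rowvar=True the 4 rows are treated as 4 variables
cov_rows = np.cov(data)

print(cov_sample.shape)  # (2, 2)
print(cov_rows.shape)    # (4, 4)
```

Forgetting `rowvar=False` does not raise an error; it silently returns a matrix of the wrong shape, so checking the shape of the result is a cheap sanity test.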

**Correlation:**

Correlation shows whether, and how strongly, pairs of variables are related to each other. It takes values between -1 and +1: values close to +1 represent a strong positive correlation, values close to -1 represent a strong negative correlation, and values close to 0 indicate little or no linear relationship. It gives both the direction and the strength of the relationship between variables.

**Correlation Matrix:**

It is basically a normalized covariance matrix. Unlike the covariance matrix (also known as the auto-covariance matrix, dispersion matrix, variance matrix, or variance-covariance matrix), its entries are scaled to lie between -1 and +1. It is a matrix in which the (i, j) position holds the correlation between the i-th and j-th parameters of the given dataset. It is calculated using *numpy*'s *corrcoef()* function.

**Syntax:**

numpy.corrcoef(x, y=None, rowvar=True, bias=<no value>, ddof=<no value>)

x :A 1-D or 2-D array containing multiple variables and observations. Each row of x represents a variable, and each column a single observation of all those variables. Also see rowvar below.

y, optional: An additional set of variables and observations. y has the same shape as x.

rowvar :If rowvar is True (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.

Returns: ndarray, the correlation coefficient matrix of the variables.
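A short usage sketch for `np.corrcoef`, again on made-up data with observations in rows. Note that in recent NumPy versions the `bias` and `ddof` arguments are deprecated and have no effect, so only `x`, `y`, and `rowvar` are used here.

```python
import numpy as np

# Made-up data: 4 observations (rows) of 2 variables (columns)
data = np.array([[1.0, 2.0],
                 [2.0, 4.0],
                 [3.0, 7.0],
                 [4.0, 8.0]])

corr = np.corrcoef(data, rowvar=False)

print(corr.shape)     # (2, 2)
print(np.diag(corr))  # each variable has correlation 1 with itself
```

The result is symmetric, with ones on the diagonal and every entry between -1 and +1.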

**So why do we need correlation?**

- Covariance tells us whether two random variables are positively or negatively related, but it doesn’t tell us how strongly.
- Covariance depends on the scale of the variables. For example, if we multiply x by 10 or divide it by 10, the covariance changes by the same factor. This is not true for correlation, which is unchanged by such positive rescaling.
- Covariance values are difficult to compare because they can range from -infinity to +infinity, whereas correlation values always lie between -1 and +1.
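The scale-dependence point can be checked numerically. In this sketch (toy data, made up for illustration) multiplying `x` by 10 multiplies the covariance by 10 and leaves the correlation unchanged.

```python
import numpy as np

# Toy data, made up for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

cov_before = np.cov(x, y)[0, 1]
cov_after = np.cov(10 * x, y)[0, 1]        # x rescaled by a factor of 10

corr_before = np.corrcoef(x, y)[0, 1]
corr_after = np.corrcoef(10 * x, y)[0, 1]

print(cov_after / cov_before)               # ratio of about 10
print(np.isclose(corr_before, corr_after))  # correlation is unaffected
```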

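**Relation between correlation and covariance**

Correlation is just normalized covariance, as the formula below shows:

$$\operatorname{corr}(x, y) = \frac{\operatorname{cov}(x, y)}{\sigma_x \, \sigma_y}$$

where $\sigma_x$ and $\sigma_y$ are the standard deviations of x and y respectively.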
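Applied to a whole matrix, this normalization means dividing entry (i, j) of the covariance matrix by the product of the i-th and j-th standard deviations, which are the square roots of the diagonal. The following is a minimal vectorized sketch; `cov_to_corr` is a hypothetical helper name, the data is synthetic, and the function assumes every variable has non-zero variance.

```python
import numpy as np


def cov_to_corr(cov):
    """Convert a covariance matrix to a correlation matrix."""
    cov = np.asarray(cov, dtype=float)
    std = np.sqrt(np.diag(cov))      # standard deviations from the diagonal
    return cov / np.outer(std, std)  # entry (i, j) divided by std_i * std_j


# Synthetic data: 50 observations of 3 variables
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3))

cov = np.cov(data, rowvar=False)
corr = cov_to_corr(cov)

print(np.allclose(corr, np.corrcoef(data, rowvar=False)))  # True
```

Because the same normalization constant appears in both the covariance and the standard deviations, the result is the same whether the covariance matrix was computed with N or N - 1.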

**Python program to convert a covariance matrix to a correlation matrix**

To solve this problem we use the iris dataset, because computing covariance requires data and a real-world example dataset works best.

Loading and displaying the dataset

## Python3

```python
import numpy as np
import pandas as pd

# loading in the iris dataset for demo purposes
dataset = pd.read_csv("iris.csv")
dataset.head()
```

In this example we won’t be using the target column

## Python3

```python
# keep every column except the last (target) one, as a NumPy array
data = dataset.iloc[:, :-1].values
```

**Program to implement covariance matrix: **

## Python3

```python
# calculates the covariance between x and y
def calcCov(x, y):
    mean_x, mean_y = x.mean(), y.mean()
    n = len(x)
    # population covariance: normalizes by n, not n - 1
    return sum((x - mean_x) * (y - mean_y)) / n


# calculates the covariance matrix
def covMat(data):
    # get the rows and cols
    rows, cols = data.shape

    # the covariance matrix has a shape of n_features x n_features
    # n_features = cols (the target column was already dropped from data)
    cov_mat = np.zeros((cols, cols))

    for i in range(cols):
        for j in range(cols):
            # store the value in the matrix
            cov_mat[i][j] = calcCov(data[:, i], data[:, j])

    return cov_mat


covMat(data)
```

**Output:**

*Numpy cov()* output:

## Python3

```python
np.cov(data, rowvar=False)
```

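Note: rowvar needs to be set to False, otherwise the rows are treated as features and the columns as observations. Also, np.cov() normalizes by N - 1 by default, whereas calcCov() above divides by N, so the two outputs differ slightly unless bias=True is passed.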

**Output**:

**Calculating Correlation:**

In this function we convert the covariance matrix to a correlation matrix.

## Python3

```python
# Now calculating correlation using our covariance function (calcCov())
def corrMat(data):
    rows, cols = data.shape

    corr_mat = np.zeros((cols, cols))

    for i in range(cols):
        for j in range(cols):
            x, y = data[:, i], data[:, j]
            # note here that we are just normalizing the covariance matrix
            corr_mat[i][j] = calcCov(x, y) / (x.std() * y.std())

    return corr_mat


corrMat(data)
```

**Output:**

The *corrcoef()* function in *numpy* can also be used to compute the correlation matrix.

## Python3

```python
np.corrcoef(data, rowvar=False)
```

**Output:**
