How to Perform Multivariate Normality Tests in Python


In this article, we will look at how to perform a multivariate normality test in Python using the pingouin library.

A multivariate normality test determines whether or not a group of variables jointly follows a multivariate normal distribution. Checking that each variable is normal on its own is not enough: the variables must be normal together.
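To make "a group of variables following a multivariate normal distribution" concrete, the sketch below draws a sample from one. The mean vector and covariance matrix are arbitrary values chosen for illustration (assumptions, not taken from the article). Each row is one joint observation of the three variables, and this is the kind of data for which the null hypothesis of the test holds.

```python
import numpy as np

# Illustrative (assumed) parameters for 3 jointly normal variables
mean = np.array([0.0, 1.0, -1.0])
cov = np.array([[1.0, 0.5, 0.2],
                [0.5, 2.0, 0.3],
                [0.2, 0.3, 1.5]])

# Seeded generator so the sample is reproducible
rng = np.random.default_rng(42)

# Each row is one observation of all 3 variables drawn jointly
sample = rng.multivariate_normal(mean, cov, size=1000)

print(sample.shape)                            # (1000, 3)
print(np.cov(sample, rowvar=False).round(1))   # should be close to cov
```

With 1,000 rows, the sample covariance printed on the last line should land near the chosen `cov`, which is a quick sanity check that the draws really are joint rather than column-by-column.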

multivariate_normality() function

In this approach, we call the multivariate_normality() function from the pingouin library with the required parameters to run the Henze-Zirkler multivariate normality test on the given data.

Command to install the pingouin library:

pip install pingouin

Syntax: multivariate_normality(X, alpha=0.05)

Parameters:

  • X: Data matrix of shape (n_samples, n_features).
  • alpha: Significance level (default 0.05).

Returns

  • hz: The Henze-Zirkler test statistic.
  • pval: The p-value.
  • normal: True if the p-value is greater than alpha, i.e. the data are consistent with a multivariate normal distribution.
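To show roughly how hz and pval are obtained, here is a minimal NumPy/SciPy sketch of the Henze-Zirkler test built from the standard published formulas: a smoothing parameter beta, Mahalanobis distances between rows, and a lognormal approximation for the p-value. The function name `henze_zirkler` is hypothetical. It assumes complete data with a full-rank sample covariance matrix, and it is an illustration rather than pingouin's actual code, so its results may differ from pingouin's in edge cases.

```python
import numpy as np
from scipy.stats import lognorm


def henze_zirkler(X, alpha=0.05):
    """Illustrative Henze-Zirkler test (hypothetical helper, not pingouin's code).

    Assumes X is an (n_samples, n_features) array with no missing values
    and a full-rank sample covariance matrix.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape

    # Smoothing parameter beta, chosen from n and p
    b = (1 / np.sqrt(2)) * ((2 * p + 1) / 4) ** (1 / (p + 4)) * n ** (1 / (p + 4))

    # Inverse of the biased (divide-by-n) sample covariance matrix
    S_inv = np.linalg.inv(np.cov(X, rowvar=False, bias=True))

    # D_j: squared Mahalanobis distance of each row from the sample mean
    centered = X - X.mean(axis=0)
    Dj = ((centered @ S_inv) * centered).sum(axis=1)

    # D_jk: squared Mahalanobis distance between every pair of rows
    Y = X @ S_inv @ X.T
    diag = np.diag(Y)
    Djk = diag[:, None] + diag[None, :] - 2 * Y

    # Henze-Zirkler statistic
    hz = (np.exp(-(b**2) / 2 * Djk).sum() / n
          - 2 * (1 + b**2) ** (-p / 2) * np.exp(-(b**2) / (2 * (1 + b**2)) * Dj).sum()
          + n * (1 + 2 * b**2) ** (-p / 2))

    # Mean and variance of HZ under H0, used for a lognormal approximation
    a = 1 + 2 * b**2
    wb = (1 + b**2) * (1 + 3 * b**2)
    mu = 1 - a ** (-p / 2) * (1 + p * b**2 / a + p * (p + 2) * b**4 / (2 * a**2))
    si2 = (2 * (1 + 4 * b**2) ** (-p / 2)
           + 2 * a ** (-p) * (1 + 2 * p * b**4 / a**2
                              + 3 * p * (p + 2) * b**8 / (4 * a**4))
           - 4 * wb ** (-p / 2) * (1 + 3 * p * b**4 / (2 * wb)
                                   + p * (p + 2) * b**8 / (2 * wb**2)))
    log_mean = np.log(np.sqrt(mu**4 / (si2 + mu**2)))
    log_sd = np.sqrt(np.log((si2 + mu**2) / mu**2))

    # p-value: upper-tail probability of the fitted lognormal distribution
    pval = lognorm.sf(hz, log_sd, scale=np.exp(log_mean))
    return float(hz), float(pval), bool(pval > alpha)


rng = np.random.default_rng(0)
print(henze_zirkler(rng.normal(size=(100, 5))))
```

The pairwise distances make this O(n²) in memory, which is fine for a few thousand rows but worth keeping in mind for larger data.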

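This is a hypothesis test, and the two hypotheses are as follows:

  • H0 (null hypothesis): The variables follow a multivariate normal distribution.
  • Ha (alternative hypothesis): The variables do not follow a multivariate normal distribution.

If the p-value is greater than alpha (for example, 0.05), we fail to reject H0; otherwise, we reject H0 in favor of Ha.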
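The decision rule above can be written as a tiny helper. The function name `interpret` is hypothetical; it is meant to mirror pingouin's `normal` flag, which we assume is True exactly when the p-value exceeds alpha. The two p-values passed in are the ones reported in the examples below.

```python
def interpret(pval: float, alpha: float = 0.05) -> str:
    """Hypothetical helper: turn a p-value into the usual test decision."""
    if pval > alpha:
        # Not enough evidence against H0 (multivariate normality)
        return "fail to reject H0"
    # Evidence that the data are not multivariate normal
    return "reject H0"


print(interpret(0.8452549483161891))   # fail to reject H0
print(interpret(0.00355552234721754))  # reject H0
```

Note that "fail to reject H0" is not proof of normality; it only means the sample gives no strong evidence against it.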

Example 1: Multivariate Normality test on multivariate normal data in Python

In this example, we use the multivariate_normality() function from the pingouin library to conduct a multivariate normality test on randomly generated data: 100 data points for each of 5 independent standard normal variables.

Python3

from pingouin import multivariate_normality
import pandas as pd
import numpy as np

# 100 observations of 5 independent standard normal variables
data = pd.DataFrame({'a': np.random.normal(size=100),
                     'b': np.random.normal(size=100),
                     'c': np.random.normal(size=100),
                     'd': np.random.normal(size=100),
                     'e': np.random.normal(size=100)})

# perform the Multivariate Normality Test
print(multivariate_normality(data, alpha=.05))

Output:

HZResults(hz=0.7973450591569415, pval=0.8452549483161891, normal=True)

Output Interpretation:

Since the p-value in the above example is 0.84, which is greater than the significance level alpha = 0.05, we fail to reject the null hypothesis, i.e. we do not have evidence to say that the sample does not follow a multivariate normal distribution. Because the data are generated without a seed, the exact statistic and p-value will differ from run to run.

Example 2: Multivariate Normality test on data that is not multivariate normal in Python

In this example, we use the multivariate_normality() function from the pingouin library to conduct a multivariate normality test on randomly generated Poisson-distributed data: 100 data points for each of 5 variables.

Python3

from pingouin import multivariate_normality
import pandas as pd
import numpy as np

# 100 observations of 5 Poisson variables (default rate lam=1.0)
data = pd.DataFrame({'a': np.random.poisson(size=100),
                     'b': np.random.poisson(size=100),
                     'c': np.random.poisson(size=100),
                     'd': np.random.poisson(size=100),
                     'e': np.random.poisson(size=100)})

# perform the Multivariate Normality Test
print(multivariate_normality(data, alpha=.05))

Output:

HZResults(hz=7.4701896678920745, pval=0.00355552234721754, normal=False)

Output Interpretation:

Since the p-value in the above example is about 0.0036, which is less than the significance level alpha = 0.05, we reject the null hypothesis, i.e. we have sufficient evidence to say that the sample does not come from a multivariate normal distribution.
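One way to see why the Poisson data are rejected is to look at a single column of the same kind as in Example 2. A normal variable is continuous and symmetric, while a Poisson(1) column takes only a few integer values and is right-skewed. The sketch below, with an arbitrary seed, checks both properties using NumPy and scipy.stats.skew.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(1)

# Same kind of column as in Example 2: Poisson with the default rate lam=1.0
col = rng.poisson(size=100)

# Only a handful of distinct integer values, unlike a continuous normal variable
print(np.unique(col))

# Sample skewness: a normal variable would give a value near 0
print(skew(col))
```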



Last Updated : 20 Feb, 2022