
Normal Probability Plot

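The probability plot is a way of visually comparing data from different distributions, where each dataset may be empirical or theoretical. The probability plot comes in two types: the P-P (probability-probability) plot, which compares the cumulative distribution functions of two distributions, and the Q-Q (quantile-quantile) plot, which compares their quantiles.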
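To make the two types concrete, here is a sketch that computes the coordinates of both plots for a simulated normal sample. It assumes the plotting positions p_i = (i - 0.5)/n, which is only one of several common conventions, and uses SciPy's `norm.cdf` and `norm.ppf`. The sample and variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Illustrative sample: 200 draws from a standard normal, sorted ascending.
rng = np.random.default_rng(0)
sample = np.sort(rng.normal(size=200))
n = sample.size

# Assumed plotting positions p_i = (i - 0.5) / n for i = 1..n.
p = (np.arange(1, n + 1) - 0.5) / n

# P-P plot: empirical cumulative probabilities vs. theoretical CDF values.
pp_x = p
pp_y = stats.norm.cdf(sample)

# Q-Q plot: theoretical normal quantiles vs. ordered sample values.
qq_x = stats.norm.ppf(p)
qq_y = sample

# For normal data, both sets of points should lie close to a straight line.
print(np.corrcoef(pp_x, pp_y)[0, 1], np.corrcoef(qq_x, qq_y)[0, 1])
```

Plotting `pp_y` against `pp_x`, or `qq_y` against `qq_x`, with `plt.plot(..., "o")` gives the two kinds of plot.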

Normal probability plot: The normal probability plot is a way of checking whether a dataset is normally distributed. The ordered data are plotted against the corresponding quantiles of a theoretical normal distribution, so that if the dataset is normally distributed, the points form an approximately straight line. The normal probability plot is a special case of the probability plot (more specifically, of the Q-Q plot). It is commonly used in industry to detect when a process deviates from normality.



The normal probability plot has the following axes:

Vertical axis: ordered response values
Horizontal axis: normal order statistic medians

The normal order statistic medians are calculated as:

$$N_i = G(U_i)$$

where $U_i$ are the uniform order statistic medians and $G$ is the percent point function (PPF) of the normal distribution. The PPF is the inverse of the cumulative distribution function (CDF): given a cumulative probability $p$, it returns the value $x$ for which $\mathrm{CDF}(x) = p$.
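Here is a quick check of the PPF/CDF relationship using `scipy.stats.norm`. The probability 0.975 is just an illustrative value.

```python
from scipy.stats import norm

# The percent point function G is the inverse of the normal CDF:
# given a cumulative probability p, it returns x such that P(X <= x) = p.
p = 0.975
x = norm.ppf(p)        # about 1.96 for the standard normal
p_back = norm.cdf(x)   # recovers p, up to floating-point rounding
print(f"G({p}) = {x:.4f}, CDF(G({p})) = {p_back:.4f}")
# prints: G(0.975) = 1.9600, CDF(G(0.975)) = 0.9750
```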

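The uniform order statistic medians can be approximated (an estimate due to Filliben) by:

$$U_i = \begin{cases} 1 - U_n & i = 1 \\[4pt] \dfrac{i - 0.3175}{n + 0.365} & i = 2, 3, \ldots, n-1 \\[4pt] 0.5^{1/n} & i = n \end{cases}$$

where $n$ is the number of observations.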
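The approximation translates directly into NumPy. The helper name `uniform_order_statistic_medians` is hypothetical. The comparison at the end assumes that `scipy.stats.probplot` uses the same estimate for its theoretical quantiles, and the `allclose` check is there to confirm that.

```python
import numpy as np
from scipy import stats

def uniform_order_statistic_medians(n: int) -> np.ndarray:
    """Hypothetical helper: approximate uniform order statistic medians U_i."""
    u = np.empty(n)
    u[-1] = 0.5 ** (1.0 / n)             # i = n
    u[0] = 1.0 - u[-1]                   # i = 1
    i = np.arange(2, n)                  # i = 2, ..., n-1
    u[1:-1] = (i - 0.3175) / (n + 0.365)
    return u

# Normal order statistic medians N_i = G(U_i), with G the normal PPF.
data = np.random.default_rng(1).normal(size=50)
u = uniform_order_statistic_medians(data.size)
n_i = stats.norm.ppf(u)

# probplot's theoretical quantiles (osm) should match N_i if it uses
# the same approximation; osr holds the sorted data.
(osm, osr), _ = stats.probplot(data, dist="norm")
print(np.allclose(n_i, osm))
```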

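A measurement process typically assumes that its data behave like random drawings from a fixed distribution with a fixed location and a fixed scale. Probability plots are used to assess the fixed-distribution part of this assumption, and the normal probability plot checks specifically whether that distribution is normal.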

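The normal probability plot is used to answer the following questions:

Are the data normally distributed?
What is the nature of the departure from normality: are the data skewed, or are the tails shorter or longer than expected?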
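One way to attach a number to the first question is the probability plot correlation coefficient. This is the correlation r between the ordered data and the normal order statistic medians, and `scipy.stats.probplot` returns it alongside its least-squares fit. The sketch below compares r for a normal sample, a right-skewed sample, and a heavy-tailed Student's t sample. It is a rough illustration rather than a formal test, since critical values for r depend on n. The sample size and seed are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 1000  # arbitrary sample size for the illustration
samples = {
    "normal": rng.normal(size=n),
    "right_skewed": stats.skewnorm.rvs(a=5, size=n, random_state=rng),
    "heavy_tailed": stats.t.rvs(df=3, size=n, random_state=rng),
}

r_values = {}
for name, x in samples.items():
    # probplot returns ((osm, osr), (slope, intercept, r)); r is the
    # correlation between the normal order statistic medians and the
    # ordered data. Values close to 1 are consistent with normality.
    _, (slope, intercept, r) = stats.probplot(x, dist="norm")
    r_values[name] = r
    print(f"{name:13s} r = {r:.4f}")
```

A smaller r only says that the points depart from a straight line. The shape of the plot answers the second question: a curve suggests skewness, and an S shape suggests tails that are shorter or longer than normal.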

Implementation

In this implementation, we use the statsmodels library to draw the Q-Q plots and the seaborn library to draw the histograms. Both libraries come pre-installed in Google Colab. In a local environment, install them with pip install statsmodels seaborn.

# imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as sc
import statsmodels.graphics.gofplots as sm

# fix the random seed so the plots are reproducible
np.random.seed(42)

# define distributions
sample_size = 10000
standard_norm = np.random.normal(size=sample_size)
# Student's t with 3 degrees of freedom has heavier tails than the normal
heavy_tailed = sc.t.rvs(df=3, size=sample_size)
skewed_norm = sc.skewnorm.rvs(a=5, size=sample_size)      # right-skewed
skew_left_norm = sc.skewnorm.rvs(a=-5, size=sample_size)  # left-skewed

# plots for standard normal distribution: histogram (left), Q-Q plot (right)
fig, ax = plt.subplots(1, 2, figsize=(12, 7))
sns.histplot(standard_norm, kde=True, color='blue', ax=ax[0])
sm.ProbPlot(standard_norm).qqplot(line='s', ax=ax[1])

# plot for right-skewed distribution
fig, ax = plt.subplots(1, 2, figsize=(12, 7))
sns.histplot(skewed_norm, kde=True, color='blue', ax=ax[0])
sm.ProbPlot(skewed_norm).qqplot(line='s', ax=ax[1])

# plot for left-skewed distribution
fig, ax = plt.subplots(1, 2, figsize=(12, 7))
sns.histplot(skew_left_norm, kde=True, color='red', ax=ax[0])
sm.ProbPlot(skew_left_norm).qqplot(line='s', markerfacecolor='red',
                                   markeredgecolor='red', ax=ax[1])

# plot for heavy-tailed distribution, with the standard normal overlaid for comparison
fig, ax = plt.subplots(1, 2, figsize=(12, 7))
sns.histplot(heavy_tailed, kde=True, color='green', ax=ax[0])
sns.histplot(standard_norm, kde=True, color='red', ax=ax[0])
sm.ProbPlot(heavy_tailed).qqplot(line='s', markerfacecolor='green',
                                 markeredgecolor='green', ax=ax[1])

plt.show()

                    

Output figures (histogram with KDE on the left, Q-Q plot on the right): Standard normal; Right-skewed; Left-skewed; Heavy-tailed (see axis values)
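If you would rather not depend on statsmodels, `scipy.stats.probplot` can draw a comparable plot on a Matplotlib Axes through its `plot` argument. Below is a minimal sketch, assuming a right-skewed `skewnorm` sample like the one above. Note one difference: `probplot` fits a least-squares line, whereas the `line='s'` option used earlier draws a standardized line.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(3)
data = stats.skewnorm.rvs(a=5, size=500, random_state=rng)  # right-skewed example

fig, ax = plt.subplots(figsize=(6, 5))
# probplot computes the normal order statistic medians, sorts the data,
# fits a least-squares line, and draws both onto the given Axes.
(osm, osr), (slope, intercept, r) = stats.probplot(data, dist="norm", plot=ax)
ax.set_title(f"Normal probability plot (r = {r:.3f})")
plt.show()
```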
