Standard Deviation Plot

Last Updated : 12 Jan, 2022

A standard deviation plot is used to check if there is a deviation between different groups of data. These groups can be generated manually or can be decided based on some property of the dataset.

Standard deviation plots can be formed of :

Vertical Axis: Group Standard deviation
Horizontal Axis: Group Identifier/ Label of the groups.

A reference straight line is plotted among the overall standard deviation.

The standard deviation plot is used to answer the following questions:

Is there any shift in the variation?
What is the magnitude of the shift in the variation?
Is there any distinct pattern in the shift of the variation?

A standard deviation plot is generally used to measure the scale, the same scale measure can also be used to find with mean absolute plot and average deviation plot. These plots also provide better accuracy in terms of identifying outliers.

Uses of Standard Deviation plot

A standard deviation plot is generally used to measure the scale, the same scale measure can also be found with mean absolute plot and average deviation plot. These plots also provide better accuracy in terms of identifying outliers.
A common assumption in many analyses such as 1-factor analysis that the variance is the same for different levels of factor variables. A standard deviation plot can be used to verify that.
We can also verify the constant variance assumptions of univariate data by dividing the data into equal size partitions and plotting variance for each of the partitions.

Implementation

In this implementation, we use the Delhi weather dataset from Kaggle. The link to the dataset can be found here

Python3

# import necessary modules
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
 
sns.set_style('darkgrid')
%matplotlib inline
sns.mpl.rcParams['figure.figsize'] = (10.0, 8.0)
 
# read weather dataset
df =pd.read_csv('weather.csv')
# remove the hours and minutes from time to keep date only
df['datetime_utc'] = pd.to_datetime(df['datetime_utc']).dt.date
df.head()
 
# group by dataframe into months, calculate standard deviation,
# and sort them in chronological order 
month_Df =df.groupby(df['datetime_utc'].dt.strftime('%B'))[" _tempm"].std()
new_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 
             'August', 'September', 'October', 'November', 'December']
month_Df=month_Df.reindex(new_order)
month_Df
 
# plot scatterplot of the standard deviation (standard deviation plot)
graph =sns.scatterplot(y= month_Df.values, x= month_Df.index)
graph.axhline(df[" _tempm"].std(), color='red')
plt.show()

datetime_utc    _conds    _dewptm    _fog    _hail    _heatindexm    _hum    _precipm    _pressurem    _rain    _snow    _tempm    _thunder    _tornado    _vism    _wdird    _wdire    _wgustm    _windchillm    _wspdm
0    1996-11-01    Smoke    9.0    0    0    NaN    27.0    NaN    1010.0    0    0    30.0    0    0    5.0    280.0    West    NaN    NaN    7.4
1    1996-11-01    Smoke    10.0    0    0    NaN    32.0    NaN    -9999.0    0    0    28.0    0    0    NaN    0.0    North    NaN    NaN    NaN
2    1996-11-01    Smoke    11.0    0    0    NaN    44.0    NaN    -9999.0    0    0    24.0    0    0    NaN    0.0    North    NaN    NaN    NaN
3    1996-11-01    Smoke    10.0    0    0    NaN    41.0    NaN    1010.0    0    0    24.0    0    0    2.0    0.0    North    NaN    NaN    NaN
4    1996-11-01    Smoke    11.0    0    0    NaN    47.0    NaN    1011.0    0    0    23.0    0    0    1.2    0.0    North    NaN    NaN    0.0

datetime_utc
April        5.817769
August       2.928722
December     5.288852
February     5.404892
January      4.646874
July         3.394908
June         4.520245
March        5.905230
May          5.441476
November     5.556417
October      4.930381
September    3.437260
Name:  _tempm, dtype: float64