# Calculate pooled standard deviation in Python

• Last Updated : 26 Nov, 2020

Standard deviation measures the spread of the values in a dataset. A small standard deviation means the values lie close to the mean of the dataset, while a large standard deviation means they are spread widely around it.

Python can compute standard deviations directly. Python 3.x offers many libraries for statistical computation; among them, statistics is a built-in module for descriptive statistics. It is a good fit when the datasets are not too large or when we cannot depend on third-party libraries.
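As a quick illustration of spread, the sketch below compares two made-up datasets that share the same mean but differ in how far their values lie from it. The names `tight` and `wide` and their values are hypothetical, chosen only for this example.

```python
import statistics

# Two illustrative datasets with the same mean (50) but different spread.
tight = [49, 50, 51]
wide = [10, 50, 90]

tight_sd = statistics.stdev(tight)  # sample standard deviation
wide_sd = statistics.stdev(wide)

print("tight:", tight_sd)
print("wide: ", wide_sd)
```

The dataset whose values sit closer to the mean yields the smaller standard deviation.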

### Pooled Standard Deviation:

The pooled standard deviation combines the standard deviations of two or more groups into a single estimate. It is the square root of a weighted average of the group variances, with more “weight” given to groups with larger sample sizes.

For reference, this is Cohen’s alternative formula for two groups:

$$
SD_{pooled} = \sqrt{\frac{(n_1 - 1)\,SD_1^2 + (n_2 - 1)\,SD_2^2}{n_1 + n_2 - 2}}
$$

where,

• SD1 = Standard Deviation for group 1
• SD2 = Standard Deviation for group 2
• n1 =  Sample Size for group 1
• n2 =  Sample Size for group 2
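The formula can be sketched as a small helper that works from summary statistics alone. The function name `pooled_sd` and its parameter names are illustrative choices, not part of any library, and the numbers in the example call are made up.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two groups from their SDs and sizes.

    Hypothetical helper: the name and signature are assumptions made
    for this illustration.
    """
    # Weight each variance (SD squared) by its degrees of freedom, n - 1.
    weighted_sum = (n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
    # Divide by the total degrees of freedom, then take the square root.
    return math.sqrt(weighted_sum / (n1 + n2 - 2))


# Example call with made-up summary statistics.
print(pooled_sd(sd1=2.0, n1=10, sd2=3.0, n2=20))
```

A larger group contributes more to the result because its variance is multiplied by a larger n - 1.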

For equal-sized samples ($n_1 = n_2$), it simplifies to:

$$
SD_{pooled} = \sqrt{\frac{SD_1^2 + SD_2^2}{2}}
$$
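As a sanity check on the equal-size simplification, the sketch below evaluates both forms for two groups of the same size. The numbers are made up for illustration.

```python
import math

# Made-up summary statistics for two groups of equal size.
sd1, sd2 = 2.0, 4.0
n = 8  # n1 == n2 == n

# General two-group formula.
general = math.sqrt(((n - 1) * sd1 ** 2 + (n - 1) * sd2 ** 2) / (n + n - 2))

# Simplified form for equal-sized samples.
simplified = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)

# The two agree up to floating-point rounding.
print(math.isclose(general, simplified))
```

The common factor n - 1 in the numerator cancels against the 2(n - 1) in the denominator, which is why the sample size drops out.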

Steps for calculation:

1. Import statistics (for the standard deviation functions).
2. Import math (to calculate the square root).
3. Determine the size of each sample using Python's len function (say, n1 = len(sample1)).
4. Calculate the standard deviation of each sample (e.g., SD1 = statistics.stdev(sample1)).
5. Finally, calculate the pooled standard deviation of the samples using the formula:
$SD_{pooled} = \sqrt{\frac{(n_1 - 1)\,SD_1^2 + (n_2 - 1)\,SD_2^2}{n_1 + n_2 - 2}}$
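The steps above can be sketched as one function that takes the raw samples. The name `pooled_sd_from_samples` is hypothetical, and the measurements are made up for illustration.

```python
import math
import statistics


def pooled_sd_from_samples(sample1, sample2):
    """Pooled standard deviation of two raw samples (illustrative helper)."""
    # Step 3: sample sizes.
    n1, n2 = len(sample1), len(sample2)
    # Step 4: sample standard deviations.
    sd1 = statistics.stdev(sample1)
    sd2 = statistics.stdev(sample2)
    # Step 5: apply the pooled formula.
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


# Made-up measurements from two groups of different sizes.
group_a = [2.1, 2.5, 2.3, 2.8]
group_b = [3.0, 3.6, 2.9, 3.3, 3.1, 3.8]
print(pooled_sd_from_samples(group_a, group_b))
```

Wrapping the steps in a function keeps the sample sizes and standard deviations paired with the right sample, which is easy to get wrong when the variables are handled separately.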

Note: statistics.stdev() needs at least two values. If a sample is empty or contains a single value, StatisticsError will be raised.
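The sketch below triggers the error on purpose. In the standard library, `statistics.stdev` needs at least two values, so a single-element list fails the same way as an empty one. The variable name `failed` is only for this illustration.

```python
import statistics

failed = []  # samples that statistics.stdev rejected
for sample in ([], [7]):
    try:
        statistics.stdev(sample)
    except statistics.StatisticsError as err:
        failed.append(sample)
        print(f"{sample!r}: StatisticsError ({err})")
```

Catching the exception like this lets a script report which sample was too short instead of stopping with a traceback.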

Step 1: Let us try this with an example:

• First, we import the required modules.
• Then, say we have two samples, sample1 = [4, 5, 6] and sample2 = [10, 12, 14, 16, 18, 20]. statistics.stdev(sample1) calculates the standard deviation of sample1 (the statistics.stdev() function computes the sample standard deviation of a list of values).

```python
# import module
import math
import statistics

sample1 = [4, 5, 6]

# Computing sample standard deviation for sample1
SD1 = statistics.stdev(sample1)
print("Standard Deviation for 1st sample = ", SD1)

sample2 = [10, 12, 14, 16, 18, 20]

# Computing sample standard deviation for sample2
SD2 = statistics.stdev(sample2)
print("Standard Deviation for 2nd sample = ", SD2)
```

Output:

```
Standard Deviation for 1st sample =  1.0
Standard Deviation for 2nd sample =  3.7416573867739413
```

Step 2: Next, we find the size of each sample using Python's len function.

```python
import math
import statistics

sample1 = [4, 5, 6]

# Computing sample standard deviation for sample1
SD1 = statistics.stdev(sample1)

sample2 = [10, 12, 14, 16, 18, 20]

# Computing sample standard deviation for sample2
SD2 = statistics.stdev(sample2)

# calculate length of 1st sample
n1 = len(sample1)

# calculate length of 2nd sample
n2 = len(sample2)

print("sample1 : length = ", n1, " | S.D. = ", SD1)
print("sample2 : length = ", n2, " | S.D. = ", SD2)
```

Output:

```
sample1 : length =  3  | S.D. =  1.0
sample2 : length =  6  | S.D. =  3.7416573867739413
```

Step 3: Finally, we calculate the Pooled Standard Deviation by using the formula stated above.

```python
import math
import statistics

sample1 = [4, 5, 6]

# Computing sample standard deviation for sample1
SD1 = statistics.stdev(sample1)

sample2 = [10, 12, 14, 16, 18, 20]

# Computing sample standard deviation for sample2
SD2 = statistics.stdev(sample2)

# calculate length of 1st sample
n1 = len(sample1)

# calculate length of 2nd sample
n2 = len(sample2)

# Pooled formula: variances weighted by (n - 1), divided by (n1 + n2 - 2)
pooled_standard_deviation = math.sqrt(
    ((n1 - 1) * SD1 * SD1 + (n2 - 1) * SD2 * SD2) / (n1 + n2 - 2)
)
print("Pooled Standard Deviation = ", pooled_standard_deviation)
```

Output:

```
Pooled Standard Deviation =  3.2071349029490928
```
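The result can be checked by hand. From the outputs above, the first sample has a standard deviation of 1.0 with 3 values, and the second has a standard deviation of about 3.7417 with 6 values, so its squared standard deviation (the sample variance) is 14. Substituting into the pooled formula:

$$
SD_{pooled} = \sqrt{\frac{(3 - 1)(1)^2 + (6 - 1)(14)}{3 + 6 - 2}} = \sqrt{\frac{72}{7}} \approx 3.2071
$$

The pooled value lies much closer to the second sample's standard deviation because that sample carries 5 of the 7 degrees of freedom.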
