# Z score for Outlier Detection – Python

Z score is an important concept in statistics. Z score is also called standard score. This score helps to understand if a data value is greater or smaller than mean and how far away it is from the mean. More specifically, Z score tells how many standard deviations away a data point is from the mean.

Z score = (x -mean) / std. deviation

A normal distribution is shown below and it is estimated that

68% of the data points lie between +/- 1 standard deviation.

95% of the data points lie between +/- 2 standard deviation

99.7% of the data points lie between +/- 3 standard deviation

**Z score and Outliers:**

If the z score of a data point is more than 3, it indicates that the data point is quite different from the other data points. Such a data point can be an outlier.

For example, in a survey, it was asked how many children a person had.

Suppose the data obtained from people is

1, 2, 2, 2, 3, 1, 1, 15, 2, 2, 2, 3, 1, 1, 2

Clearly, 15 is an outlier in this dataset.

**Let us use calculate the Z score using Python to find this outlier.****Step 1: Import necessary libraries**

`import` `numpy as np ` |

**Step 2: Calculate mean, standard deviation**

`data ` `=` `[` `1` `, ` `2` `, ` `2` `, ` `2` `, ` `3` `, ` `1` `, ` `1` `, ` `15` `, ` `2` `, ` `2` `, ` `2` `, ` `3` `, ` `1` `, ` `1` `, ` `2` `]` `mean ` `=` `np.mean(data)` `std ` `=` `np.std(data)` `print` `(` `'mean of the dataset is'` `, mean)` `print` `(` `'std. deviation is'` `, std)` |

**Output:**

mean of the dataset is 2.6666666666666665 std. deviation is 3.3598941782277745

**Step 3: Calculate Z score. If Z score>3, print it as an outlier.**

`threshold ` `=` `3` `outlier ` `=` `[]` `for` `i ` `in` `data:` ` ` `z ` `=` `(i` `-` `mean)` `/` `std` ` ` `if` `z > threshold:` ` ` `outlier.append(i)` `print` `(` `'outlier in dataset is'` `, outlier)` |

**Output:**

outlier in dataset is [15]

**Conclusion**: Z score helps us identify outliers in the data.

Attention reader! Don’t stop learning now. Get hold of all the important Machine Learning Concepts with the **Machine Learning Foundation Course** at a student-friendly price and become industry ready.