The Z score, also called the standard score, is an important concept in statistics. It shows whether a data value is greater or smaller than the mean and how far from the mean it lies. More specifically, the Z score tells how many standard deviations a data point is away from the mean.

Z score = (x - mean) / standard deviation
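As a quick illustration of the formula, here is a minimal sketch with made-up numbers. The function name `z_score` and the values 70, 55, 60 and 5 are illustrative assumptions and are not part of the dataset used later in this article.

```python
def z_score(x, mean, std):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / std

# Hypothetical values: the mean is 60 and the std. deviation is 5
print(z_score(70, 60, 5))   # 2.0, two standard deviations above the mean
print(z_score(55, 60, 5))   # -1.0, one standard deviation below the mean
```

A positive result means the value is above the mean, and a negative result means it is below the mean.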

For a normal distribution, it is estimated that approximately:

68% of the data points lie within +/- 1 standard deviation of the mean.

95% of the data points lie within +/- 2 standard deviations of the mean.

99.7% of the data points lie within +/- 3 standard deviations of the mean.
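These percentages can be checked numerically. The sketch below assumes an exactly normal distribution and uses the error function from Python's standard `math` module; the helper name `fraction_within` is an illustrative choice.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within +/- {k} std. deviation: {fraction_within(k) * 100:.1f}%")
```

The printed values round to about 68.3%, 95.4% and 99.7%. Real data is rarely perfectly normal, so these figures are a guideline rather than a guarantee.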

**Z score and Outliers:**

If the Z score of a data point is more than 3, the point lies more than 3 standard deviations above the mean, which makes it quite different from the other data points. Such a data point can be an outlier. The same reasoning applies to a Z score below -3.

For example, suppose a survey asks people how many children they have, and the responses are:

1, 2, 2, 2, 3, 1, 1, 15, 2, 2, 2, 3, 1, 1, 2

Clearly, 15 is an outlier in this dataset.

**Let us calculate the Z score using Python to find this outlier.**

**Step 1: Import necessary libraries**

```python
import numpy as np
```

**Step 2: Calculate mean, standard deviation**

```python
data = [1, 2, 2, 2, 3, 1, 1, 15, 2, 2, 2, 3, 1, 1, 2]
mean = np.mean(data)
std = np.std(data)
print('mean of the dataset is', mean)
print('std. deviation is', std)
```

**Output:**

```
mean of the dataset is 2.6666666666666665
std. deviation is 3.3598941782277745
```
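Note that `np.std` computes the population standard deviation by default (it divides by n). The sketch below uses Python's standard `statistics` module to compare this with the sample standard deviation (which divides by n - 1). The variable names `population_std` and `sample_std` are illustrative.

```python
import statistics

data = [1, 2, 2, 2, 3, 1, 1, 15, 2, 2, 2, 3, 1, 1, 2]

# Population standard deviation (divides by n); same result as np.std(data)
population_std = statistics.pstdev(data)

# Sample standard deviation (divides by n - 1); equivalent to np.std(data, ddof=1)
sample_std = statistics.stdev(data)

print('population std. deviation is', population_std)
print('sample std. deviation is', sample_std)
```

The sample value is slightly larger (about 3.48 versus about 3.36), so Z scores computed with it are slightly smaller. For this dataset the value 15 still has a Z score above 3 either way.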

**Step 3: Calculate the Z score. If the Z score > 3, record the value as an outlier.**

```python
threshold = 3
outlier = []
for i in data:
    # Z score: distance from the mean in units of standard deviation
    z = (i - mean) / std
    if z > threshold:
        outlier.append(i)
print('outlier in dataset is', outlier)
```

**Output:**

```
outlier in dataset is [15]
```
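The loop above only flags values far above the mean. As a possible variation, the sketch below wraps the same calculation in a reusable function and compares the absolute Z score with the threshold, so that unusually low values are flagged too. The function name `find_outliers` and its `threshold` parameter are illustrative assumptions, and the standard `statistics` module is used here instead of NumPy so the snippet runs on its own.

```python
import statistics

def find_outliers(values, threshold=3.0):
    """Return the values whose absolute Z score exceeds threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std. deviation, like np.std's default
    if std == 0:
        return []  # all values are identical, so no Z score can be computed
    return [v for v in values if abs((v - mean) / std) > threshold]

data = [1, 2, 2, 2, 3, 1, 1, 15, 2, 2, 2, 3, 1, 1, 2]
print('outlier in dataset is', find_outliers(data))
```

For the survey data this prints the same result as before, `[15]`. A dataset such as nineteen values of 50 and a single 0 would return `[0]`, which the one-sided check would miss.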

**Conclusion**: The Z score helps us identify outliers by measuring how far each value lies from the mean in units of standard deviation.
