The Z score, also called the standard score, is an important concept in statistics. It shows whether a data value is greater or smaller than the mean and how far it lies from the mean. More specifically, the Z score tells how many standard deviations a data point is away from the mean.

Z score = (x - mean) / standard deviation
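As a quick illustration of the formula, here is a minimal sketch. The function name `z_score` and the numbers (a mean of 70 and a standard deviation of 10) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def z_score(x, mean, std):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / std

# Hypothetical values: mean = 70, standard deviation = 10
print(z_score(85, 70, 10))  # 1.5  -> 1.5 standard deviations above the mean
print(z_score(60, 70, 10))  # -1.0 -> 1 standard deviation below the mean
```

A positive Z score means the value is above the mean, and a negative Z score means it is below the mean.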

For a normal distribution, it is estimated that approximately

68% of the data points lie within +/- 1 standard deviation of the mean.

95% of the data points lie within +/- 2 standard deviations of the mean.

99.7% of the data points lie within +/- 3 standard deviations of the mean.
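These percentages can be checked numerically. The sketch below assumes a perfectly normal distribution and uses the error function from Python's standard `math` module; the helper name `fraction_within` is hypothetical.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within +/- {k} standard deviation(s): {fraction_within(k):.4f}")
```

The printed values are 0.6827, 0.9545 and 0.9973, which round to the 68%, 95% and 99.7% figures above.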

**Z score and Outliers:**

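If the Z score of a data point is greater than 3, the point lies more than 3 standard deviations above the mean, a region where a normal distribution places very few values. This indicates that the data point is quite different from the other data points, so it can be an outlier.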

For example, suppose a survey asked people how many children they had, and the responses were

1, 2, 2, 2, 3, 1, 1, 15, 2, 2, 2, 3, 1, 1, 2

Clearly, 15 is an outlier in this dataset.

**Let us calculate the Z score using Python to find this outlier.**

**Step 1: Import necessary libraries**

    import numpy as np

**Step 2: Calculate mean, standard deviation**

    data = [1, 2, 2, 2, 3, 1, 1, 15, 2, 2, 2, 3, 1, 1, 2]
    mean = np.mean(data)
    std = np.std(data)
    print('mean of the dataset is', mean)
    print('std. deviation is', std)

**Output:**

    mean of the dataset is 2.6666666666666665
    std. deviation is 3.3598941782277745
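Note that `np.std` computes the population standard deviation by default (`ddof=0`, dividing by n). If the data is treated as a sample, the sample standard deviation (`ddof=1`, dividing by n - 1) may be preferred. The sketch below compares the two on the same dataset; the variable names are illustrative only.

```python
import numpy as np

data = [1, 2, 2, 2, 3, 1, 1, 15, 2, 2, 2, 3, 1, 1, 2]

population_std = np.std(data)      # default ddof=0: divide by n
sample_std = np.std(data, ddof=1)  # ddof=1: divide by n - 1

print('population std. deviation is', population_std)
print('sample std. deviation is', sample_std)
```

The sample standard deviation is slightly larger, so Z scores computed with it are slightly smaller. For this dataset, the value 15 still has a Z score above 3 either way.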

**Step 3: Calculate the Z score. If the Z score > 3, report the value as an outlier.**

    threshold = 3
    outlier = []
    for i in data:
        z = (i - mean) / std
        if z > threshold:
            outlier.append(i)
    print('outlier in dataset is', outlier)

**Output:**

outlier in dataset is [15]
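The loop in Step 3 can also be written without an explicit loop by using NumPy arrays. The sketch below is one possible variant, not the only way to do it. It compares the absolute value of the Z score with the threshold, an assumption that goes slightly beyond Step 3 so that unusually small values (Z score below -3) would be flagged as well.

```python
import numpy as np

data = np.array([1, 2, 2, 2, 3, 1, 1, 15, 2, 2, 2, 3, 1, 1, 2])
threshold = 3

# Z score of every data point, computed in one step
z = (data - data.mean()) / data.std()

# Keep the points whose Z score magnitude exceeds the threshold
outliers = data[np.abs(z) > threshold]
print('outlier in dataset is', outliers.tolist())
```

For this dataset the result is the same as before: only 15 is reported.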

**Conclusion**: Z score helps us identify outliers in the data.
