An outlier is an object that deviates significantly from the rest of the objects. Outliers can be caused by measurement or execution errors. The analysis of outlier data is referred to as outlier analysis or outlier mining.
Why outlier analysis?
Most data mining methods discard outliers as noise or exceptions. However, in some applications, such as fraud detection, rare events can be more interesting than regularly occurring ones, and in such cases outlier analysis becomes important.
Clustering-based outlier detection using distance to the closest cluster:
In the K-Means clustering technique, each cluster has a mean value, and each object belongs to the cluster whose mean is closest to it. To identify outliers, we first choose a threshold value: any data point whose distance from its nearest cluster mean is greater than this threshold is treated as an outlier. We then compute the distance of the test data to each cluster mean. If the distance between the test data and its closest cluster mean is greater than the threshold, we classify the test data as an outlier.
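The decision rule described above can be written compactly as a formula. The article does not name a distance measure, so Euclidean distance is assumed here. The symbols x (test point), μ_k (mean of cluster k), K (number of clusters) and T (threshold), as well as the numbers in the worked example, are illustrative choices and not part of the original method description.

```latex
d(\mathbf{x}) = \min_{k = 1, \dots, K} \lVert \mathbf{x} - \boldsymbol{\mu}_k \rVert_2,
\qquad \mathbf{x} \text{ is an outlier} \iff d(\mathbf{x}) > T

% Worked example (assumed values): \mu_1 = (1, 1), \mu_2 = (8, 8), T = 2, \mathbf{x} = (5, 5)
\lVert \mathbf{x} - \boldsymbol{\mu}_1 \rVert_2 = \sqrt{4^2 + 4^2} = \sqrt{32} \approx 5.66
\lVert \mathbf{x} - \boldsymbol{\mu}_2 \rVert_2 = \sqrt{3^2 + 3^2} = \sqrt{18} \approx 4.24
d(\mathbf{x}) = \min(5.66,\ 4.24) = 4.24 > 2 = T \;\Rightarrow\; \mathbf{x} \text{ is an outlier}
```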
- Calculate the mean of each cluster
- Initialize the threshold value
- Calculate the distance of the test data from each cluster mean
- Find the nearest cluster to the test data
- If (Distance > Threshold), classify the test data as an outlier
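The steps above can be sketched in plain Python. This is a minimal sketch and not a reference implementation. It assumes Euclidean distance and computes the cluster means with a basic K-Means (Lloyd's algorithm) seeded with the first k points, which keeps the result deterministic. The function names (`euclidean`, `mean`, `kmeans`, `is_outlier`), the sample data and the threshold value are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: List[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(data: List[Point], k: int, iterations: int = 100) -> List[Point]:
    """Step 1: basic Lloyd's K-Means that returns the k cluster means.

    Assumption: the first k points are the initial means, so the result
    is deterministic (library implementations use smarter seeding).
    """
    means = [tuple(p) for p in data[:k]]
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        # Assign every object to the cluster whose mean is closest to it.
        for p in data:
            nearest = min(range(k), key=lambda i: euclidean(p, means[i]))
            clusters[nearest].append(p)
        # Recompute each mean; keep the old mean if a cluster is empty.
        new_means = [mean(c) if c else means[i] for i, c in enumerate(clusters)]
        if new_means == means:  # converged: assignments no longer change
            break
        means = new_means
    return means


def is_outlier(test: Point, means: List[Point], threshold: float) -> bool:
    """Steps 3-5: distance to each mean, take the nearest, compare to threshold."""
    nearest_distance = min(euclidean(test, m) for m in means)
    return nearest_distance > threshold


if __name__ == "__main__":
    data = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (1.0, 1.5),
            (8.5, 8.0), (9.0, 9.5)]
    means = kmeans(data, k=2)
    threshold = 2.0  # Step 2: hypothetical value; tune it for real data
    for test in [(1.2, 1.3), (5.0, 5.0)]:
        label = "outlier" if is_outlier(test, means, threshold) else "normal"
        print(test, label)
```

With this sample data the two cluster means come out near (1.17, 1.5) and (8.5, 8.5). Under a threshold of 2.0, the point (1.2, 1.3) is therefore reported as normal and (5.0, 5.0) as an outlier, because its nearest mean is about 4.95 away. In practice, a library implementation such as scikit-learn's `KMeans` would normally supply the cluster means, and the threshold would be chosen from the data, for example from the distribution of training-point distances.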