What is the Local outlier factor?

Local outlier factor (LOF) is an algorithm for unsupervised outlier detection. It assigns each data point an anomaly score that indicates how strongly the point stands out as an outlier in the data set. It does this by measuring the local density deviation of a given data point with respect to the data points near it.

Working of LOF:

The local density of a point is estimated from the distances between the point and its k-nearest neighbors, so a local density can be calculated for every data point. By comparing these densities, we can check which data points have a density similar to that of their neighbors and which have a lower one. Points whose density is substantially lower than that of their neighbors are considered outliers.

First, the distances between points are calculated to determine each point's k-nearest neighbors. The k-distance of a point is the distance to its k-th nearest neighbor. For example, the 2nd closest point is the 2nd nearest neighbor, and the distance to it is the 2-distance. Here is an image which represents k-distances of various neighbors in the cluster of a point:
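The k-distance and the k-nearest neighbors can be sketched in a few lines of plain Python. This is a minimal brute-force illustration, not an optimized implementation: the helper name `k_nearest` and the sample points are made up for this example, and Euclidean distance is assumed.

```python
import math

def k_nearest(points, i, k):
    """Return (k_distance, neighbors) for points[i].

    neighbors is a list of (distance, index) pairs sorted by distance.
    Ties at the k-distance are kept, so it may hold more than k entries.
    """
    # Distances from points[i] to every other point, smallest first.
    dists = sorted(
        (math.dist(points[i], q), j)
        for j, q in enumerate(points) if j != i
    )
    k_distance = dists[k - 1][0]  # distance to the k-th nearest neighbor
    neighbors = [(d, j) for d, j in dists if d <= k_distance]
    return k_distance, neighbors

# Hypothetical sample data for illustration only.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
k_dist, nbrs = k_nearest(points, 0, k=2)
print(k_dist)  # 2.0
print(nbrs)    # [(1.0, 1), (2.0, 2)]
```

For the point (0, 0) the 2nd nearest neighbor is (0, 2), so its 2-distance is 2.0.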
LOF ~ 1 => Similar data point
LOF < 1 => Inlier (similar data point which is inside the density cluster)
LOF > 1 => Outlier
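To make these score ranges concrete, here is a small brute-force sketch of the LOF computation in plain Python. It follows the standard LOF definitions (reachability distance, local reachability density, and their ratio), which this article does not spell out in detail, so treat it as an illustration under those assumptions. The function name `lof_scores` and the sample data are hypothetical, Euclidean distance is assumed, and the sketch assumes there are no duplicate points (they would cause a division by zero).

```python
import math

def lof_scores(points, k):
    """Compute a LOF score for every point (brute force, O(n^2))."""
    n = len(points)
    k_dist = []  # k-distance of each point
    neigh = []   # k-nearest neighbors of each point as (distance, index)
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        kd = dists[k - 1][0]
        k_dist.append(kd)
        neigh.append([(d, j) for d, j in dists if d <= kd])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(p, o) = max(k_distance(o), distance(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbors divided by the point's own density.
    return [
        sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
        for i in range(n)
    ]

# Hypothetical data: a 3x3 grid of points plus one isolated point.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
points = grid + [(6.0, 6.0)]
scores = lof_scores(points, k=3)
for p, s in zip(points, scores):
    print(p, round(s, 2))
```

On this data the isolated point gets a score well above 1, while the grid points stay close to 1 or below it.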
Here is an image of the plot of LOF on a data set:
Advantages:
- Outliers can be tricky to determine. A point at a small distance from a very dense cluster may be an outlier, while a point at a larger distance from a more widely spread cluster may be an inlier. Because LOF judges each point relative to its local neighborhood, it handles both cases.
- The approach used in LOF can be applied in many other domains where outliers need to be detected, such as geographic data, video streams, etc.
- LOF can be used with a different dissimilarity function as well. It has also been found to outperform many other anomaly detection algorithms.
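The first and last points above can be illustrated together. The sketch below is a brute-force LOF that accepts any dissimilarity function, here Manhattan distance, and runs it on a dense cluster, a sparse cluster, and one point lying just outside the dense cluster. The names `lof_scores` and `manhattan` and all of the data are hypothetical and chosen only for this example. The scoring follows the standard LOF definitions and assumes no duplicate points.

```python
def manhattan(p, q):
    """Sum of absolute coordinate differences (one possible dissimilarity)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def lof_scores(points, k, dist):
    """Brute-force LOF using the caller-supplied dissimilarity `dist`."""
    n = len(points)
    k_dist, neigh = [], []
    for i in range(n):
        ds = sorted((dist(points[i], points[j]), j) for j in range(n) if j != i)
        kd = ds[k - 1][0]  # k-distance of point i
        k_dist.append(kd)
        neigh.append([(d, j) for d, j in ds if d <= kd])
    # Local reachability density of every point.
    lrd = [
        len(neigh[i]) / sum(max(k_dist[j], d) for d, j in neigh[i])
        for i in range(n)
    ]
    # LOF: mean neighbor density divided by the point's own density.
    return [
        sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
        for i in range(n)
    ]

dense = [(x, y) for x in range(3) for y in range(3)]            # spacing 1
sparse = [(100 + 20 * x, 100 + 20 * y) for x in range(3) for y in range(3)]  # spacing 20
near_dense = (7, 1)  # 5 units from the nearest dense-cluster point
points = dense + sparse + [near_dense]

scores = lof_scores(points, k=3, dist=manhattan)
a = scores[-1]          # the point just outside the dense cluster
b = scores[len(dense)]  # a corner of the sparse cluster, 20 units from its nearest neighbors
print(round(a, 2), round(b, 2))
```

Although the sparse-cluster corner is four times farther from its nearest neighbor than the point beside the dense cluster, its score stays close to 1, while the point beside the dense cluster scores well above 1. A single global distance cutoff could not separate these two cases.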
Disadvantages:
- There is no single LOF score that determines whether a point is an outlier. The appropriate threshold can vary from one data set to another.
- In higher dimensions, the detection accuracy of the LOF algorithm is affected.
- Because the LOF score is a ratio that can take any value, it can be hard to interpret and to use for distinguishing inliers from outliers.