Basic understanding of Jarvis-Patrick Clustering Algorithm

Jarvis-Patrick clustering is a graph-based clustering technique that replaces the proximity between two points with their SNN (Shared Nearest Neighbor) similarity, calculated as described in the SNN algorithm below. A threshold is then used to sparsify the matrix of SNN similarities.

Note: ‘Sparsification’ is a technique that drastically reduces the amount of data that needs to be processed. Here, it means discarding the SNN similarities that do not exceed the threshold.
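As a rough illustration of sparsification, the sketch below zeroes every entry of a small similarity matrix that does not exceed a threshold. The function name `sparsify` and the matrix values are made up for this example and are not part of any library.

```python
def sparsify(similarity, threshold):
    """Return a copy of a square similarity matrix with every entry
    that does not exceed `threshold` set to 0."""
    return [
        [value if value > threshold else 0 for value in row]
        for row in similarity
    ]


# Hypothetical SNN similarities between four points (illustrative values).
snn = [
    [0, 3, 1, 0],
    [3, 0, 2, 0],
    [1, 2, 0, 4],
    [0, 0, 4, 0],
]

sparse = sparsify(snn, threshold=2)
for row in sparse:
    print(row)
```

Only the similarities 3 and 4 survive with a threshold of 2, so far fewer pairs need to be considered in later steps.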



SNN Algorithm (Shared Nearest Neighbor similarity):

  1. Find the k-nearest neighbors of all points.
  2. if two points, x and y, are not among the k-nearest neighbors of each other then
  3. similarity(x, y) <- 0
  4. else
  5. similarity(x, y) <- the number of shared neighbors
  6. end if
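The SNN steps above can be sketched in plain Python as follows. This is a minimal brute-force version that assumes Euclidean distance; the function names `k_nearest_neighbors` and `snn_similarity` and the sample points are chosen only for illustration.

```python
from math import dist


def k_nearest_neighbors(points, k):
    """Return, for each point index, the set of indices of its k nearest
    neighbors (Euclidean distance, brute force)."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(neighbors, x, y):
    """Number of shared neighbors if x and y are in each other's
    k-nearest-neighbor lists, otherwise 0."""
    if x not in neighbors[y] or y not in neighbors[x]:
        return 0
    return len(neighbors[x] & neighbors[y])


# Two small, well-separated groups of 2-D points (illustrative data).
points = [(0, 0), (1, 0), (0, 1.5), (10, 10), (11, 10), (10, 11.5)]
neighbors = k_nearest_neighbors(points, k=2)

print(snn_similarity(neighbors, 0, 1))  # points in the same group
print(snn_similarity(neighbors, 0, 3))  # points in different groups
```

Points 0 and 1 are mutual neighbors and share point 2, so their similarity is 1, while points 0 and 3 are not in each other's lists and get a similarity of 0.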

JARVIS-PATRICK CLUSTERING ALGORITHM: The algorithm needs 3 inputs to work. The steps below use two of them: the neighborhood size ‘k’ and the shared-neighbor threshold ‘t’.



The working of the algorithm is as follows:

  1. The ‘k’ nearest neighbors of all points are found.
  2. Two points are placed in the same cluster if they share more than ‘t’ neighbors and each point is in the other's ‘k’ nearest neighbor list.
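The two steps above can be sketched as follows. This is a minimal, unoptimized version that assumes Euclidean distance and uses a union-find structure to merge linked points; the function name `jarvis_patrick` and the sample data are illustrative assumptions, not a reference implementation.

```python
from math import dist


def jarvis_patrick(points, k, t):
    """Return one cluster label per point. Two points are linked when each
    is in the other's k-nearest-neighbor list and they share more than t
    neighbors; points connected by a chain of links get the same label."""
    n = len(points)

    # Step 1: k nearest neighbors of every point (brute force).
    neighbors = []
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: dist(points[i], points[j]),
        )
        neighbors.append(set(others[:k]))

    # Union-find structure used to merge linked points into clusters.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Step 2: link mutual neighbors that share more than t neighbors.
    for x in range(n):
        for y in range(x + 1, n):
            mutual = y in neighbors[x] and x in neighbors[y]
            if mutual and len(neighbors[x] & neighbors[y]) > t:
                parent[find(x)] = find(y)

    return [find(i) for i in range(n)]


# Two tight groups plus one isolated point (illustrative data).
points = [(0, 0), (1, 0), (0, 1.5), (10, 10), (11, 10), (10, 11.5), (5, 30)]
labels = jarvis_patrick(points, k=2, t=0)
print(labels)
```

With these values, the first three points end up with one label and the next three with another. The isolated point is in nobody's neighbor list, so it keeps a label of its own, which corresponds to an object that is left unclustered.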

Example: Jarvis-Patrick clustering of a two-dimensional data set.

[Figure: Jarvis-Patrick clustering of a two-dimensional point set]

Strengths:

  1. Because Jarvis-Patrick clustering is based on the notion of SNN similarity, it is good at dealing with noise and can handle clusters of different sizes, shapes, and densities.
  2. The algorithm works well for high-dimensional data.
  3. The algorithm is particularly good at finding tight clusters of strongly related objects.

Limitations:

  1. Jarvis-Patrick clustering defines a cluster as a connected component in the SNN similarity graph. Thus, whether a set of objects is split into two clusters or left together may depend on a single link. Hence, Jarvis-Patrick clustering is somewhat brittle, i.e., it may split true clusters or join clusters that should be kept separate.
  2. Not all objects are clustered. However, these objects can be added to existing clusters afterwards, and in some cases a complete clustering is not required.
  3. As with other clustering algorithms, choosing the best values for the parameters can be challenging.
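The brittleness described in the first limitation can be illustrated with a small sketch. It treats clusters as connected components of a graph of links and shows how one link decides whether two groups are merged. The link list is hypothetical and the helper `connected_components` is written only for this example.

```python
def connected_components(n, links):
    """Group nodes 0..n-1 into clusters, given undirected links between them."""
    adjacency = {i: set() for i in range(n)}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first search collecting everything reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


# Hypothetical SNN links: two tight groups joined only by the link (2, 3).
links = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

with_bridge = connected_components(6, links)
without_bridge = connected_components(6, [l for l in links if l != (2, 3)])

print(with_bridge)
print(without_bridge)
```

With the link (2, 3) present, all six nodes form one cluster; without it, they split into two clusters of three. A small change in ‘k’ or ‘t’ can add or remove such a link.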