Cluster Analysis
The aim of the clustering process is to discover overall distribution patterns and interesting correlations among the data attributes. It is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. Cluster analysis itself is not one specific algorithm, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their understanding of what constitutes a cluster and how to efficiently find them. Popular notions of clusters include groups with small distances between cluster members, dense areas of the data space, intervals or particular statistical distributions.
Here we discuss the distance between objects in different clusters and between objects in the same cluster. There are two types of distance: intercluster distance and intracluster distance.
Let S and T be clusters formed by a partition U.
d(x, y) is the distance between two objects x and y belonging to S and T respectively.
d(x, y) is calculated using a well-known distance measure such as the Euclidean, Manhattan, or Chebyshev distance. |S| and |T| are the numbers of objects in clusters S and T respectively.
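As a minimal sketch of the three distance measures named above, the following Python functions compute d(x, y) for two objects given as equal-length coordinate tuples. The function names and the sample points are illustrative assumptions, not part of the original article.

```python
import math

def euclidean(x, y):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def chebyshev(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))

# Hypothetical sample objects, chosen so the results are easy to check by hand.
x, y = (1.0, 2.0), (4.0, 6.0)
print(euclidean(x, y))  # 5.0
print(manhattan(x, y))  # 7.0
print(chebyshev(x, y))  # 4.0
```

Any of these functions can play the role of d(x, y) in the cluster distances that follow.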
Intercluster distance measures how far apart two different clusters are, based on the distances between their objects. There are five types:
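- Single Linkage Distance: The single linkage distance is the smallest distance between two objects belonging to two different clusters, defined as $\delta_1(S, T) = \min_{x \in S,\, y \in T} d(x, y)$
- Complete Linkage Distance: The complete linkage distance is the distance between the two most remote objects belonging to two different clusters, defined as $\delta_2(S, T) = \max_{x \in S,\, y \in T} d(x, y)$
- Average Linkage Distance: The average linkage distance is the average distance over all pairs of objects belonging to two different clusters, defined as $\delta_3(S, T) = \frac{1}{|S|\,|T|} \sum_{x \in S,\, y \in T} d(x, y)$
- Centroid Linkage Distance: The centroid linkage distance is the distance between the centers $v_s$ and $v_t$ of the two clusters S and T respectively, defined as $\delta_4(S, T) = d(v_s, v_t)$, where $v_s = \frac{1}{|S|} \sum_{x \in S} x$ and $v_t = \frac{1}{|T|} \sum_{y \in T} y$
- Average Centroid Linkage Distance: The average centroid linkage distance is the average distance between the center of each cluster and all the objects belonging to the other cluster, defined as $\delta_5(S, T) = \frac{1}{|S| + |T|} \left( \sum_{x \in S} d(x, v_t) + \sum_{y \in T} d(y, v_s) \right)$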
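The five intercluster distances listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: clusters are lists of coordinate tuples, d defaults to the Euclidean distance (`math.dist`, Python 3.8+), the cluster center is taken to be the coordinate-wise mean, and the average centroid linkage uses the symmetric form that averages over the objects of both clusters. All function names are hypothetical.

```python
import math
from itertools import product

def centroid(cluster):
    """Coordinate-wise mean of the objects in a cluster (assumed center)."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def single_linkage(S, T, d=math.dist):
    """Smallest distance between an object of S and an object of T."""
    return min(d(x, y) for x, y in product(S, T))

def complete_linkage(S, T, d=math.dist):
    """Largest distance between an object of S and an object of T."""
    return max(d(x, y) for x, y in product(S, T))

def average_linkage(S, T, d=math.dist):
    """Mean distance over all |S| * |T| cross-cluster pairs."""
    return sum(d(x, y) for x, y in product(S, T)) / (len(S) * len(T))

def centroid_linkage(S, T, d=math.dist):
    """Distance between the two cluster centers."""
    return d(centroid(S), centroid(T))

def average_centroid_linkage(S, T, d=math.dist):
    """Mean distance from each object to the center of the other cluster."""
    v_s, v_t = centroid(S), centroid(T)
    total = sum(d(x, v_t) for x in S) + sum(d(y, v_s) for y in T)
    return total / (len(S) + len(T))

# Hypothetical clusters: two vertical pairs of points, 4 units apart.
S = [(0.0, 0.0), (0.0, 2.0)]
T = [(4.0, 0.0), (4.0, 2.0)]
print(single_linkage(S, T))    # 4.0
print(centroid_linkage(S, T))  # 4.0
print(complete_linkage(S, T))
print(average_linkage(S, T))
print(average_centroid_linkage(S, T))
```

Passing a different function as `d` (for example a Manhattan distance) changes the underlying metric without touching the linkage definitions.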
Intracluster distance measures how spread out a single cluster is, based on the distances between objects belonging to that cluster. There are three types:
- Complete Diameter Distance: The complete diameter distance is the distance between the two most remote objects belonging to the same cluster, defined as $\Delta_1(S) = \max_{x, y \in S} d(x, y)$
- Average Diameter Distance: The average diameter distance is the average distance over all pairs of distinct objects belonging to the same cluster, defined as $\Delta_2(S) = \frac{1}{|S|\,(|S| - 1)} \sum_{x, y \in S,\, x \neq y} d(x, y)$
- Centroid Diameter Distance: The centroid diameter distance is twice the average distance between the objects of S and the cluster center $v_s$ of S, defined as $\Delta_3(S) = \frac{2}{|S|} \sum_{x \in S} d(x, v_s)$
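The three intracluster distances above can be sketched the same way. The assumptions are that a cluster is a list of at least two coordinate tuples, that d defaults to the Euclidean distance (`math.dist`, Python 3.8+), and that the cluster center is the coordinate-wise mean. The function names are hypothetical.

```python
import math
from itertools import combinations

def centroid(cluster):
    """Coordinate-wise mean of the objects in a cluster (assumed center)."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def complete_diameter(S, d=math.dist):
    """Distance between the two most remote objects of S."""
    return max(d(x, y) for x, y in combinations(S, 2))

def average_diameter(S, d=math.dist):
    """Mean distance over all ordered pairs of distinct objects of S."""
    n = len(S)
    # Each unordered pair appears twice among the n * (n - 1) ordered pairs.
    pair_sum = sum(d(x, y) for x, y in combinations(S, 2))
    return 2 * pair_sum / (n * (n - 1))

def centroid_diameter(S, d=math.dist):
    """Twice the mean distance from the objects of S to the center of S."""
    v = centroid(S)
    return 2 * sum(d(x, v) for x in S) / len(S)

# Hypothetical cluster: the corners of a 3-4-5 right triangle.
S = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
print(complete_diameter(S))  # 5.0
print(average_diameter(S))   # 4.0
print(centroid_diameter(S))
```

The pairwise measures are undefined for a cluster with a single object, so a real implementation would need to decide how to handle that case.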
A clustering algorithm is considered good when it produces clusters with a large intercluster distance between different clusters and a small intracluster distance within each cluster.
In the figures, the clustering in fig 3 is better than those in fig 1 and fig 2, because in fig 3 the intercluster distance is larger and the intracluster distance is smaller.
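To make the comparison concrete, here is a small sketch that scores two partitions of the same four points. It is an illustration only: the points and both partitions are made-up data (not the article's figures), the intercluster distance is taken as the single linkage distance, the intracluster distance as the complete diameter, and the helper names are hypothetical.

```python
import math
from itertools import combinations, product

def min_intercluster(S, T):
    """Single linkage distance between clusters S and T (Euclidean)."""
    return min(math.dist(x, y) for x, y in product(S, T))

def max_intracluster(S):
    """Complete diameter of cluster S (Euclidean)."""
    return max(math.dist(x, y) for x, y in combinations(S, 2))

def summarize(partition):
    """Return (smallest intercluster distance, largest intracluster distance)."""
    inter = min(min_intercluster(S, T) for S, T in combinations(partition, 2))
    intra = max(max_intracluster(S) for S in partition)
    return inter, intra

# Two partitions of the same four points.
good = [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 0.0), (5.0, 1.0)]]
poor = [[(0.0, 0.0), (5.0, 0.0)], [(0.0, 1.0), (5.0, 1.0)]]
print(summarize(good))  # (5.0, 1.0)
print(summarize(poor))  # (1.0, 5.0)
```

The first partition groups nearby points together, so its clusters are far apart (5.0) and compact (1.0). The second partition reverses both numbers, which is the pattern of a poor clustering.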