Difference between K means and Hierarchical Clustering

k-means is a method of cluster analysis that uses a pre-specified number of clusters. It requires advance knowledge of K.
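To make the role of K concrete, here is a minimal sketch of k-means (Lloyd's algorithm) using only the Python standard library. The function names `kmeans` and `sq_dist`, the `seed` parameter, and the sample points are illustrative assumptions, not part of any particular library. Note that `k` has to be passed in before the algorithm can start, and that the initial centres are drawn at random.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's algorithm. `k` is fixed up front; `seed` picks the initial centres."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # random initial centres drawn from the data
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centre.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centres


# Hypothetical sample data: two well-separated groups of three points.
points = [(0, 0), (1, 1), (2, 2), (100, 100), (101, 101), (102, 102)]
labels, centres = kmeans(points, k=2, seed=0)
print(labels, centres)
```

On data this well separated, every seed should recover the same two groups, although which group is labelled 0 and which 1 can change with the seed. On less clearly separated data, different seeds can lead to genuinely different partitions.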

Hierarchical clustering, also known as hierarchical cluster analysis (HCA), is also a method of cluster analysis. It seeks to build a hierarchy of clusters without fixing the number of clusters in advance.

The main differences between k-means and hierarchical clustering are:

| k-means Clustering | Hierarchical Clustering |
|---|---|
| Using a pre-specified number of clusters, k-means assigns each record to one of a set of mutually exclusive, roughly spherical clusters based on distance. | Hierarchical methods can be either divisive or agglomerative. |
| k-means requires advance knowledge of K, i.e. the number of clusters into which the data should be divided. | In hierarchical clustering one can stop at any number of clusters, choosing an appropriate number by interpreting the dendrogram. |
| The mean (or, in the k-medians variant, the median) serves as the cluster centre that represents each cluster. | Agglomerative methods begin with n clusters, one per record, and sequentially combine similar clusters until only one cluster remains. |
| k-means is normally less computationally intensive and is suited to very large datasets. | Divisive methods work in the opposite direction, beginning with one cluster that includes all the records and splitting it repeatedly. Hierarchical methods are especially useful when the goal is to arrange the clusters into a natural hierarchy. |
| Because k-means starts from a random choice of cluster centres, running the algorithm several times may produce different results. | Results are reproducible, since the procedure involves no random initialisation. |
| k-means is simply a division of the set of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset. | A hierarchical clustering is a set of nested clusters arranged as a tree. |
| k-means works well when the clusters are hyperspherical (like a circle in 2D or a sphere in 3D). | Hierarchical clustering does not work as well as k-means when the clusters are hyperspherical. |
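The agglomerative procedure described in the comparison can be sketched in a few lines of standard-library Python. This is a naive illustration that assumes single linkage (the distance between two clusters is the distance between their closest members). The names `agglomerate` and `cut` and the sample points are hypothetical. The merge history plays the role of the dendrogram: replaying it and stopping early gives any desired number of clusters.

```python
import math


def agglomerate(points):
    """Naive single-linkage agglomerative clustering.

    Returns the merge history as (cluster_a, cluster_b, distance) triples,
    where each cluster is a tuple of point indices.
    """
    clusters = [(i,) for i in range(len(points))]  # start with n singletons
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = tuple(sorted(clusters[a] + clusters[b]))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges


def cut(n_points, merges, n_clusters):
    """Replay the merge history until only n_clusters clusters remain."""
    clusters = {(i,) for i in range(n_points)}
    for left, right, _ in merges:
        if len(clusters) <= n_clusters:
            break
        clusters -= {left, right}
        clusters.add(tuple(sorted(left + right)))
    return sorted(clusters)


# Hypothetical sample data: two tight pairs and one isolated point.
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.0, 5.4), (9.0, 1.0)]
merges = agglomerate(points)
print(cut(len(points), merges, 3))  # [(0, 1), (2, 3), (4,)]
print(cut(len(points), merges, 2))  # [(0, 1, 2, 3), (4,)]
```

Nothing here is random, so running `agglomerate` again on the same points gives the same merge history. The nested loops over all cluster pairs also hint at why this approach becomes expensive as the number of records grows.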
Advantages of k-means:

1. Convergence is guaranteed (to a local optimum, not necessarily the global one).

2. It can be generalized to clusters of different sizes and shapes.

Advantages of hierarchical clustering:

1. It can handle any form of similarity or distance measure.

2. Consequently, it is applicable to any attribute type.

Disadvantages of k-means:

1. The value of K is difficult to predict.

2. It does not work well with global clusters.

Disadvantage of hierarchical clustering:

1. Hierarchical clustering requires the computation and storage of an n×n distance matrix. For very large datasets, this can be expensive and slow.
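A quick back-of-the-envelope calculation shows how fast the n×n distance matrix grows. The sketch below assumes each distance is stored as an 8-byte floating-point number. The function name `distance_matrix_bytes` and its parameters are illustrative. The `condensed` option models storing only the n(n-1)/2 distinct pairs, which roughly halves the cost but does not change the quadratic growth.

```python
def distance_matrix_bytes(n, bytes_per_value=8, condensed=False):
    """Approximate storage needed for pairwise distances among n records.

    Assumes `bytes_per_value` bytes per distance (8 for a 64-bit float).
    With condensed=True only the n*(n-1)/2 distinct pairs are counted.
    """
    entries = n * (n - 1) // 2 if condensed else n * n
    return entries * bytes_per_value


for n in (1_000, 100_000, 1_000_000):
    full = distance_matrix_bytes(n) / 1e9
    half = distance_matrix_bytes(n, condensed=True) / 1e9
    print(f"n={n:>9,}: full {full:,.3f} GB, condensed {half:,.3f} GB")
```

Under these assumptions, 100,000 records already need about 80 GB for the full matrix, which is why hierarchical clustering is usually applied to smaller datasets or to a sample.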
