Hierarchical clustering is an unsupervised, non-linear algorithm that builds clusters with an inherent hierarchy (an ordering). For example, consider a family spanning three generations: the grandparents have children, who in turn become the parents of their own children. All of them belong to the same family, i.e., they form a hierarchy.

Hierarchical clustering is of two types:

- **Agglomerative Hierarchical clustering:** starts with each observation as its own leaf and successively merges the closest clusters together. It is a bottom-up approach.
- **Divisive Hierarchical clustering:** starts with all observations in a single root cluster and recursively splits the clusters. It is a top-down approach.

#### Theory

In hierarchical clustering, objects are organized into a tree-shaped hierarchy, which is used to interpret the clustering model. The agglomerative algorithm is as follows:

- Make each data point a single-point cluster, giving **N** clusters.
- Take the two closest data points and merge them into one cluster, giving **N-1** clusters.
- Take the two closest clusters and merge them into one cluster, giving **N-2** clusters.
- Repeat the previous step until only one cluster remains.
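These steps can be traced on a minimal sketch, assuming a toy one-dimensional data set (the four points here are illustrative, not from mtcars):

```r
# Toy data: two well-separated pairs of points
x <- c(1, 2, 9, 10)

# Step 1: pairwise Euclidean distances between the 4 single-point clusters
d <- dist(x, method = "euclidean")

# Remaining steps: hclust() repeatedly merges the two closest clusters
hc <- hclust(d, method = "average")

# hc$merge records each merge (negative = original point,
# positive = result of an earlier merge)
hc$merge

# hc$height records the distance at which each merge occurred
hc$height
```

With average linkage, the two pairs merge first at height 1 each, and the final merge joins them at the average inter-cluster distance.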

A dendrogram is a hierarchy of clusters in which distances are converted into heights. It clusters **n** units or objects, each with **p** features, into smaller groups. Units in the same cluster are joined by a horizontal line, and the leaves at the bottom represent individual units. It provides a visual representation of the clustering.

**Rule of thumb:** the largest vertical distance that does not cross any horizontal line determines the optimal number of clusters.
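This rule can also be applied programmatically: `cutree()` accepts a height `h`, so cutting inside the largest vertical gap yields the suggested number of clusters. A small sketch using the same illustrative data as above (not mtcars):

```r
# Illustrative data: two well-separated groups
x <- c(1, 2, 9, 10)
hc <- hclust(dist(x), method = "average")

# The merge heights here are 1, 1 and 8, so the largest vertical
# gap lies between heights 1 and 8; cutting anywhere in that gap
# (e.g. h = 4) yields two clusters
groups <- cutree(hc, h = 4)
groups
```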

#### The Dataset

**mtcars** (Motor Trend Car Road Tests) comprises fuel consumption, performance and 10 aspects of automobile design for 32 automobiles. It comes pre-installed with R as part of the built-in **datasets** package.

```r
# mtcars ships with base R's datasets package,
# so no extra package needs to be installed

# First six rows of the dataset
head(mtcars)
```


#### Performing Hierarchical clustering on Dataset

We apply the hierarchical clustering algorithm to the dataset using **hclust()**, which ships with the **stats** package installed alongside base R.

```r
# Finding distance matrix
distance_mat <- dist(mtcars, method = 'euclidean')
distance_mat

# Fitting Hierarchical clustering Model
# to training dataset
set.seed(240)  # Setting seed
Hierar_cl <- hclust(distance_mat, method = "average")
Hierar_cl

# Plotting dendrogram
plot(Hierar_cl)

# Choosing no. of clusters
# Cutting tree by height
abline(h = 110, col = "green")

# Cutting tree by no. of clusters
fit <- cutree(Hierar_cl, k = 3)
fit

table(fit)
rect.hclust(Hierar_cl, k = 3, border = "green")
```


#### Output:

**Distance matrix:** the pairwise distances between the cars, computed with the Euclidean method.

**Model Hierar_cl:** the model summary reports the cluster method (average), the distance metric (euclidean) and the number of objects (32).

**Plot dendrogram:** the dendrogram is drawn with the objects (car models) on the x-axis and the merge height (distance) on the y-axis.

**Cut tree:** the tree is cut at k = 3, and the table shows how many objects fall into each of the three clusters.

**Plotting dendrogram after cutting:** the plot shows the dendrogram after the cut; the green rectangles mark the three clusters selected per the rule of thumb.
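To work with the clusters downstream, the labels returned by `cutree()` can be attached back to the original rows. A sketch assuming the same model fitted above:

```r
# Refit the model as in the article
hc <- hclust(dist(mtcars, method = "euclidean"), method = "average")
fit <- cutree(hc, k = 3)

# Attach the cluster label to each car
clustered <- data.frame(mtcars, cluster = fit)

# Cluster sizes and mean fuel consumption per cluster
table(clustered$cluster)
aggregate(mpg ~ cluster, data = clustered, FUN = mean)
```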

Because it requires no pre-specified number of clusters and yields an interpretable dendrogram, hierarchical clustering is widely used in industry.
