
Difference Between Agglomerative Clustering and Divisive Clustering

Last Updated : 01 Sep, 2023

Hierarchical clustering is a popular unsupervised machine learning technique used to group similar data points into clusters based on their similarity or dissimilarity. It is called “hierarchical” because it creates a tree-like hierarchy of clusters, where each node represents a cluster that can be further divided into smaller sub-clusters.

There are two types of hierarchical clustering techniques: 

  1. Agglomerative clustering
  2. Divisive clustering

Agglomerative Clustering

Agglomerative clustering is a hierarchical clustering algorithm that follows a “bottom-up” approach: it starts with each data point as its own cluster and then iteratively merges the most similar pairs of clusters, building a hierarchy until all data points belong to a single cluster.
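
For a concrete feel of the bottom-up process, here is a minimal sketch using scikit-learn’s AgglomerativeClustering, which merges clusters until the requested number of clusters remains (the dataset and parameter choices here are illustrative only):

Python

# A minimal sketch: agglomerative clustering with scikit-learn
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Small random dataset: 20 samples, 2 features (illustrative)
np.random.seed(0)
X = np.random.randn(20, 2)

# Merge clusters bottom-up until 3 clusters remain
model = AgglomerativeClustering(n_clusters=3, linkage='ward')
labels = model.fit_predict(X)
print(labels)  # a cluster index (0, 1, or 2) for each sample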

Divisive Clustering

Divisive clustering, also known as “top-down” clustering, starts with all data points in a single cluster and recursively splits clusters into smaller sub-clusters based on their dissimilarity.

Unlike agglomerative clustering, which starts with each data point as its own cluster and iteratively merges the most similar pairs, divisive clustering takes a “divide and conquer” approach that breaks a large cluster into smaller sub-clusters.
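
SciPy’s hierarchy module implements only agglomerative linkage, so the following is just a rough sketch of the top-down idea: it approximates divisive clustering by repeatedly bisecting the largest cluster with 2-means. The function name and the max_clusters parameter are illustrative, not a standard API:

Python

# A rough sketch of top-down (divisive) clustering via repeated 2-means bisection
import numpy as np
from sklearn.cluster import KMeans

def divisive_clustering(X, max_clusters=4):
    # Start with all points in a single cluster
    clusters = [np.arange(len(X))]
    while len(clusters) < max_clusters:
        # Pick the largest remaining cluster and split it in two
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        members = clusters.pop(idx)
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[members])
        clusters.append(members[labels == 0])
        clusters.append(members[labels == 1])
    return clusters

np.random.seed(0)
X = np.random.randn(20, 2)
for i, members in enumerate(divisive_clustering(X)):
    print(f'cluster {i}: samples {members.tolist()}')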

Example 1: 

Here is a short example of agglomerative clustering using randomly generated data in Python –

In this example, we first create a random dataset with 20 samples and two features using NumPy’s randn function. Then, we use the linkage function from SciPy’s cluster.hierarchy module to perform hierarchical clustering with the complete-linkage method. The resulting linkage matrix Z contains information about the cluster merging process.

Finally, we plot the dendrogram using the dendrogram function from the same module. The dendrogram shows how the clusters were merged and at what distance, starting from individual samples at the bottom and ending with a single cluster at the top.

Python




# Import the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
 
# Create a random dataset with two features and 20 samples
np.random.seed(0)
X = np.random.randn(20, 2)

# Perform hierarchical clustering using complete linkage
Z = linkage(X, method='complete')
 
# Plot the dendrogram
plt.figure(figsize=(10, 5))
plt.title('Agglomerative Clustering Dendrogram')
plt.xlabel('Sample index')
plt.ylabel('Distance')
dendrogram(Z)
plt.show()


Output:

[Figure: Agglomerative clustering dendrogram]

Note that other linkage methods can be used, such as single, average, and ward; the method chosen affects the shape of the resulting dendrogram and the clustering.
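
To see how the linkage choice changes the result, here is a small sketch that clusters the same data with each of the four methods and draws the dendrograms side by side:

Python

# Compare how different linkage methods shape the dendrogram
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

np.random.seed(0)
X = np.random.randn(20, 2)

methods = ['single', 'complete', 'average', 'ward']
fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for ax, method in zip(axes, methods):
    Z = linkage(X, method=method)
    dendrogram(Z, ax=ax, no_labels=True)
    ax.set_title(f'{method} linkage')
plt.tight_layout()
plt.show()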

Example 2:

Here is another short example of agglomerative clustering on randomly generated data, this time using Ward linkage –

In this example, we first generate a sample dataset with 20 data points and 2 features using NumPy. We then perform agglomerative clustering using the linkage function from SciPy, which takes the data matrix as input along with the clustering method (ward in this case) and the distance metric (euclidean in this case). The output of linkage is a linkage matrix that represents the hierarchical clustering structure.

We then plot the dendrogram using the dendrogram function from SciPy, which takes the linkage matrix as input. We set color_threshold to 0 so that the entire dendrogram is drawn in a single color rather than colored by cluster below a distance threshold.

Python




# Import the necessary libraries
from scipy.cluster.hierarchy import dendrogram, linkage
import numpy as np
import matplotlib.pyplot as plt
 
# Generate sample data
np.random.seed(0)
X = np.random.randn(20, 2)
 
# Perform agglomerative clustering using Ward linkage
Z = linkage(X, method='ward', metric='euclidean')

# Plot the dendrogram; color_threshold=0 draws it in a single color
plt.figure(figsize=(10, 5))
plt.title('Agglomerative Clustering Dendrogram')
plt.xlabel('Sample index')
plt.ylabel('Distance')
dendrogram(Z, color_threshold=0)
plt.show()


Output:

[Figure: Agglomerative clustering dendrogram (Ward linkage)]

The resulting plot will show the hierarchical clustering structure with each data point as a leaf node in the dendrogram and the clusters at different levels of the hierarchy.
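
To turn the hierarchy into a flat clustering, the tree can be cut at a chosen level with SciPy’s fcluster. Here is a minimal sketch reusing the linkage matrix Z from the example above (the threshold values are arbitrary, for illustration only):

Python

# Cut the hierarchy into flat clusters with SciPy's fcluster
from scipy.cluster.hierarchy import fcluster

# Option 1: cut at a fixed distance threshold (value chosen arbitrarily)
labels_by_distance = fcluster(Z, t=3.0, criterion='distance')

# Option 2: request a fixed number of clusters instead
labels_by_count = fcluster(Z, t=3, criterion='maxclust')

print(labels_by_distance)  # a cluster id for each of the 20 samples
print(labels_by_count)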

Difference between Agglomerative Clustering and Divisive Clustering:

| S.No. | Parameter | Agglomerative Clustering | Divisive Clustering |
|-------|-----------|--------------------------|---------------------|
| 1. | Category | Bottom-up approach | Top-down approach |
| 2. | Approach | Each data point starts in its own cluster, and the algorithm recursively merges the closest pairs of clusters until a single cluster containing all the data points is obtained. | All data points start in a single cluster, and the algorithm recursively splits the cluster into smaller sub-clusters until each data point is in its own cluster. |
| 3. | Complexity | Generally more computationally expensive, especially for large datasets, since it requires the calculation of all pairwise distances between data points. | Comparatively less expensive, since it only requires the calculation of distances between sub-clusters, which reduces the computational burden. |
| 4. | Outliers | Can handle outliers better, since outliers can be absorbed into larger clusters. | May create sub-clusters around outliers, leading to suboptimal clustering results. |
| 5. | Interpretability | Tends to produce more interpretable results: the dendrogram shows the merging process, and the user can choose the number of clusters at the desired level of granularity. | Can be more difficult to interpret: the dendrogram shows the splitting process, and the user must choose a stopping criterion to determine the number of clusters. |
| 6. | Implementation | Scikit-learn provides multiple linkage methods for agglomerative clustering, such as “ward”, “complete”, “average”, and “single”. | Divisive clustering is not currently implemented in Scikit-learn. |
| 7. | Applications | Image segmentation, customer segmentation, social network analysis, document clustering, genetics and genomics, etc. | Market segmentation, anomaly detection, biological classification, natural language processing, etc. |


