
Difference Between Agglomerative Clustering and Divisive Clustering

Last Updated : 01 Sep, 2023

Hierarchical clustering is a popular unsupervised machine learning technique used to group similar data points into clusters based on their similarity or dissimilarity. It is called “hierarchical” because it creates a tree-like hierarchy of clusters, where each node represents a cluster that can be further divided into smaller sub-clusters.

There are two types of hierarchical clustering techniques: 

  1. Agglomerative clustering
  2. Divisive clustering

Agglomerative Clustering

Agglomerative clustering is a hierarchical clustering algorithm that follows a “bottom-up” approach: it starts with each data point as its own cluster and then iteratively merges the most similar pairs of clusters, building a hierarchy until all data points belong to a single cluster.
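
For a concrete feel of the bottom-up process, here is a minimal sketch using scikit-learn’s AgglomerativeClustering, which merges clusters until the requested number of clusters remains (the dataset and parameter choices here are illustrative only):

Python

# A minimal sketch: agglomerative clustering with scikit-learn
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Small random dataset: 20 samples, 2 features (illustrative)
np.random.seed(0)
X = np.random.randn(20, 2)

# Merge clusters bottom-up until 3 clusters remain
model = AgglomerativeClustering(n_clusters=3, linkage='ward')
labels = model.fit_predict(X)
print(labels)  # a cluster index (0, 1, or 2) for each sample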

Divisive Clustering

Divisive clustering, also known as “top-down” clustering, starts with all data points in a single cluster and recursively splits clusters into smaller sub-clusters based on their dissimilarity.

Unlike agglomerative clustering, which starts with each data point as its own cluster and iteratively merges the most similar pairs, divisive clustering takes a “divide and conquer” approach that breaks a large cluster into smaller sub-clusters.
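
SciPy’s hierarchy module implements only agglomerative linkage, so the following is just a rough sketch of the top-down idea: it approximates divisive clustering by repeatedly bisecting the largest cluster with 2-means. The function name and the max_clusters parameter are illustrative, not a standard API:

Python

# A rough sketch of top-down (divisive) clustering via repeated 2-means bisection
import numpy as np
from sklearn.cluster import KMeans

def divisive_clustering(X, max_clusters=4):
    # Start with all points in a single cluster
    clusters = [np.arange(len(X))]
    while len(clusters) < max_clusters:
        # Pick the largest remaining cluster and split it in two
        idx = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        members = clusters.pop(idx)
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[members])
        clusters.append(members[labels == 0])
        clusters.append(members[labels == 1])
    return clusters

np.random.seed(0)
X = np.random.randn(20, 2)
for i, members in enumerate(divisive_clustering(X)):
    print(f'cluster {i}: samples {members.tolist()}')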

Example 1: 

Here is a short example of agglomerative clustering using randomly generated data in Python –

In this example, we first create a random dataset with 20 samples and two features using NumPy’s randn function. Then, we use the linkage function from SciPy’s cluster.hierarchy module to perform hierarchical clustering with the complete-linkage method. The resulting linkage matrix Z contains information about the cluster merging process.

Finally, we plot the dendrogram using the dendrogram function from the same module. The dendrogram shows how the clusters were merged and at what distance, starting from individual samples at the bottom and ending with a single cluster at the top.

Python




# Import the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
 
# Create a random dataset with two features and 20 samples
np.random.seed(0)
X = np.random.randn(20, 2)

# Perform hierarchical clustering using complete linkage
Z = linkage(X, method='complete')
 
# Plot the dendrogram
plt.figure(figsize=(10, 5))
plt.title('Agglomerative Clustering Dendrogram')
plt.xlabel('Sample index')
plt.ylabel('Distance')
dendrogram(Z)
plt.show()


Output:

[Figure: Agglomerative clustering dendrogram]

Note that other linkage methods can be used, such as single, average, and ward; the method chosen affects the shape of the resulting dendrogram and the clustering.
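
To see how the linkage choice changes the result, here is a small sketch that clusters the same data with each of the four methods and draws the dendrograms side by side:

Python

# Compare how different linkage methods shape the dendrogram
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

np.random.seed(0)
X = np.random.randn(20, 2)

methods = ['single', 'complete', 'average', 'ward']
fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for ax, method in zip(axes, methods):
    Z = linkage(X, method=method)
    dendrogram(Z, ax=ax, no_labels=True)
    ax.set_title(f'{method} linkage')
plt.tight_layout()
plt.show()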

Example 2:

Here is another short example of agglomerative clustering on randomly generated data, this time using Ward linkage –

In this example, we first generate a sample dataset with 20 data points and 2 features using NumPy. We then perform agglomerative clustering using the linkage function from SciPy, which takes the data matrix as input along with the clustering method (ward in this case) and the distance metric (euclidean in this case). The output of linkage is a linkage matrix that represents the hierarchical clustering structure.

We then plot the dendrogram using the dendrogram function from SciPy, which takes the linkage matrix as input. We set color_threshold to 0 so that the entire dendrogram is drawn in a single color rather than colored by cluster below a distance threshold.

Python




# Import the necessary libraries
from scipy.cluster.hierarchy import dendrogram, linkage
import numpy as np
import matplotlib.pyplot as plt
 
# Generate sample data
np.random.seed(0)
X = np.random.randn(20, 2)
 
# Perform agglomerative clustering using Ward linkage
Z = linkage(X, method='ward', metric='euclidean')

# Plot the dendrogram; color_threshold=0 draws it in a single color
plt.figure(figsize=(10, 5))
plt.title('Agglomerative Clustering Dendrogram')
plt.xlabel('Sample index')
plt.ylabel('Distance')
dendrogram(Z, color_threshold=0)
plt.show()


Output:

[Figure: Agglomerative clustering dendrogram (Ward linkage)]

The resulting plot will show the hierarchical clustering structure with each data point as a leaf node in the dendrogram and the clusters at different levels of the hierarchy.
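
To turn the hierarchy into a flat clustering, the tree can be cut at a chosen level with SciPy’s fcluster. Here is a minimal sketch reusing the linkage matrix Z from the example above (the threshold values are arbitrary, for illustration only):

Python

# Cut the hierarchy into flat clusters with SciPy's fcluster
from scipy.cluster.hierarchy import fcluster

# Option 1: cut at a fixed distance threshold (value chosen arbitrarily)
labels_by_distance = fcluster(Z, t=3.0, criterion='distance')

# Option 2: request a fixed number of clusters instead
labels_by_count = fcluster(Z, t=3, criterion='maxclust')

print(labels_by_distance)  # a cluster id for each of the 20 samples
print(labels_by_count)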

Difference between Agglomerative Clustering and Divisive Clustering:

| S.No. | Parameter | Agglomerative Clustering | Divisive Clustering |
|-------|-----------|--------------------------|---------------------|
| 1. | Category | Bottom-up approach | Top-down approach |
| 2. | Approach | Each data point starts in its own cluster, and the algorithm recursively merges the closest pairs of clusters until a single cluster containing all the data points is obtained. | All data points start in a single cluster, and the algorithm recursively splits the cluster into smaller sub-clusters until each data point is in its own cluster. |
| 3. | Complexity | Generally more computationally expensive, especially for large datasets, since it requires the calculation of all pairwise distances between data points. | Comparatively less expensive, since it only requires the calculation of distances between sub-clusters, which reduces the computational burden. |
| 4. | Outliers | Can handle outliers better, since outliers can be absorbed into larger clusters. | May create sub-clusters around outliers, leading to suboptimal clustering results. |
| 5. | Interpretability | Tends to produce more interpretable results: the dendrogram shows the merging process, and the user can choose the number of clusters at the desired level of granularity. | Can be more difficult to interpret: the dendrogram shows the splitting process, and the user must choose a stopping criterion to determine the number of clusters. |
| 6. | Implementation | Scikit-learn provides multiple linkage methods for agglomerative clustering, such as “ward”, “complete”, “average”, and “single”. | Divisive clustering is not currently implemented in Scikit-learn. |
| 7. | Applications | Image segmentation, customer segmentation, social network analysis, document clustering, genetics and genomics, etc. | Market segmentation, anomaly detection, biological classification, natural language processing, etc. |


