
Difference between K-Means and DBScan Clustering

Clustering is an unsupervised machine learning technique that groups data points into clusters based on the similarity of the information available for them. Data points in the same cluster are similar to each other in some way, while data points in different clusters are dissimilar.

K-means and DBScan (Density-Based Spatial Clustering of Applications with Noise) are two of the most popular clustering algorithms in unsupervised machine learning.



1. K-Means Clustering: K-means is a centroid-based (partition-based) clustering algorithm. It partitions all the points in the sample space into K groups based on similarity, which is usually measured using Euclidean distance.

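The algorithm, in its standard form (Lloyd's algorithm), is as follows:

Step 1: Choose the number of clusters K and pick K initial centroids (for example, K randomly chosen data points).
Step 2: Assign each data point to its nearest centroid.
Step 3: Recompute each centroid as the mean of the points assigned to it.
Step 4: Repeat steps 2 and 3 until the assignments no longer change or a maximum number of iterations is reached.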
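A minimal sketch of the standard K-means iteration in plain Python may help make the procedure concrete. The function name `kmeans`, the optional `init` argument for fixing the starting centroids, and the sample `data` are assumptions made for this illustration. The result depends on the initial centroids, so a random start may not always find the intended grouping.

```python
import math
import random


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups.

    Returns (centroids, labels). Illustrative sketch, not optimized.
    """
    rng = random.Random(seed)
    # Start from the given centroids, or from k randomly chosen data points.
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = [None] * len(points)

    for _ in range(max_iter):
        # Assign every point to its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centroids, labels


# Hypothetical sample data: two well-separated groups of three points each.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, labels = kmeans(data, k=2, init=[(0.0, 0.0), (10.0, 10.0)])
print(labels)
print(centroids)
```

Note that K must be supplied by the caller, and every point, including any outlier, always receives a cluster label.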

2. DBScan Clustering: DBScan is a density-based clustering algorithm. The key idea is that, for a point to lie in the dense interior of a cluster, its neighbourhood within a given radius (R) must contain at least a minimum number of points (M). This algorithm has proved very effective at detecting outliers and handling noise.

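The algorithm, in its standard form, is as follows:

Step 1: For each point, find all points within radius R of it. If there are at least M such points, mark it as a core point.
Step 2: Pick an unvisited core point, start a new cluster, and add every point that can be reached from it by moving through the neighbourhoods of core points.
Step 3: Points that are not core points but lie within radius R of a core point are border points and join that core point's cluster.
Step 4: Repeat steps 2 and 3 until every core point belongs to a cluster. Points that are neither core nor border points are labelled as noise.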
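A minimal sketch of the standard DBScan procedure in plain Python may help make the roles of R and M concrete. The names `dbscan`, `region_query`, `NOISE`, and the sample `data` are assumptions made for this illustration. The sketch counts a point as part of its own neighbourhood and uses a brute-force neighbour search, so it is only suitable for small inputs.

```python
import math

NOISE = -1  # label used here for points that belong to no cluster


def region_query(points, i, radius):
    """Indices of all points within `radius` of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= radius]


def dbscan(points, radius, min_points):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, radius)
        if len(neighbours) < min_points:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        labels[i] = cluster_id
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned, nothing more to do
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, radius)
            if len(j_neighbours) >= min_points:
                queue.extend(j_neighbours)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels


# Hypothetical sample data: two dense groups plus one isolated point.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
        (8.0, 8.0), (8.2, 7.9), (7.9, 8.1),
        (4.5, 15.0)]
print(dbscan(data, radius=0.5, min_points=3))
```

The number of clusters is never passed in: it falls out of the density structure of the data, and the isolated point is left unassigned.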

There are some notable differences between K-means and DBScan.

| S.No. | K-means Clustering | DBScan Clustering |
|-------|--------------------|-------------------|
| 1. | Clusters formed are more or less spherical or convex in shape and are expected to be of similar size. | Clusters formed can be arbitrary in shape and need not be of similar size. |
| 2. | K-means is sensitive to the number of clusters specified, which must be chosen in advance. | The number of clusters need not be specified. |
| 3. | K-means is more efficient for large datasets. | DBScan cannot efficiently handle high-dimensional datasets. |
| 4. | K-means does not work well with outliers and noisy datasets. | DBScan efficiently handles outliers and noisy datasets. |
| 5. | In anomaly detection, K-means causes problems because anomalous points are assigned to the same cluster as “normal” data points. | DBScan locates regions of high density that are separated from one another by regions of low density, so isolated points can be left out of every cluster. |
| 6. | It requires one parameter: the number of clusters (K). | It requires two parameters: radius (R) and minimum points (M). R is the radius of the neighbourhood around a point; if that neighbourhood contains enough points, it is a dense area. M is the minimum number of data points required in a neighbourhood for it to count as part of a cluster. |
| 7. | Varying densities of the data points do not affect the K-means clustering algorithm. | DBScan does not work very well for sparse datasets or for data points with varying density. |
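The difference in outlier handling can be illustrated with a small one-dimensional sketch. The helper names `kmeans_1d` and `dbscan_noise_1d`, the starting centres, and the sample values are all assumptions made for this example. The second helper only reports which points DBScan would treat as noise rather than building full clusters.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Tiny 1-D K-means; `centroids` are the (assumed) starting centres."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            for v in values
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids


def dbscan_noise_1d(values, radius, min_points):
    """True for each value DBScan would label as noise: it is not a core
    point and has no core point within `radius`."""
    core = [
        sum(abs(v - w) <= radius for w in values) >= min_points
        for v in values
    ]
    return [
        not core[i]
        and not any(core[j] and abs(v - values[j]) <= radius
                    for j in range(len(values)))
        for i, v in enumerate(values)
    ]


# Two dense groups and one far-away outlier (hypothetical data).
values = [1.0, 1.1, 1.2, 9.0, 9.1, 9.2, 20.0]

labels, centres = kmeans_1d(values, centroids=[1.0, 9.0])
print(labels)   # the outlier shares a label with the second group
print(centres)  # and pulls that group's centroid towards itself

print(dbscan_noise_1d(values, radius=0.5, min_points=3))
```

In this sketch K-means has no way to leave the value 20.0 unassigned, whereas the density test singles it out.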