Clustering in Data Mining

Last Updated : 22 Jun, 2022

Clustering is the process of grouping a set of abstract objects into classes of similar objects.

Points to Remember:

A cluster of data objects can be treated as one group.

In cluster analysis, the data set is first partitioned into groups based on data similarity, and labels are then assigned to those groups.
The main advantage of clustering over classification is that it adapts to changes in the data and helps single out useful features that distinguish different groups.
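
The two steps described above (partition by similarity, then label the groups) can be sketched with k-means, one common partitioning method. This is a minimal illustration, not the only way to cluster: the function name `kmeans`, the sample points, and the choice of starting centroids are all assumptions made for this example, and Euclidean distance is assumed as the similarity measure.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans(points, initial_centroids, max_iter=100):
    """Minimal k-means sketch: returns (labels, centroids).

    `initial_centroids` is supplied by the caller so the run is deterministic;
    real implementations usually pick starting centroids more carefully.
    """
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Step 1: partition the data by similarity (nearest centroid).
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Step 2: recompute each centroid as the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return labels, centroids


# Hypothetical sample data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

The integer in `labels` is the group label assigned to each point; no labels are needed in advance, which is the difference from classification noted above.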

Applications of cluster analysis:

Cluster analysis is widely used in applications such as image processing, data analysis, and pattern recognition.
It helps marketers find distinct groups in their customer base and characterize those groups by their purchasing patterns.
It is used in biology to derive plant and animal taxonomies and to identify genes with similar functions.
It also helps in information discovery by classifying documents on the web.

Clustering Methods:

Clustering methods can be classified based on the following categories.

Requirements of clustering in data mining:

The following are the main requirements that a clustering algorithm should meet in data mining.

Scalability – Clustering algorithms must be highly scalable to work with large databases.
Ability to deal with different kinds of attributes – Algorithms should be able to work with different types of data, such as categorical, numerical, and binary data.
Discovery of clusters with arbitrary shape – The algorithm should be able to detect clusters of arbitrary shape and should not be limited to distance measures that tend to find only small spherical clusters.
Interpretability – The results should be comprehensible, usable, and interpretable.
High dimensionality – The algorithm should be able to handle high-dimensional data, not only low-dimensional data.
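
To illustrate the arbitrary-shape requirement, here is a minimal sketch of a density-based approach in the style of DBSCAN. It grows a cluster by chaining together neighbouring points, so an elongated group is found as one cluster and an isolated point is marked as noise. The names `region_query` and `dbscan`, the parameter values, and the sample data are assumptions chosen for this example, and the quadratic neighbour search shown here would not meet the scalability requirement on large databases.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """DBSCAN-style sketch: returns one label per point, NOISE for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels


# Hypothetical data: two long, thin chains of points plus one far-away outlier.
line_a = [(float(x), 0.0) for x in range(10)]
line_b = [(float(x), 5.0) for x in range(10)]
outlier = [(100.0, 100.0)]
shape_labels = dbscan(line_a + line_b + outlier, eps=1.5, min_pts=3)
print(shape_labels)
```

Each chain is recovered as a single cluster even though its two ends are far apart, because membership depends on local density rather than on distance to a single centre.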
