Open In App

Clustering in Data Mining

Last Updated : 22 Jun, 2022
Improve
Improve
Like Article
Like
Save
Share
Report

Clustering: 

The process of making a group of abstract objects into classes of similar objects is known as clustering. 

Points to Remember: 

One group is treated as a cluster of data objects

  • In the process of cluster analysis, the first step is to partition the set of data into groups with the help of data similarity, and then groups are assigned to their respective labels.
  • The biggest advantage of clustering over-classification is it can adapt to the changes made and helps single out useful features that differentiate different groups.

Applications of cluster analysis :

  • It is widely used in many applications such as image processing, data analysis, and pattern recognition.
  • It helps marketers to find the distinct groups in their customer base and they can characterize their customer groups by using purchasing patterns.
  • It can be used in the field of biology, by deriving animal and plant taxonomies and identifying genes with the same capabilities.
  • It also helps in information discovery by classifying documents on the web.

Clustering Methods: 

It can be classified based on the following categories.

  1. Model-Based Method
  2. Hierarchical Method
  3. Constraint-Based Method
  4. Grid-Based Method
  5. Partitioning Method
  6. Density-Based Method

Requirements of clustering in data mining: 

The following are some points why clustering is important in data mining.

  • Scalability – we require highly scalable clustering algorithms to work with large databases.
  • Ability to deal with different kinds of attributes – Algorithms should be able to work with the type of data such as categorical, numerical, and binary data.
  • Discovery of clusters with attribute shape – The algorithm should be able to detect clusters in arbitrary shapes and it should not be bounded to distance measures.
  • Interpretability – The results should be comprehensive, usable, and interpretable.
  • High dimensionality – The algorithm should be able to handle high dimensional space instead of only handling low dimensional data.

Like Article
Suggest improvement
Share your thoughts in the comments

Similar Reads