CURE (Clustering Using REpresentatives)
- It is a hierarchical clustering technique that adopts a middle ground between the centroid-based and the all-points extremes. Agglomerative hierarchical clustering starts with every point as its own single-point cluster and repeatedly merges the two closest clusters until the desired number of clusters is formed.
- It can identify both spherical and non-spherical clusters.
- It is useful for discovering groups and identifying interesting distributions in the underlying data.
- Instead of describing a cluster by a single centroid point, as many clustering algorithms do, CURE uses a set of well-scattered representative points per cluster. This lets it follow the shape of a cluster and reduces the influence of outliers.
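To see why several representative points can help with non-spherical clusters, here is a small sketch on made-up data. The point coordinates, the hand-picked representatives, and the helper names `centroid_distance` and `representative_distance` are all assumptions for illustration; they are not part of CURE itself.

```python
import numpy as np

# Hypothetical toy data: A is a long, thin cluster; B is small and compact.
cluster_a = np.array([[float(x), 0.0] for x in range(11)])  # (0,0) ... (10,0)
cluster_b = np.array([[12.0, 4.0], [12.5, 4.0], [12.0, 4.5], [12.5, 4.5]])
query = np.array([10.0, 1.0])  # sits right next to the far end of A


def centroid_distance(point, cluster):
    """Distance from a point to the single centroid of a cluster."""
    return float(np.linalg.norm(point - cluster.mean(axis=0)))


def representative_distance(point, representatives):
    """Distance from a point to the nearest of several representatives."""
    return float(np.linalg.norm(representatives - point, axis=1).min())


# Hand-picked representatives (an assumption for this demo only).
reps_a = np.array([[0.0, 0.0], [5.0, 0.0], [10.0, 0.0]])
reps_b = cluster_b

print("centroid view:      ", centroid_distance(query, cluster_a),
      centroid_distance(query, cluster_b))
print("representative view:", representative_distance(query, reps_a),
      representative_distance(query, reps_b))
```

With a single centroid, the query point looks closer to B, because the centroid of the elongated cluster A lies far from its ends. With several representatives spread along A, the same point is closest to A.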
Six steps in the CURE algorithm:
- Idea: (1) Draw a random sample of size ‘s’ from the given data. (2) Split the sample into ‘p’ partitions, each of size s/p. (3) Partially cluster each partition until it contains about s/(pq) clusters, for some q > 1. (4) Discard the outliers found in the partially clustered partitions. (5) Cluster the partial clusters from all partitions again to obtain the final clusters. (6) Label the remaining data on disk by assigning each point to the cluster with the nearest representative point.
- Procedure:
- Select the target number of representative points per cluster, ‘gfg’.
- Choose ‘gfg’ well-scattered points in each cluster.
- Shrink these scattered points towards the cluster centroid.
- The shrunken points serve as the representatives of the cluster and are used in the ‘Dmin’ cluster merging approach. In the Dmin (minimum distance) approach, the distance between two clusters is the smallest distance between a representative point of one cluster and a representative point of the other. The two clusters with the smallest such distance are merged.
- After each merge, new representative points are selected for the merged cluster.
- Cluster merging continues until the target number of clusters, ‘k’, is reached.
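The procedure above can be sketched in a few lines of NumPy. This is a minimal, unoptimized illustration and not a reference implementation: it skips sampling, partitioning, and outlier removal, and it compares every pair of clusters on each pass. The function names, the farthest-point heuristic for choosing scattered points, and the `shrink` parameter (the fraction by which each point moves towards the centroid, which the text does not name) are assumptions; `num_reps` plays the role of ‘gfg’.

```python
import numpy as np


def pick_scattered(points, num_reps):
    """Pick up to num_reps well-scattered points (farthest-point heuristic)."""
    centroid = points.mean(axis=0)
    # Start with the point farthest from the centroid.
    chosen = [int(np.argmax(np.linalg.norm(points - centroid, axis=1)))]
    while len(chosen) < min(num_reps, len(points)):
        # Distance from every point to its nearest already-chosen point.
        diffs = points[:, None, :] - points[chosen][None, :, :]
        nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
        chosen.append(int(np.argmax(nearest)))
    return points[chosen]


def representatives(points, num_reps, shrink):
    """Scattered points moved a fraction `shrink` of the way to the centroid."""
    scattered = pick_scattered(points, num_reps)
    centroid = points.mean(axis=0)
    return scattered + shrink * (centroid - scattered)


def d_min(reps_a, reps_b):
    """Smallest distance between any representative of A and any of B."""
    diffs = reps_a[:, None, :] - reps_b[None, :, :]
    return float(np.linalg.norm(diffs, axis=2).min())


def cure(points, k, num_reps=4, shrink=0.3):
    """Merge clusters by Dmin over representatives until k clusters remain."""
    clusters = [points[i:i + 1] for i in range(len(points))]
    reps = [representatives(c, num_reps, shrink) for c in clusters]
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = d_min(reps[i], reps[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = np.vstack([clusters[i], clusters[j]])
        del clusters[j], reps[j]  # j > i, so index i stays valid
        clusters[i] = merged
        # Re-select representatives for the newly merged cluster.
        reps[i] = representatives(merged, num_reps, shrink)
    return clusters


# Hypothetical toy data: two well-separated square blobs.
data = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                 [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
result = cure(data, k=2)
print(sorted(len(c) for c in result))
```

The pairwise search makes each merge step quadratic in the number of clusters, which is acceptable for a small sample but is one reason the sampling and partitioning steps matter on large data.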