Different phases of projected clustering in data analytics

  • Last Updated : 26 Apr, 2022

In this article, we discuss the different phases of projected clustering in data analytics in detail.

Three Phases for Projected Clustering :

  1. Initialization Phase
  2. Iterative Phase
  3. Refinement Phase

These are explained below.

1. Initialization Phase :
This phase comprises two steps that select a superset of the medoids.

  • In the first step, a random sample of data points is drawn whose size is proportional to the number of clusters the user wishes to produce:
    S = A · k,

    where S is the random sample size, A is a constant, and k is the number of clusters.

  • In the second step, a greedy method reduces this sample to a final set of B · k points, where B is a small constant.

This final set is denoted M, and the hill-climbing technique is applied to it during the next phase.

  • Pick a sample set of data points at random.
  • From that sample, pick a set of data points that probably contains the medoids of the clusters.
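The two steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the article does not say which greedy method is used, so the code assumes a farthest-first selection (repeatedly adding the point farthest from those already chosen). The function names and the default values of A and B are hypothetical, and the data is assumed to be a list of distinct numeric tuples.

```python
import math
import random


def greedy_farthest_points(points, count, rng):
    """Greedily pick `count` well-separated points (assumed farthest-first rule)."""
    chosen = [rng.randrange(len(points))]
    # dist[i] holds the distance from point i to its nearest chosen point.
    dist = [math.dist(p, points[chosen[0]]) for p in points]
    while len(chosen) < count:
        nxt = max(range(len(points)), key=dist.__getitem__)
        chosen.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return [points[i] for i in chosen]


def initialize_medoid_candidates(data, k, A=30, B=5, seed=0):
    """Return the candidate set M of about B*k points (A and B are assumed values)."""
    rng = random.Random(seed)
    # Step 1: random sample S of size A*k (capped at the data size).
    sample = rng.sample(data, min(len(data), A * k))
    # Step 2: greedy reduction of the sample to B*k candidate medoids.
    return greedy_farthest_points(sample, min(len(sample), B * k), rng)
```

Calling `initialize_medoid_candidates(data, k=3)` would return a list of at most 15 points from `data`, which plays the role of the set M in the next phase.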

2. Iterative Phase :
The initialization phase produced a set of data points M that should contain the medoids. In this phase, we find the best medoids from M. A set of points, M current, is picked at random from M, and the “bad” medoids are replaced with other points in M whenever doing so improves cluster quality. The best medoid set found so far is denoted M best.

For the current medoids, the following steps are carried out:

  • Identify the dimensions associated with each medoid.
  • Assign data points to the medoids.
  • Evaluate the clusters formed.
  • Identify the bad medoid, and try the result of replacing it.
  • Repeat the above procedure until a satisfactory result is obtained.
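The loop described above can be sketched as a simple hill climb. The code below is a simplified illustration, not the exact published procedure. It assumes that every cluster gets the same number of dimensions l, that distance is an averaged Manhattan distance over the selected dimensions, that the "bad" medoid is the one with the smallest cluster, and that the loop stops after `max_stale` replacements without improvement. All function and parameter names are hypothetical. The candidates are assumed to be distinct points taken from `data`, with k of at least 2.

```python
import random


def segmental_distance(p, q, dims):
    """Manhattan distance over `dims`, averaged by the number of dimensions."""
    return sum(abs(p[d] - q[d]) for d in dims) / len(dims)


def find_dimensions(data, medoids, l):
    """For each medoid, keep the l dimensions in which nearby points are tightest."""
    all_dims = range(len(data[0]))
    result = []
    for i, m in enumerate(medoids):
        # Locality radius: distance to the closest other medoid in the full space.
        radius = min(segmental_distance(m, o, all_dims)
                     for j, o in enumerate(medoids) if j != i)
        local = [p for p in data if segmental_distance(p, m, all_dims) <= radius]
        spread = [sum(abs(p[d] - m[d]) for p in local) / len(local) for d in all_dims]
        result.append(sorted(all_dims, key=spread.__getitem__)[:l])
    return result


def assign_points(data, medoids, dims):
    """Assign each point to the medoid that is nearest in that medoid's dimensions."""
    clusters = [[] for _ in medoids]
    for p in data:
        best = min(range(len(medoids)),
                   key=lambda i: segmental_distance(p, medoids[i], dims[i]))
        clusters[best].append(p)
    return clusters


def evaluate(clusters, medoids, dims):
    """Average distance of points to their medoid (lower is better)."""
    total = sum(segmental_distance(p, m, ds)
                for c, m, ds in zip(clusters, medoids, dims) for p in c)
    return total / sum(len(c) for c in clusters)


def iterative_phase(data, candidates, k, l, max_stale=20, seed=0):
    rng = random.Random(seed)
    current = rng.sample(candidates, k)   # M current
    best = None                           # (score, medoids, dims, clusters) for M best
    stale = 0
    while stale < max_stale:
        dims = find_dimensions(data, current, l)
        clusters = assign_points(data, current, dims)
        score = evaluate(clusters, current, dims)
        if best is None or score < best[0]:
            best, stale = (score, list(current), dims, clusters), 0
        else:
            stale += 1
        _, medoids, _, best_clusters = best
        spare = [c for c in candidates if c not in medoids]
        if not spare:
            break
        # Assumed rule: the bad medoid is the one with the smallest cluster.
        bad = min(range(k), key=lambda i: len(best_clusters[i]))
        current = list(medoids)
        current[bad] = rng.choice(spare)
    _, medoids, dims, clusters = best
    return medoids, dims, clusters
```

The function returns the best medoid set together with its dimension sets and clusters, which are the inputs to the refinement phase.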

3. Refinement Phase (Handling Outliers) :

  • The final step of the algorithm is the refinement phase, which improves the quality of the clusters formed.
  • The clusters C1, C2, C3, …, Ck formed during the iterative phase are the input to this phase.
  • The original data set is passed over one or more times to improve the quality of the clusters.
  • The dimension sets Di found during the iterative phase are discarded, and new dimension sets are computed for each cluster Ci.
  • Once the new dimensions have been computed for the clusters, the points are reassigned to the medoids relative to these new sets of dimensions.
  • Outliers are determined in the last pass over the data.
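A single refinement pass can be sketched as follows. The article does not state how outliers are detected, so this sketch assumes one common rule: a point is an outlier if, for every medoid, its distance to that medoid exceeds the medoid's distance to its nearest other medoid, measured in that medoid's own dimensions. The function names, the averaged Manhattan distance, and the fixed number of dimensions l per cluster are assumptions made for illustration.

```python
def segmental_distance(p, q, dims):
    """Manhattan distance over `dims`, averaged by the number of dimensions."""
    return sum(abs(p[d] - q[d]) for d in dims) / len(dims)


def refine(data, medoids, clusters, l):
    n_dims = len(data[0])
    # Discard the old dimension sets and recompute them from the cluster members.
    new_dims = []
    for m, members in zip(medoids, clusters):
        pts = members or [m]
        spread = [sum(abs(p[d] - m[d]) for p in pts) / len(pts) for d in range(n_dims)]
        new_dims.append(sorted(range(n_dims), key=spread.__getitem__)[:l])
    # Assumed outlier threshold: distance from each medoid to its nearest other medoid.
    radius = [min(segmental_distance(m, o, ds)
                  for j, o in enumerate(medoids) if j != i)
              for i, (m, ds) in enumerate(zip(medoids, new_dims))]
    # Reassign points relative to the new dimensions and collect outliers.
    new_clusters = [[] for _ in medoids]
    outliers = []
    for p in data:
        dists = [segmental_distance(p, m, ds) for m, ds in zip(medoids, new_dims)]
        if all(d > r for d, r in zip(dists, radius)):
            outliers.append(p)
        else:
            new_clusters[dists.index(min(dists))].append(p)
    return new_dims, new_clusters, outliers
```

Here `clusters` is the output of the iterative phase, and the function returns the recomputed dimension sets, the reassigned clusters, and the list of points treated as outliers.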

Major Drawback :

  • The algorithm requires the average number of dimensions per cluster as an input parameter. The performance of projected clustering is highly sensitive to the value of this parameter.
  • If the average number of dimensions is estimated incorrectly, the performance of projected clustering degrades significantly.
