Different Phases of Projected Clustering in Data Analytics

Projected clustering is a typical dimension-reduction subspace clustering method: instead of starting from single-dimensional spaces, it proceeds by identifying an initial approximation of the clusters in the high-dimensional attribute space. To do this, a projected clustering algorithm goes through different phases. In this article, we discuss these phases of projected clustering in data analytics in detail.

Prerequisite – Projected clustering

Three Phases for Projected Clustering

Initialization Phase
Iterative Phase
Refinement Phase

These phases are explained below.

1. Initialization Phase:

This phase comprises two steps that select a superset of the medoids.

In the first step, the algorithm picks a random sample of data points whose size is proportional to the number of clusters that the user wishes to produce:

                            S = A · k,

where S is the size of the random sample, A is a constant, and k represents the number of clusters.
In the second step, a greedy method reduces this sample to a final set of B · k points, where B is a small constant (B < A).

This set is designated as M, and the hill climbing technique of the next phase is applied to it.

Pick a random sample of data points.
From this sample, pick a set of data points that probably contains the medoids of the clusters.
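
The two steps above can be sketched in Python as follows. This is only a minimal illustration, not a reference implementation: the function name greedy_init, the default values of A and B, the use of Euclidean distance, and the farthest-point rule used for the greedy step are all assumptions made for this example.

```python
import math
import random

def greedy_init(points, k, A=30, B=5, seed=0):
    """Return indices of a candidate medoid set M of size at most B * k."""
    rng = random.Random(seed)
    sample_size = min(len(points), A * k)
    target = min(sample_size, B * k)

    # Step 1: random sample whose size is proportional to k (S = A * k).
    sample = rng.sample(range(len(points)), sample_size)

    # Step 2 (assumed greedy rule): repeatedly add the sampled point that is
    # farthest from the points chosen so far.
    first = rng.choice(sample)
    chosen = [first]
    chosen_set = {first}
    # Distance from every sampled point to its nearest chosen point.
    nearest = {i: math.dist(points[i], points[first]) for i in sample}
    while len(chosen) < target:
        nxt = max((i for i in sample if i not in chosen_set),
                  key=lambda i: nearest[i])
        chosen.append(nxt)
        chosen_set.add(nxt)
        for i in sample:
            nearest[i] = min(nearest[i], math.dist(points[i], points[nxt]))
    return chosen

# Usage: 200 random 2-D points, k = 3 clusters.
data_rng = random.Random(42)
points = [(data_rng.random(), data_rng.random()) for _ in range(200)]
M = greedy_init(points, k=3, A=20, B=4)
print(len(M))  # 12, i.e. B * k candidate medoids
```

Picking far-apart points makes it likely that every cluster contributes at least one candidate to M, which is what the next phase relies on.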

2. Iterative Phase:

The initialization phase gives us a set of data points M that should contain the medoids. In this phase, we find the best medoids from M. The algorithm randomly picks a set of points, M current, from M and, whenever doing so improves cluster quality, replaces the “bad” medoids with other points from M. The best medoid set found so far is designated as M best. For each set of medoids, the following steps are performed:

Identify dimensions associated with the medoids.
Allocate data points to the medoids.
Evaluate the clusters formed.
Identify the poor medoids, and try the result of replacing them.
The above procedure is repeated until a satisfactory result is obtained.
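
The first three steps of one iteration can be sketched as below. This is a simplified illustration under stated assumptions: every medoid receives exactly l dimensions, a medoid's neighbourhood is taken to be the points strictly closer to it than its nearest other medoid, and distances within a subspace are averaged absolute differences. All function names are hypothetical, and at least two medoids are required.

```python
import math

def segmental_distance(p, q, dims):
    """Average absolute difference between p and q over the given dimensions."""
    return sum(abs(p[d] - q[d]) for d in dims) / len(dims)

def find_dimensions(points, medoids, l):
    """For each medoid, keep the l dimensions along which its neighbours are tightest."""
    n_dims = len(points[0])
    dims = {}
    for m in medoids:
        # Neighbourhood radius: distance to the nearest other medoid.
        delta = min(math.dist(points[m], points[o]) for o in medoids if o != m)
        near = [p for p in points if math.dist(p, points[m]) < delta] or [points[m]]
        # Average deviation from the medoid along each dimension.
        spread = [sum(abs(p[d] - points[m][d]) for p in near) / len(near)
                  for d in range(n_dims)]
        dims[m] = sorted(sorted(range(n_dims), key=lambda d: spread[d])[:l])
    return dims

def assign_points(points, medoids, dims):
    """Assign each point to the medoid that is closest in that medoid's own subspace."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        best = min(medoids, key=lambda m: segmental_distance(p, points[m], dims[m]))
        clusters[best].append(i)
    return clusters

def evaluate(points, clusters, dims):
    """Mean subspace distance of the points to their medoid (lower is better)."""
    total = sum(segmental_distance(points[i], points[m], dims[m])
                for m, members in clusters.items() for i in members)
    return total / len(points)

# Toy data: points 0-4 are tight in dimensions 0 and 1, points 5-9 in 1 and 2.
points = [
    (0.0, 0.0, 4.0), (0.1, 0.2, 0.0), (0.2, 0.1, 1.0), (0.1, 0.1, 2.0), (0.2, 0.2, 3.0),
    (0.0, 20.0, 20.0), (1.0, 20.1, 20.2), (2.0, 20.2, 20.1), (3.0, 20.1, 20.1), (4.0, 20.2, 20.2),
]
medoids = [0, 5]  # indices into points
dims = find_dimensions(points, medoids, l=2)
clusters = assign_points(points, medoids, dims)
score = evaluate(points, clusters, dims)
print(dims)      # {0: [0, 1], 5: [1, 2]}
print(clusters)  # {0: [0, 1, 2, 3, 4], 5: [5, 6, 7, 8, 9]}
```

A full iterative phase would wrap these calls in a loop that swaps poor medoids for other points in M and keeps the medoid set with the lowest score; that loop is omitted here.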

3. Refinement Phase (Handling Outliers):

The final step of this algorithm is the refinement phase. This phase improves the quality of the clusters formed.
The clusters C1, C2, C3, …, Ck formed during the iterative phase are fed into this phase.
The original data set is passed over one or more times to enhance the quality of the clusters.
The dimension sets Di found during the iterative phase are discarded, and new dimension sets are calculated for each cluster Ci.
Once the new dimensions have been calculated for the clusters, the points are reassigned to the medoids relative to these new sets of dimensions.
Outliers are determined in the last pass over the data.
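
The refinement steps can be sketched as follows. This is a hedged illustration, not a definitive implementation: the function names are hypothetical, each cluster is given exactly l dimensions, and the outlier rule used here (a point is an outlier if, for every medoid, it lies farther away than that medoid's nearest other medoid, measured in the medoid's own subspace) is an assumption chosen for the example. At least two medoids are required.

```python
def segmental_distance(p, q, dims):
    """Average absolute difference between p and q over the given dimensions."""
    return sum(abs(p[d] - q[d]) for d in dims) / len(dims)

def refine_dimensions(points, clusters, l):
    """Recompute each cluster's dimensions from its own members."""
    n_dims = len(points[0])
    dims = {}
    for m, members in clusters.items():
        members = members or [m]  # guard against an empty cluster
        spread = [sum(abs(points[i][d] - points[m][d]) for i in members) / len(members)
                  for d in range(n_dims)]
        dims[m] = sorted(sorted(range(n_dims), key=lambda d: spread[d])[:l])
    return dims

def reassign(points, medoids, dims):
    """Reassign every point to its closest medoid under the new dimension sets."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(points):
        best = min(medoids, key=lambda m: segmental_distance(p, points[m], dims[m]))
        clusters[best].append(i)
    return clusters

def find_outliers(points, medoids, dims):
    """Return indices of points that fall outside every medoid's radius (assumed rule)."""
    radius = {m: min(segmental_distance(points[m], points[o], dims[m])
                     for o in medoids if o != m)
              for m in medoids}
    return [i for i, p in enumerate(points)
            if all(segmental_distance(p, points[m], dims[m]) > radius[m]
                   for m in medoids)]

# Toy data: two subspace clusters plus one far-away point (index 10).
points = [
    (0.0, 0.0, 4.0), (0.1, 0.2, 0.0), (0.2, 0.1, 1.0), (0.1, 0.1, 2.0), (0.2, 0.2, 3.0),
    (0.0, 20.0, 20.0), (1.0, 20.1, 20.2), (2.0, 20.2, 20.1), (3.0, 20.1, 20.1), (4.0, 20.2, 20.2),
    (50.0, 50.0, 50.0),
]
# Clusters as they might come out of the iterative phase (medoid index -> members).
iterative_clusters = {0: [0, 1, 2, 3, 4], 5: [5, 6, 7, 8, 9, 10]}

dims = refine_dimensions(points, iterative_clusters, l=2)
clusters = reassign(points, list(iterative_clusters), dims)
outliers = find_outliers(points, list(iterative_clusters), dims)
print(dims)      # {0: [0, 1], 5: [1, 2]}
print(outliers)  # [10]
```

Because the dimensions are recomputed from the actual cluster members rather than from a neighbourhood around each medoid, they reflect the clusters as finally formed.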

Major Drawback of Projected Clustering Algorithm:

The algorithm requires the average number of dimensions per cluster as an input parameter. The performance of projected clustering is highly sensitive to the value of this parameter.
If the average number of dimensions is estimated incorrectly, the performance of projected clustering worsens significantly.
