
Clustering in Julia

Clustering is a widely used technique in unsupervised learning, and Julia supports it through packages such as Clustering. In this method, similar data points are grouped into a cluster based on how close their feature values are. The number of clusters is chosen with the size and structure of the dataset in mind, and some algorithms, such as K-means, require the user to set it in advance. All the points in a cluster share some similarity, and how strong that similarity appears depends on which features are selected before training.

Applications of Clustering in Julia

Types of Clustering

Clustering can be divided into two major types:
Hard Clustering: A data point can belong to only one cluster. It either belongs to a cluster completely or not at all, so no probabilities are involved in the assignment. The most common example of hard clustering is K-means clustering.
Soft Clustering: A data point can belong to several or all clusters, each with some probability. Rather than being placed in one cluster completely, every data point has some probability of belonging to every cluster. This type of clustering is used in fuzzy logic and soft computing.
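
The difference between the two types can be illustrated with a small sketch. The distances are made-up numbers, and the soft weighting rule (weights proportional to exp(-squared distance)) is only an illustrative assumption, not the formula of any particular soft clustering algorithm.

```julia
# Squared distances from one data point to three cluster centers (made-up numbers)
sqdist = [0.5, 2.0, 4.5]

# Hard clustering: the point belongs entirely to the nearest cluster
hard_label = argmin(sqdist)

# Soft clustering: the point gets a membership weight for every cluster.
# Illustrative assumption: weights proportional to exp(-squared distance),
# normalized so that they sum to 1.
weights = exp.(-sqdist)
membership = weights ./ sum(weights)

println("hard label: ", hard_label)
println("soft membership: ", membership)
```

The hard label is a single cluster index, while the soft membership is a vector with one entry per cluster.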



K-means Clustering in Julia

K-means clustering is an unsupervised learning method. It is an iterative method in which each data point is assigned to one of a predefined number of clusters according to the similarity of its features. The number of clusters is set by the user before training. The method also computes the centroid of each cluster.
 

Algorithm:
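
A minimal sketch of the standard K-means iteration (often called Lloyd's algorithm) is given here using only Julia's standard library. The function name `simple_kmeans` and its keyword arguments are hypothetical and chosen for illustration; the sketch does not reproduce the internals of `kmeans` from the Clustering package.

```julia
using Random, Statistics

# Hypothetical helper, not part of the Clustering package.
# X is a d×n matrix with one data point per column; k is the number of clusters.
function simple_kmeans(X::AbstractMatrix{<:Real}, k::Integer;
                       maxiter::Integer = 100, rng = Random.default_rng())
    n = size(X, 2)
    # pick k distinct data points as the initial centroids
    centers = float(X[:, randperm(rng, n)[1:k]])
    assignments = zeros(Int, n)

    for _ in 1:maxiter
        # assign every point to its nearest centroid (squared Euclidean distance)
        new_assignments = [argmin([sum(abs2, X[:, j] .- centers[:, c]) for c in 1:k])
                           for j in 1:n]
        # stop once the assignments no longer change
        new_assignments == assignments && break
        assignments = new_assignments

        # move each centroid to the mean of the points assigned to it
        for c in 1:k
            members = findall(==(c), assignments)
            isempty(members) || (centers[:, c] = vec(mean(X[:, members], dims = 2)))
        end
    end
    return centers, assignments
end

# Usage on two well-separated groups of made-up points
X = [1.0 1.2 0.8 8.0 8.3 7.9;
     1.0 0.9 1.1 8.0 7.8 8.2]
centers, assignments = simple_kmeans(X, 2; rng = MersenneTwister(1))
println(assignments)
```

The cluster numbers themselves are arbitrary and depend on the random initial centroids; only the grouping of the points is meaningful.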



Syntax: kmeans(X, k)

where,
X: the feature matrix, with one data point per column (a d×n matrix for n points with d features)
k: the number of clusters

using RDatasets, Clustering, Plots

# load the iris dataset as a DataFrame
iris = dataset("datasets", "iris");

# kmeans expects one data point per column, so transpose the
# 150×4 table of measurements into a 4×150 matrix
features = collect(Matrix(iris[:, 1:4])');

# run K-means with 3 clusters
result = kmeans(features, 3);

# plot petal length against petal width, colored by assigned cluster
scatter(iris.PetalLength, iris.PetalWidth,
        marker_z = result.assignments,
        color = :lightrainbow, legend = false)

# save the plot as a PNG file in the current working directory
savefig("iris.png")

                    

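Output: a scatter plot of petal length against petal width, with each point colored by its assigned cluster.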
 

Similarity Aggregation Clustering

This is another type of clustering, in which every data point is compared with every other data point, pair by pair. It is also known as the Condorcet method or relational clustering. For a pair of data points X and Y, two quantities are computed: m(X, Y), the number of features on which X and Y have the same value, and d(X, Y), the number of features on which they have different values.

c(X, Y) = m(X, Y) - d(X, Y)

c(X, S) = Σ c(X, Y), summed over all Y in S

where S is the cluster.
The first condition is used to create a cluster. The second condition is used to calculate the global Condorcet criterion. This is an iterative process that continues until the specified number of iterations is completed or the global Condorcet criterion shows no further improvement.
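
The pairwise comparison can be sketched in Julia as follows. The sketch assumes the commonly stated definitions c(X, Y) = m(X, Y) - d(X, Y) and c(X, S) as the sum of c(X, Y) over all Y in S. The data, the helper names (`c_pair`, `c_cluster`, `aggregate`, `global_criterion`), the rule of joining the best cluster only when its score is at least 0, and the way the global criterion is summed are all illustrative assumptions. It performs a single pass over the data rather than the full iterative procedure.

```julia
# Each record is a vector of categorical values (made-up data)
records = [["red",  "small", "round"],
           ["red",  "small", "square"],
           ["blue", "large", "square"],
           ["blue", "large", "round"]]

# m: number of features on which two records agree; d: number on which they differ
m(x, y) = count(x .== y)
d(x, y) = count(x .!= y)

# criterion for a pair of records, and for a record against a cluster S
c_pair(x, y) = m(x, y) - d(x, y)
c_cluster(x, S) = sum(Int[c_pair(x, y) for y in S])

# Single greedy pass (assumption): put each record into the cluster with the
# highest score if that score is at least 0, otherwise start a new cluster.
function aggregate(records)
    clusters = Vector{Vector{Int}}()   # each cluster stores record indices
    for i in eachindex(records)
        scores = [c_cluster(records[i], records[S]) for S in clusters]
        if !isempty(scores) && maximum(scores) >= 0
            push!(clusters[argmax(scores)], i)
        else
            push!(clusters, [i])
        end
    end
    return clusters
end

# Global criterion (assumption): sum, over all records, of the record's score
# against the other members of its own cluster.
global_criterion(records, clusters) =
    sum(c_cluster(records[i], records[filter(!=(i), S)]) for S in clusters for i in S)

clusters = aggregate(records)
println(clusters)
println(global_criterion(records, clusters))
```

A full implementation would repeat the pass, moving records between clusters, and stop once the iteration limit is reached or the global criterion no longer improves.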

