ML | Fuzzy Clustering

  • Difficulty Level : Easy
  • Last Updated : 16 Jan, 2023

Prerequisite: Clustering in Machine Learning 

What is clustering? 

Clustering is an unsupervised machine learning technique that divides the given data into different clusters based on their distances (similarity) from each other. 

The k-means algorithm assigns each point a hard membership: 1 for the single cluster it belongs to and 0 for every other cluster. Fuzzy clustering instead gives each data point a degree of membership between 0 and 1 in every cluster. In fuzzy c-means, we repeatedly compute the cluster centroids from the current memberships and then recompute each point's membership from its distance to every centroid, until the memberships stop changing. 
Suppose the given data points are {(1, 3), (2, 5), (4, 8), (7, 9)}.

The steps to perform the algorithm are: 

Step 1: Initialize the data points into the desired number of clusters randomly. 

Let us assume the data is to be divided into 2 clusters. Each data point belongs to both clusters with some membership value. In the initial state these values can be chosen arbitrarily, provided the memberships of each point sum to 1. 

The table below represents the values of the data points along with their membership (gamma) in each cluster.

Cluster    (1, 3)    (2, 5)    (4, 8)    (7, 9)
1)          0.8        0.7       0.2       0.1
2)          0.2        0.3       0.8       0.9
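
The table above is chosen by hand. As a minimal sketch of a random initialization, the snippet below draws random values and normalizes each column so that a point's memberships sum to 1. The helper name `init_membership` and the seed are assumptions made for illustration; they do not come from any library.

```python
import numpy as np

def init_membership(n_clusters, n_points, seed=0):
    """Return a random (n_clusters, n_points) membership matrix.

    Each column holds the memberships of one point and sums to 1.
    """
    rng = np.random.default_rng(seed)
    u = rng.random((n_clusters, n_points))
    return u / u.sum(axis=0, keepdims=True)

# 2 clusters, 4 data points, as in the example above
u0 = init_membership(2, 4)
print(u0)
print(u0.sum(axis=0))  # every column sums to 1
```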

Step 2: Find out the centroid. 
The formula for finding out the centroid (V) is:

V_{ij} = \frac{\sum \limits_{k=1}^{n} \gamma_{ik}^m \, x_{kj}}{\sum \limits_{k=1}^{n} \gamma_{ik}^m}

Where \gamma_{ik} is the fuzzy membership value of data point k in cluster i, m is the fuzziness parameter (generally taken as 2), x_{kj} is the j-th coordinate of data point x_k, and n is the number of data points. 
Here,

V11  = (0.8^2 * 1 + 0.7^2 * 2 + 0.2^2 * 4 + 0.1^2 * 7) / (0.8^2 + 0.7^2 + 0.2^2 + 0.1^2) = 1.568
V12  = (0.8^2 * 3 + 0.7^2 * 5 + 0.2^2 * 8 + 0.1^2 * 9) / (0.8^2 + 0.7^2 + 0.2^2 + 0.1^2) = 4.051
V21  = (0.2^2 * 1 + 0.3^2 * 2 + 0.8^2 * 4 + 0.9^2 * 7) / (0.2^2 + 0.3^2 + 0.8^2 + 0.9^2) = 5.35
V22  = (0.2^2 * 3 + 0.3^2 * 5 + 0.8^2 * 8 + 0.9^2 * 9) / (0.2^2 + 0.3^2 + 0.8^2 + 0.9^2) = 8.215
Centroids are: (1.568, 4.051) and (5.35, 8.215)
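
The centroid arithmetic above can be checked with a few lines of NumPy. This is a sketch rather than a library routine: the function name `centroids` is an assumption, and the membership matrix is the initial table with rows as clusters and columns as points.

```python
import numpy as np

points = np.array([[1, 3], [2, 5], [4, 8], [7, 9]], dtype=float)
gamma = np.array([[0.8, 0.7, 0.2, 0.1],   # memberships in cluster 1
                  [0.2, 0.3, 0.8, 0.9]])  # memberships in cluster 2
m = 2  # fuzziness parameter

def centroids(points, gamma, m):
    """Membership-weighted mean of the points, one row per cluster."""
    w = gamma ** m                                     # shape (clusters, points)
    return (w @ points) / w.sum(axis=1, keepdims=True)

V = centroids(points, gamma, m)
print(np.round(V, 3))
```

Rounded to three decimals, the rows agree with the hand-computed centroids (the value 5.348 appears as 5.35 in the text).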

Step 3: Find out the distance of each point from the centroid.

D11 = ((1 - 1.568)^2 + (3 - 4.051)^2)^0.5 = 1.2
D12 = ((1 - 5.35)^2 + (3 - 8.215)^2)^0.5 = 6.79

Similarly, the distance of all other points is computed from both the centroids. 
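
As a sketch of how all the distances could be computed at once, the snippet below uses NumPy broadcasting with the rounded centroids from Step 2. The array name `D` and its layout (rows are points, columns are centroids) are assumptions chosen to match the D11, D12 naming.

```python
import numpy as np

points = np.array([[1, 3], [2, 5], [4, 8], [7, 9]], dtype=float)
V = np.array([[1.568, 4.051], [5.35, 8.215]])  # centroids from Step 2

# D[k, i] is the Euclidean distance of point k from centroid i
D = np.linalg.norm(points[:, None, :] - V[None, :, :], axis=2)
print(np.round(D, 2))
```

The first row holds D11 and D12: roughly 1.19 (rounded to 1.2 in the text) and 6.79.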

Step 4: Updating membership values.

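\gamma_{ik} = \left[ \sum \limits_{j=1}^{c} \left( d_{ki}^2 / d_{kj}^2 \right)^{1/(m-1)} \right]^{-1}

Where d_{ki} is the distance of data point k from centroid i and c is the number of clusters.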

For point 1, the new membership values in cluster 1 (\gamma_{11}) and cluster 2 (\gamma_{21}) are:

\gamma_{11} = [ ((1.2)^2 / (1.2)^2)^{1/(2 - 1)} + ((1.2)^2 / (6.79)^2)^{1/(2 - 1)} ]^{-1} = 0.97

\gamma_{21} = [ ((6.79)^2 / (6.79)^2)^{1/(2 - 1)} + ((6.79)^2 / (1.2)^2)^{1/(2 - 1)} ]^{-1} = 0.03

Alternatively, since the memberships of a point sum to 1,

\gamma_{21} = 1 - \gamma_{11} = 0.03

Similarly, compute all other membership values, and update the matrix. 
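
The update rule can be sketched in NumPy as follows. The function name `update_membership` is hypothetical, and the clamp against zero distances is an added safeguard that the text does not discuss. The code uses the algebraically equivalent form in which each membership is proportional to d^(-2/(m-1)).

```python
import numpy as np

def update_membership(D, m=2.0):
    """Recompute memberships from distances.

    D[k, i] is the distance of point k from centroid i.
    Returns gamma with shape (clusters, points).
    """
    # Assumption: clamp tiny distances so a point sitting exactly on a
    # centroid does not cause a division by zero.
    D = np.fmax(D, np.finfo(float).eps)
    inv = D ** (-2.0 / (m - 1.0))              # shape (points, clusters)
    gamma = inv / inv.sum(axis=1, keepdims=True)
    return gamma.T

# Distances of point 1 from the two centroids, as computed in Step 3
D = np.array([[1.2, 6.79]])
gamma = update_membership(D)
print(np.round(gamma, 2))
```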

Step 5: Repeat steps 2-4 until the membership values stop changing, or until the change between two consecutive updates is smaller than a chosen tolerance. 
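
Putting steps 2 to 4 in a loop gives a compact sketch of the whole procedure. It is a from-scratch illustration under stated assumptions, not the scikit-fuzzy implementation: the function name `fuzzy_c_means`, the tolerance `tol`, and the iteration cap `max_iter` are all choices made here.

```python
import numpy as np

def fuzzy_c_means(points, gamma, m=2.0, tol=1e-5, max_iter=100):
    """Iterate centroid and membership updates until memberships settle."""
    for _ in range(max_iter):
        # Step 2: membership-weighted centroids
        w = gamma ** m
        V = (w @ points) / w.sum(axis=1, keepdims=True)
        # Step 3: distance of every point from every centroid
        D = np.linalg.norm(points[:, None, :] - V[None, :, :], axis=2)
        D = np.fmax(D, np.finfo(float).eps)  # guard against zero distance
        # Step 4: membership update
        inv = D ** (-2.0 / (m - 1.0))
        new_gamma = (inv / inv.sum(axis=1, keepdims=True)).T
        # Step 5: stop once the largest change is below the tolerance
        converged = np.abs(new_gamma - gamma).max() < tol
        gamma = new_gamma
        if converged:
            break
    return V, gamma

points = np.array([[1, 3], [2, 5], [4, 8], [7, 9]], dtype=float)
gamma0 = np.array([[0.8, 0.7, 0.2, 0.1],
                   [0.2, 0.3, 0.8, 0.9]])
V, gamma = fuzzy_c_means(points, gamma0)
print(np.round(V, 3))
print(np.round(gamma, 3))
```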

Step 6: Defuzzify the obtained membership values, for example by assigning each point to the cluster in which its membership is highest. 
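
One common way to defuzzify is the maximum-membership rule: label each point with the cluster where its membership is largest. The sketch below applies that rule to the initial membership table. Other defuzzification rules exist, and this choice is an assumption.

```python
import numpy as np

gamma = np.array([[0.8, 0.7, 0.2, 0.1],   # memberships in cluster 1
                  [0.2, 0.3, 0.8, 0.9]])  # memberships in cluster 2

# Index of the row (cluster) with the highest membership for each point
labels = np.argmax(gamma, axis=0)
print(labels)  # [0 0 1 1]: the first two points go to cluster 1, the rest to cluster 2
```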

Implementation: The scikit-fuzzy library (imported as skfuzzy) provides a ready-made function for fuzzy c-means that can be used in Python. Install it together with NumPy, which the example below uses:

pip install numpy
pip install scikit-fuzzy

Example :

Python3




import numpy as np
import skfuzzy as fuzz
 
# Generate some example data: 100 random 2-D points
np.random.seed(0)
data = np.random.rand(100, 2)
 
# Define the number of clusters
n_clusters = 3
 
# Apply fuzzy c-means clustering with fuzziness m = 2
# (cmeans expects data with shape (features, samples), hence data.T)
cntr, u, u0, d, jm, p, fpc = fuzz.cluster.cmeans(
    data.T, n_clusters, 2, error=0.005, maxiter=1000, init=None
)
 
# Defuzzify: assign each point to the cluster with the highest membership
cluster_membership = np.argmax(u, axis=0)
 
# Print the cluster centers
print('Cluster Centers:', cntr)
 
# Print the cluster membership for each data point
print('Cluster Membership:', cluster_membership)

Output (exact values depend on the random initialization and the library version):

Cluster Centers: [[0.42363557 0.68304616]
[0.52768166 0.38180987]
[0.39967863 0.31042639]]

Cluster Membership: [1 0 1 2 1 0 0 2 0 2 1 0 0 0 1 2 1 1 0 2 1 0 0 2 1 1 2 0 1 0 0 2 0 2 1 1 0
1 0 2 0 0 2 0 0 1 1 1 1 2 2 0 1 1 0 1 0 0 2 2 2 0 1 1 2 0 0 0 2 1 0 1 0 0
1 2 0 0 2 2 1 1 0 2 1 0 2 2 1 1 0 2 1 0 2 1 0 2 2 2 0 1 0 2]
 

