# Silhouette Index – Cluster Validity index | Set 2

**Prerequisite:** Dunn index and DB index – Cluster Validity indices

Many clustering algorithms are applied to analyze very large datasets, but most of them provide no built-in means of validating or evaluating their results. This makes it difficult to decide which clustering is best and should be taken forward for analysis.

There are several indices for predicting optimal clusters –

- Silhouette Index
- Dunn Index
- DB Index
- CS Index
- I-Index
- XB or Xie Beni Index

Now, let’s discuss the internal cluster validity index *Silhouette Index*.

### Silhouette Index –

Silhouette analysis refers to a method of interpretation and validation of consistency within clusters of data. The silhouette value is a measure of how similar an object is to its own cluster (cohesion) compared to other clusters (separation). It can be used to study the separation distance between the resulting clusters. The silhouette plot displays a measure of how close each point in one cluster is to points in the neighboring clusters and thus provides a way to assess parameters like number of clusters visually.

**How Silhouette Analysis Works –**

The silhouette validation technique calculates a silhouette index for each sample, an average silhouette index for each cluster, and an overall average silhouette index for the dataset. With this approach, each cluster can be represented by its silhouette index, which is based on comparing its tightness and separation.
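The three levels described above (per sample, per cluster, whole dataset) can be sketched with scikit-learn's `silhouette_samples`. This is a minimal illustration, not the only way to do it: the blob data, the choice of 3 clusters, and the fixed `random_state` values are assumptions made here so the example is repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

# Assumed toy data: 150 points in 3 blobs (seed fixed only for repeatability).
X, _ = make_blobs(n_samples=150, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Level 1: one silhouette value per sample.
sample_values = silhouette_samples(X, labels)

# Level 2: average silhouette value of the samples in each cluster.
cluster_means = {
    int(k): float(sample_values[labels == k].mean()) for k in np.unique(labels)
}

# Level 3: overall average for the dataset.
overall = float(sample_values.mean())

print("per-cluster averages:", cluster_means)
print("overall average:", overall)
```

The overall average computed this way should agree with `silhouette_score(X, labels)`, which reports the mean of the per-sample values.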

**Calculation of Silhouette Value –**

If the Silhouette index value is high, the object is well-matched to its own cluster and poorly matched to neighbouring clusters. The Silhouette Coefficient is calculated using the mean intra-cluster distance (a) and the mean nearest-cluster distance (b) for each sample. The Silhouette Coefficient is defined as –

**S(i) = ( b(i) – a(i) ) / max{ a(i), b(i) }**

Where,

- a(i) is the average dissimilarity of the i-th object to all other objects in the same cluster.
- b(i) is the average dissimilarity of the i-th object to all objects in the closest cluster.
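To make the definitions of a(i) and b(i) concrete, here is a small hand-rolled sketch of the formula. The helper name `silhouette_value`, the one-dimensional toy points, and the use of absolute difference as the dissimilarity are all assumptions chosen for illustration; returning 0 for a single-member cluster follows the usual convention.

```python
import numpy as np

def silhouette_value(i, points, labels):
    """Return (a, b, S) for sample i, using |x - y| as the dissimilarity."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    dist = np.abs(points - points[i])

    # a(i): mean dissimilarity to the OTHER members of the same cluster.
    same = labels == labels[i]
    same[i] = False
    if not same.any():
        return 0.0, 0.0, 0.0  # single-member cluster: S(i) is taken as 0
    a = dist[same].mean()

    # b(i): smallest mean dissimilarity to the members of any other cluster.
    b = min(dist[labels == k].mean() for k in np.unique(labels) if k != labels[i])

    return float(a), float(b), float((b - a) / max(a, b))

# Three hypothetical clusters on a line.
points = [1.0, 2.0, 5.0, 6.0, 10.0, 11.0]
labels = [0, 0, 1, 1, 2, 2]

a, b, s = silhouette_value(1, points, labels)  # the point at 2.0
print(a, b, round(s, 3))  # a = 1.0, b = 3.5, S = 2.5 / 3.5 ≈ 0.714
```

For the point at 2.0, the only other member of its cluster is at distance 1, and the closest other cluster (5.0 and 6.0) is at an average distance of 3.5, which gives S = (3.5 – 1) / 3.5.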

**Range of Silhouette Value –**

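Since a(i) and b(i) are both non-negative, the difference b(i) – a(i) can never exceed max{ a(i), b(i) } in magnitude, so S(i) always lies in the range **[-1, 1]** –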

- If the silhouette value is close to 1, the sample is well-clustered and already assigned to an appropriate cluster.
- If the silhouette value is close to 0, the sample lies about equally far from its own cluster and the closest neighbouring cluster, so it could equally well be assigned to that neighbouring cluster. This indicates overlapping clusters.
- If the silhouette value is close to –1, the sample is, on average, much closer to the neighbouring cluster than to its own, which suggests it has been assigned to the wrong cluster.
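The three cases above can be reproduced on tiny one-dimensional datasets. The following is a sketch under assumed toy values: the helper `s_of` and the point coordinates are hypothetical and were picked so that the three regimes come out cleanly.

```python
import numpy as np
from sklearn.metrics import silhouette_samples

def s_of(points, labels, index):
    """Silhouette value of one sample in a 1-D toy dataset."""
    X = np.asarray(points, dtype=float).reshape(-1, 1)
    return float(silhouette_samples(X, np.asarray(labels))[index])

# Well-clustered: the point at 1 sits inside its own cluster {0, 1, 2}.
well = s_of([0, 1, 2, 10, 11, 12], [0, 0, 0, 1, 1, 1], 1)

# Borderline: the point at 6 is equally far, on average, from both clusters.
border = s_of([0, 1, 2, 6, 10, 11, 12], [0, 0, 0, 0, 1, 1, 1], 3)

# Misassigned: the point at 11 is labelled with {0, 1, 2} but sits in {10, 12}.
wrong = s_of([0, 1, 2, 11, 10, 12], [0, 0, 0, 0, 1, 1], 3)

print(round(well, 3), round(border, 3), round(wrong, 3))  # approx. 0.9, 0.0, -0.9
```

In the misassigned case a(i) = 10 and b(i) = 1, so S(i) = (1 – 10) / 10 = –0.9.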

Below is the Python implementation of above Silhouette Index:

```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Generating the sample data from make_blobs
X, Y = make_blobs()

no_of_clusters = [2, 3, 4, 5, 6]

for n_clusters in no_of_clusters:

    cluster = KMeans(n_clusters=n_clusters)
    cluster_labels = cluster.fit_predict(X)

    # The silhouette_score gives the
    # average value for all the samples.
    silhouette_avg = silhouette_score(X, cluster_labels)

    print("For no of clusters =", n_clusters,
          " The average silhouette_score is :", silhouette_avg)
```

**Output** (exact values vary from run to run, because `make_blobs()` generates random data when no seed is given):

```
For no of clusters = 2  The average silhouette_score is : 0.7722709127556407
For no of clusters = 3  The average silhouette_score is : 0.8307470737845413
For no of clusters = 4  The average silhouette_score is : 0.6782013483149748
For no of clusters = 5  The average silhouette_score is : 0.5220013897800627
For no of clusters = 6  The average silhouette_score is : 0.3453103523071251
```
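In the run shown, the average score peaks at 3 clusters. One simple way to automate that choice is to keep the scores in a dictionary and pick the key with the highest value. The sketch below assumes explicitly placed, well-separated blob centers and fixed seeds (both are choices made here, not part of the original example) so that the result is repeatable.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Assumed toy data: three tight, well-separated blobs at hand-picked centers.
centers = [[-8, -8], [0, 8], [8, -8]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.6, random_state=42)

# Average silhouette score for each candidate number of clusters.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The candidate with the highest average score is taken as the best k.
best_k = max(scores, key=scores.get)
print("best number of clusters:", best_k)
```

A maximum of the average score is a heuristic rather than a guarantee, so it is worth checking the per-cluster values as well before settling on a number of clusters.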

**References:**

https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html

https://en.wikipedia.org/wiki/Silhouette_(clustering)
