Hierarchically-clustered Heatmap in Python with Seaborn Clustermap
Seaborn is a Python visualization library for statistical graphics. It provides attractive default styles and color palettes for statistical plots. It is built on top of matplotlib and integrates closely with pandas data structures.
What is Clustering?
Clustering means grouping data points based on relationships among the variables in the data. In unsupervised learning, clustering algorithms help reveal structure in unlabeled data. The most common types of clustering are shown below.
Here we look at hierarchical clustering, specifically agglomerative (bottom-up) hierarchical clustering. Agglomerative clustering starts by treating each data point as its own cluster and then repeatedly merges the two nearest clusters into a larger cluster until a single cluster remains. The tree diagram that records these merges is called a dendrogram.
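The merge process described above can be sketched with SciPy, the library seaborn relies on for its clustering. This is a minimal illustration, not part of the original article: the six points are made up, and the choice of average linkage with Euclidean distance is an assumption made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Six made-up 2-D points forming two loose groups (illustrative data only).
points = np.array([
    [1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
    [5.0, 5.0], [5.1, 4.8], [4.9, 5.2],
])

# Each row of `merges` records one merge step: the two clusters joined,
# the distance between them, and the size of the newly formed cluster.
merges = linkage(points, method="average", metric="euclidean")

# n points need n - 1 merges to end up in a single cluster.
print(merges.shape)

# no_plot=True computes the dendrogram layout without drawing it;
# "leaves" is the left-to-right order of the original points.
tree = dendrogram(merges, no_plot=True)
print(tree["leaves"])
```

Dropping no_plot=True makes dendrogram() draw the tree with matplotlib instead of only returning its layout.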
Plotting Hierarchically-clustered Heatmaps
A heat map is a graphical representation of data in which values are represented by colors. Variation in color intensity shows how the data is clustered or how it varies over space.
The clustermap() function of seaborn plots a hierarchically-clustered heat map of the given matrix dataset. It returns a ClusterGrid instance.
Below are some examples that show hierarchically-clustered heat maps built from a dataset:
In the Flights dataset, the data (number of passengers) is clustered by month and year:
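A minimal sketch of this example follows. It assumes the long-form layout of seaborn's sample Flights dataset (columns year, month and passengers); the helper name flights_clustermap is hypothetical and only wraps the pivot step and the clustermap() call.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import seaborn as sns


def flights_clustermap(flights, **kwargs):
    """Pivot long-form flights data and draw a clustered heat map.

    `flights` is assumed to have the columns "year", "month" and
    "passengers". Extra keyword arguments are passed to sns.clustermap.
    """
    # Rows become months, columns become years, cells hold passenger counts.
    table = flights.pivot(index="month", columns="year", values="passengers")
    # clustermap clusters both rows and columns and returns a ClusterGrid.
    return sns.clustermap(table, **kwargs)
```

With network access, the sample data can be fetched with sns.load_dataset("flights") and passed to the helper; the returned grid can then be written to disk with grid.savefig("flights_clustermap.png"). The clustered row order is available as grid.dendrogram_row.reordered_ind.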
The legend (color bar) to the left of the cluster map shows how colors map to values, e.g. bright colors indicate more passengers and dark colors indicate fewer passengers.
Here we have changed the colors of the cluster map.
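One way to change the colors is the cmap argument, which clustermap() forwards to the underlying heat map. The sketch below is an assumption about how this could be done, since the article does not say which palette it used: "coolwarm" is an arbitrary choice, and the helper name recolored_clustermap is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import seaborn as sns


def recolored_clustermap(table, cmap="coolwarm"):
    """Draw a clustered heat map of a wide-form table with a chosen colormap.

    `table` is assumed to be a numeric pandas DataFrame, e.g. months as
    rows and years as columns. "coolwarm" is only an example palette.
    """
    return sns.clustermap(
        table,
        cmap=cmap,        # any matplotlib colormap name or object
        linewidths=0.5,   # thin lines between cells, passed on to the heat map
        figsize=(8, 8),   # overall figure size in inches
    )
```

For the Flights data, table would be the month-by-year pivot of the passenger counts. Other matplotlib colormap names, such as "viridis" or "YlGnBu", can be passed the same way.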