A bar plot is a graph that represents the relationship between a categorical and a numeric feature. One rectangular bar is drawn for each category of the categorical feature, and the height of the bar represents the corresponding value. Using grouped bar plots, we can study the relationship between more than two features.
In Python, we can draw a bar plot using either the Matplotlib library or the seaborn library, a higher-level library built on Matplotlib that also supports pandas data structures. In this article, we use the seaborn.barplot() function to draw grouped bar plots.
Another important aspect of data visualization with bar plots is annotation, i.e., adding text to make the chart easier to understand. This can be achieved with Matplotlib's annotate() function, which is available in the pyplot module and as a method of the Axes object, as explained in the steps below.
Step 1: Importing the libraries and the dataset used. Here we use the Titanic dataset, which is available through seaborn's load_dataset() function.
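A minimal sketch of this step is given below. The helper name load_titanic is ours (hypothetical), and the sketch assumes that seaborn, pandas, and Matplotlib are installed. Note that load_dataset() fetches the example data from seaborn's online data repository on first use, so an internet connection is needed the first time.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def load_titanic() -> pd.DataFrame:
    """Return seaborn's example Titanic dataset as a pandas DataFrame."""
    # Downloads the data on first use and caches it locally afterwards.
    return sns.load_dataset("titanic")
```

Calling data = load_titanic() and then data.head() lets you check the 'sex', 'class', and 'age' columns used in the rest of the article.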
We will draw a grouped bar plot to analyze the average age and the number of travelers on the Titanic, by gender, in the different ticket classes. For that, we first need to transform the dataset.
Step 2: Transforming the original dataset using grouping and aggregation.
Explanation: Grouped bar plots require at least two categorical features and one numerical feature. Here, we have selected the ‘class’ feature to define the categories and the ‘sex’ feature to group the bars, using the pandas.DataFrame.groupby() function. Then we have aggregated the mean age and the count for each group using pandas.core.groupby.DataFrameGroupBy.agg(). These operations produce a multi-index dataframe, so we reset the index to obtain a flat dataset with one row per combination of class and sex.
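The transformation described above can be sketched as follows. This is one possible version, not necessarily the exact code used for the article: the function name and the output column names avg_age and count are our own choices, and the small sample frame is a hypothetical stand-in with the same column names as the Titanic dataset, so the snippet runs without downloading anything.

```python
import pandas as pd


def summarize_by_class_and_sex(data: pd.DataFrame) -> pd.DataFrame:
    """Mean age and passenger count per (class, sex) group, as a flat table."""
    grouped = data.groupby(["class", "sex"], observed=True).agg(
        avg_age=("age", "mean"),  # mean ignores missing ages
        count=("age", "size"),    # size counts every row in the group
    )
    # groupby + agg returns a frame indexed by (class, sex);
    # reset_index turns those index levels back into ordinary columns.
    return grouped.reset_index()


# Hypothetical stand-in rows, not real Titanic records
sample = pd.DataFrame({
    "class": ["First", "First", "Third", "Third", "Third"],
    "sex": ["female", "male", "male", "male", "female"],
    "age": [38.0, 54.0, 22.0, 30.0, 26.0],
})

summary = summarize_by_class_and_sex(sample)
print(summary)
```

Passing the full Titanic dataframe instead of sample gives one row for each of the six class and sex combinations.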
Step 3: Now, we draw a simple bar plot from the transformed dataset using the seaborn.barplot() function.
Note that we have used the ‘hue’ keyword argument to group the bars by the ‘sex’ feature.
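A minimal sketch of such a plot is given below. The summary table here is a hypothetical stand-in for the transformed dataset: the column name avg_age and the values are illustrative, not the real Titanic aggregates.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the transformed table (illustrative values)
summary = pd.DataFrame({
    "class": ["First", "First", "Second", "Second", "Third", "Third"],
    "sex": ["female", "male"] * 3,
    "avg_age": [34.25, 41.75, 28.75, 30.25, 21.75, 26.25],
})

fig, ax = plt.subplots(figsize=(8, 5))
# x selects the categories, y the bar heights, hue splits each class by sex
sns.barplot(data=summary, x="class", y="avg_age", hue="sex", ax=ax)
ax.set_xlabel("Ticket class")
ax.set_ylabel("Average age")
```

In a script, call plt.show() afterwards to display the figure. Each ticket class then shows two bars side by side, one per value of ‘sex’.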
Step 4: Annotating the bars
Explanation: We use the ‘patches’ attribute of the Axes object returned by seaborn.barplot() to iterate over each bar. For each bar, we read its height and coordinates and place the text with the annotate() function.
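The annotation loop can be sketched as follows. As before, the summary table is a hypothetical stand-in with illustrative values, and the offset and alignment settings are our own choices rather than the article's exact ones. The loop skips zero-height patches so that only real bars receive a label.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the transformed table (illustrative values)
summary = pd.DataFrame({
    "class": ["First", "First", "Second", "Second", "Third", "Third"],
    "sex": ["female", "male"] * 3,
    "avg_age": [34.25, 41.75, 28.75, 30.25, 21.75, 26.25],
})

fig, ax = plt.subplots(figsize=(8, 5))
sns.barplot(data=summary, x="class", y="avg_age", hue="sex", ax=ax)

for bar in ax.patches:
    height = bar.get_height()
    if height <= 0:
        continue  # not a data bar, nothing to label
    ax.annotate(
        f"{height:.2f}",                                   # label text
        (bar.get_x() + bar.get_width() / 2, height),        # top centre of the bar
        xytext=(0, 3), textcoords="offset points",          # nudge 3 points upward
        ha="center", va="bottom",
    )
```

The point passed to annotate() is in data coordinates, while xytext with textcoords="offset points" moves the text by a fixed distance on screen, so the gap above each bar stays the same regardless of the axis scale.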
Step 5: Since each bar represents an age, showing decimal places adds little. We customize the text by rounding it to the nearest integer and then using the format() function.
Also, by changing the text coordinates, we shift the label inside the bar.
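One way to do this is sketched below. It uses the '.0f' format specifier, which rounds to the nearest integer while formatting, and a negative vertical offset to place the label just below the top of the bar. The summary table, the offset of 12 points, and the white text colour are illustrative assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the transformed table (illustrative values)
summary = pd.DataFrame({
    "class": ["First", "First", "Second", "Second", "Third", "Third"],
    "sex": ["female", "male"] * 3,
    "avg_age": [34.25, 41.75, 28.75, 30.25, 21.75, 26.25],
})

fig, ax = plt.subplots(figsize=(8, 5))
sns.barplot(data=summary, x="class", y="avg_age", hue="sex", ax=ax)

for bar in ax.patches:
    height = bar.get_height()
    if height <= 0:
        continue  # not a data bar, nothing to label
    ax.annotate(
        format(height, ".0f"),                              # e.g. 41.75 -> "42"
        (bar.get_x() + bar.get_width() / 2, height),        # top centre of the bar
        xytext=(0, -12), textcoords="offset points",        # 12 points below the top
        ha="center", va="center", color="white",
    )
```

A negative second value in xytext moves the text down from the anchor point, which is what places the label inside the bar.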