Visualising an ML Dataset Through Seaborn Plots and Matplotlib
Working with data can sometimes be a bit boring. Transforming raw data into an understandable format is one of the most essential parts of the whole process, so why stick to numbers when we can visualize our data with the striking graphs available in Python? This article explores plots that can make your preprocessing journey more intriguing.
Seaborn and Matplotlib provide numerous attractive graphs that make it easy to spot weak points, explore the data in more depth, and gain insights that help when you later train models with different algorithms.
Let's have a glance at our dataset: it has 36 rows with 6 features and 2 classes (Survived = 1, Not Survived = 0), and the plots below are based on it.
1. KDE PLOT: After a glance at the dataset, a natural question is: which age group has the most people? Answering it needs a visual, and this is where the KDE (kernel density estimate) plot comes in. It is simply a smoothed density plot. Let's start by importing the required libraries and using their functions to plot the graph.
2. Now we have a clear picture of how the number of people is distributed across age groups. The plot suggests that the 20-40 age group has the highest count, so let's check it.
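One way to check this numerically, sketched with pandas on made-up Age values: bin the ages with pd.cut and count the rows per bin. The 20-year bin edges are an illustrative choice, not taken from the article.

```python
import pandas as pd

df = pd.DataFrame({"Age": [22, 38, 26, 35, 28, 54, 2, 27, 14, 4, 58, 24]})

# Right-closed 20-year bins: (0, 20], (20, 40], (40, 60], (60, 80]
age_groups = pd.cut(df["Age"], bins=[0, 20, 40, 60, 80])
counts = age_groups.value_counts().sort_index()
print(counts)

# Direct check for ages 20-40, both ends inclusive
in_20_40 = df["Age"].between(20, 40).sum()
print("People aged 20-40:", in_20_40)
```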
3. Digging deeper, let's look at how Fare varies with Age and what the relation between them is. For this we use a different kind of kdeplot that shows bivariate densities: we simply add Fare as the y variable.
4. Studying this plot a bit, we see that the color intensity is highest for the 20-30 age group, with fares between 100 and 200. Let's check it:
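A sketch of that check with a boolean mask, assuming Age and Fare columns; the rows are made up, so the count says nothing about the article's data.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [22, 38, 26, 35, 28, 54, 2, 27, 14, 4, 58, 24],
    "Fare": [7.25, 71.28, 7.93, 53.10, 120.00, 51.86,
             21.08, 151.55, 30.07, 16.70, 26.55, 110.88],
})

# Rows with Age in [20, 30] and Fare in [100, 200] (all bounds inclusive)
mask = df["Age"].between(20, 30) & df["Fare"].between(100, 200)
print(df[mask])
print("Share of all passengers:", mask.mean())
```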
5. We can also add a histogram to the KDE plot using seaborn's distplot() function (deprecated since seaborn 0.11 in favor of histplot() and displot()):
6. To compare the male vs. female proportions, we can plot them in the same KDE:
7. As the plot shows, the count increases after age 12 and stays high until age 40. Let's check it:
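A sketch of that check with pd.crosstab, assuming Age and Sex columns and made-up rows: it counts each gender inside and outside the 12-40 window.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [22, 38, 26, 35, 28, 54, 2, 27, 14, 4, 58, 24],
    "Sex": ["male", "female", "female", "female", "male", "male",
            "male", "female", "female", "female", "female", "male"],
})

# True where 12 <= Age <= 40; crosstab counts rows per (Sex, window) pair
window = df["Age"].between(12, 40).rename("Age 12-40")
table = pd.crosstab(df["Sex"], window)
print(table)
```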
8. VIOLIN PLOT: We have talked a lot about the features; now let's look at how the survival rate depends on them. For this we use the classic violin plot, named for its shape, which resembles the body of a violin. A violin plot shows the distribution of the data together with its probability density, drawn as a KDE mirrored on both sides.
What is the relation between survival rate and age? Let's analyze it visually:
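A minimal sketch of this violin plot, assuming Age and Survived (0/1) columns and made-up rows. seaborn's default inner="box" draws the miniature box plot inside each violin.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "Age": [22, 38, 26, 35, 28, 54, 2, 27, 14, 4, 58, 24],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0],
})

# One violin per class: a mirrored KDE of Age with a small box plot inside
ax = sns.violinplot(data=df, x="Survived", y="Age")
plt.show()
```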
Explanation: The white dot in the plot is the median, and the thick black bar in the center represents the interquartile range. The thin black line extending from it reaches the upper and lower adjacent values, i.e. the most extreme data points within 1.5 × IQR of the quartiles.
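To read those markers as numbers, the quartiles can be computed per class. A sketch assuming Age and Survived columns with made-up rows; pandas' default linear interpolation is used for the quantiles.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [22, 38, 26, 35, 28, 54, 2, 27, 14, 4, 58, 24],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0],
})

# Q1, median (white dot) and Q3 (ends of the thick bar) for each class
q = df.groupby("Survived")["Age"].quantile([0.25, 0.5, 0.75]).unstack()
q.columns = ["Q1", "median", "Q3"]
q["IQR"] = q["Q3"] - q["Q1"]
print(q)
```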
A quick glance shows that the survival rate (Survived == 1) is a bit higher for ages 10-20.
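A sketch for confirming this, assuming Age and Survived columns. Because Survived is 0/1, its mean within a group is that group's survival rate. The rows are made up and far too few to support any conclusion.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [22, 38, 26, 35, 28, 54, 2, 27, 14, 4, 58, 24],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0],
})

# Label each row by whether its age falls in [10, 20]
band = df["Age"].between(10, 20).map({True: "10-20", False: "other"})

# Mean of the 0/1 column per band = survival rate per band
rates = df.groupby(band)["Survived"].mean()
print(rates)
```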
9. Let's plot one more for survival rate vs. gender and age.
Here the additional argument is hue, which maps the binary Survived value to color.
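A minimal sketch, assuming Age, Sex and Survived columns with made-up rows. The 0/1 values are copied into a hypothetical Outcome column with text labels so the legend reads clearly; passing hue="Survived" directly also works.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "Age": [22, 38, 26, 35, 28, 54, 2, 27, 14, 4, 58, 24],
    "Sex": ["male", "female", "female", "female", "male", "male",
            "male", "female", "female", "female", "female", "male"],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0],
})
# Hypothetical helper column: readable labels for the binary outcome
df["Outcome"] = df["Survived"].map({0: "Not survived", 1: "Survived"})

# hue gives each gender one violin per outcome, side by side
ax = sns.violinplot(data=df, x="Sex", y="Age", hue="Outcome")
plt.show()
```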
10. CATPLOT: catplot is seaborn's figure-level interface for categorical plots. Through its kind parameter it can draw strip, swarm, box, violin, boxen, point, bar, or count plots, and it can split the result into facets by further categorical variables.
Here despine is used to remove the top and right spines from the plot. Let's have a look at it:
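A minimal sketch of such a catplot, assuming a SibSp column (siblings/spouses aboard, as in the common Titanic layout) plus Sex and Survived, with made-up rows. With kind="bar" each bar is the mean of Survived, i.e. the survival probability. FacetGrid.despine() wraps sns.despine(), which removes the top and right spines by default; left=True removes the left one too.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "SibSp": [1, 1, 0, 1, 0, 0, 3, 0, 1, 1, 0, 0],
    "Sex": ["male", "female", "female", "female", "male", "male",
            "male", "female", "female", "female", "female", "male"],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0],
})

# Bar height = mean of Survived (0/1) per SibSp value and gender
g = sns.catplot(data=df, x="SibSp", y="Survived", hue="Sex", kind="bar",
                height=4, aspect=1.5)
# Top and right spines go by default; left=True drops the left one as well
g.despine(left=True)
g.set_axis_labels("Siblings/spouses aboard (SibSp)", "Survival probability")
plt.show()
```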
Here we get a clear picture of gender-wise survival probability with respect to the number of siblings.
11. In the dataset, the Ticket values fall into three categories based on Fare. Let's explore this (based on this plot, I added a Category column for tickets):
From this, we concluded that categories should be defined for tickets.
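A sketch of how such a Category column could be added with pd.cut, assuming a Fare column and made-up rows. The article does not state the boundaries it used, so the cut-offs of 20 and 100 and the labels Low/Mid/High are purely hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Fare": [7.25, 71.28, 7.93, 53.10, 120.00, 51.86,
             21.08, 151.55, 30.07, 16.70, 26.55, 110.88],
})

# Hypothetical fare cut-offs (not from the article): (0, 20], (20, 100], (100, inf)
df["Category"] = pd.cut(df["Fare"], bins=[0, 20, 100, float("inf")],
                        labels=["Low", "Mid", "High"])
counts = df["Category"].value_counts()
print(counts.sort_index())
```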
12. Relation of the ticket categories with survival rate:
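A minimal sketch, assuming Fare and Survived columns with made-up rows and the same kind of hypothetical fare cut-offs (not from the article). A strip plot shows each passenger's fare per ticket category, colored by outcome, and a groupby gives the survival rate per category.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "Fare": [7.25, 71.28, 7.93, 53.10, 120.00, 51.86,
             21.08, 151.55, 30.07, 16.70, 26.55, 110.88],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0],
})
# Hypothetical fare cut-offs and labels (not from the article)
df["Category"] = pd.cut(df["Fare"], bins=[0, 20, 100, float("inf")],
                        labels=["Low", "Mid", "High"])
# Hypothetical helper column: readable labels for the legend
df["Outcome"] = df["Survived"].map({0: "Not survived", 1: "Survived"})

# One strip of fares per category, colored by outcome
g = sns.catplot(data=df, x="Category", y="Fare", hue="Outcome", kind="strip")
plt.show()

# Survival rate per category (mean of the 0/1 column)
rates = df.groupby("Category", observed=True)["Survived"].mean()
print(rates)
```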
From this, we get a clear insight into survival rate vs. fare with respect to ticket category.