In Machine Learning and Data Science, one of the toughest jobs is to understand the raw data and make it fit for applying different models over it to make predictions. For understanding purpose, we make use of different processes like checking statistical figures such as mean, median, mode, finding the relation between features, looking at the distribution of certain features, etc. This all things are often clubbed under the term Exploratory Data Analysis (EDA). According to experienced ML and DS practitioners, EDA is one of the most valuable tasks and its importance cannot be overlooked when it comes to analyzing the dataset. It helps the practitioners to choose the appropriate data pre-processing techniques.
Checking the statistical figures is well and good but what’s better than visualizing these stats pictorially, it often said that pictures are a more powerful tool for understanding rather than just numerical figures. We can come up with the clearer understanding of the data visually by making different charts. In machine learning and data science, we use different types of charts/plots in order to visualize different patterns among the features some of which are: Histogram, Bar plots, Box plots, Pair plots, Pie charts, Violin plots, etc. But writing codes manually for all these plots/graphs can seem to be a tiring process and also we are more prone to inaccuracies and errors/bugs in the code. So, here comes our lifesaver, a python package that does the tasks of plotting these graphs easily and efficiently without any such bugs in the code.
In this article, we will be reading about making interactive plots/charts using pywedge python package. Pywedge is an open-source python package that can be used for automating most of the machine learning problem-solving task. It also offers us features to plot interactive graphs using just a few lines of code.
Pywedge library has a make_charts method that offers us to make 8 different types of charts which are named as follows:
- Scatter plot
- Pie chart
- Bar Plot
- Violin plot
- Box plot
- Distribution plot
- Correlation plot
Let’s see an example of how can we use pywedge library to draw interactive plots:
Importing libraries and loading the dataset:
After loading the dataset, we will look at how the dataset looks like, this can be seen using the head method to print the top 5 rows in the dataset:
Shape of our dataframe is: (1599, 12)
So, as we can notice we have in total 1599 rows and 11 features in the dataset along with one target feature named ‘quality’.
It is always better to check out some stats about our dataset and we can do it using the describe() method as:
By looking just at the stats, we can infer
- The mean value is greater than the median value in each of the columns, for example, mean value of pH is 3.311113 whereas median value is 3.310000. Similarly, the mean of alcohol is 10.422 as compared to median which is 10.20.
- We can notice a large difference between 75th percentile value and max values of predictors “residual sugar”, “free sulfur dioxide”, “total sulfur dioxide”. This indicates that some values of these 3 variables lie much farther from the general range of values( up to 75th %tile). Thus, we can conclude that there are extreme values i.e., Outliers in our dataset.
Renaming of column and splitting of the dataset:
Using Pywedge library for making charts:
The ‘make_charts’ method on execution yields out a window giving us 8 different types of chart options and we can select our required features to plot the graph and visualize the results.
Making a scatter plot:
Making a Violin plot:
Making a Distribution plot:
Making a Histogram:
Making a Correlation plot:
Thus, we can see how efficiently we can plot these several graphs using just a few lines of code using the pywedge package without explicitly writing codes for all these graphs.