
Box plot visualization with Pandas and Seaborn

Last Updated : 08 Sep, 2021

A box plot is a visual representation of groups of numerical data through their quartiles. It is also used to detect outliers in a data set. It summarizes the data efficiently with a simple box and whiskers and makes it easy to compare groups. A box plot summarizes sample data using the 25th, 50th and 75th percentiles, also known as the lower quartile, median and upper quartile.

A box plot consists of 5 things:

  • Minimum
  • First Quartile or 25%
  • Median (Second Quartile) or 50%
  • Third Quartile or 75%
  • Maximum
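As a rough illustration, these five values can be computed directly with pandas. This is a minimal sketch on made-up numbers (the `values` Series is hypothetical and not part of the tips dataset), relying on the default linear interpolation of `Series.quantile`:

```python
import pandas as pd

# Made-up sample data, used only to illustrate the five values
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

summary = {
    "minimum": values.min(),
    "q1": values.quantile(0.25),      # first quartile (25%)
    "median": values.quantile(0.50),  # second quartile (50%)
    "q3": values.quantile(0.75),      # third quartile (75%)
    "maximum": values.max(),
}
print(summary)
```

Half of the observations fall between `q1` and `q3`, which is the range the box itself covers.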

The pandas examples below use the tips dataset, loaded from a local file named tips.csv.

Draw the box plot with Pandas:

One way to draw a box plot from a pandas DataFrame is to use the boxplot() method that is part of the pandas library.




# import the required libraries
import pandas as pd
import matplotlib.pyplot as plt

# Jupyter/IPython only: render plots inline
%matplotlib inline

# load the dataset
df = pd.read_csv("tips.csv")

# display the first 5 rows of the dataset
df.head()


Box plot of total_bill grouped by day.




df.boxplot(by='day', column=['total_bill'], grid=False)



 
Box plot of tip grouped by size.




df.boxplot(by='size', column=['tip'], grid=False)
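To see the numbers behind a grouped box plot like the ones above, one option is `groupby(...).describe()`, which reports the quartiles for each group. The sketch below assumes a small hand-made DataFrame (`df_demo`, hypothetical) that only borrows the column names of the tips data, so it runs without tips.csv:

```python
import pandas as pd

# Hypothetical mini dataset; only the column names mirror tips.csv
df_demo = pd.DataFrame({
    "day": ["Thur", "Thur", "Thur", "Fri", "Fri", "Fri"],
    "total_bill": [10.0, 20.0, 30.0, 12.0, 14.0, 40.0],
})

# One row per day with count, mean, std, min, 25%, 50%, 75%, max
stats = df_demo.groupby("day")["total_bill"].describe()
print(stats[["min", "25%", "50%", "75%", "max"]])
```

The `25%`, `50%` and `75%` columns correspond to the bottom edge, middle line and top edge of each box.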



 
Draw the boxplot using seaborn library:

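Syntax (the exact signature varies between seaborn releases):
seaborn.boxplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None, orient=None, color=None, palette=None, saturation=0.75, width=0.8, dodge=True, fliersize=5, linewidth=None, whis=1.5, notch=False, ax=None, **kwargs)

Parameters:
x = name of the column in data to plot on the x-axis
y = name of the column in data to plot on the y-axis
hue = name of a column used to split each group into separately colored boxes
data = DataFrame containing the columns named above
color = single color to use for all boxes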
Let’s create the box plot with the seaborn library.

First, a look at the “tips” dataset that ships with seaborn.




# import seaborn
import seaborn as sns

# load the dataset
tips = sns.load_dataset('tips')

# display the first 5 rows
tips.head()


Box plot of total_bill grouped by day.




# Draw a vertical box plot grouped
# by a categorical variable:
sns.set_style("whitegrid")

sns.boxplot(x='day', y='total_bill', data=tips)


Let’s take the first box plot in the figure (the blue one) and read it from bottom to top:

  • The bottom black horizontal line (the end of the lower whisker) is the minimum, or more precisely the smallest value that is not flagged as an outlier.
  • The bottom edge of the blue rectangle is the first quartile, or 25%.
  • The black horizontal line inside the rectangle is the second quartile, or 50%, i.e. the median.
  • The top edge of the rectangle is the third quartile, or 75%.
  • The top black horizontal line (the end of the upper whisker) is the maximum, or more precisely the largest value that is not flagged as an outlier.
  • The small diamonds are outliers: points lying beyond the whiskers, which with the default whis=1.5 means more than 1.5 times the interquartile range away from the box. Outliers are unusual values, not necessarily erroneous data.
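The whisker and outlier rule can be sketched numerically. The example below is a minimal illustration on made-up values (the `values` Series is hypothetical), assuming the default `whis=1.5` and the default linear interpolation of `Series.quantile`:

```python
import pandas as pd

# Made-up data with one unusually large value
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1  # interquartile range: the height of the box

# Fences: points outside them are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = values[(values >= lower_fence) & (values <= upper_fence)]
outliers = values[(values < lower_fence) | (values > upper_fence)]

# Whiskers end at the most extreme points still inside the fences
lower_whisker = inside.min()
upper_whisker = inside.max()
print(lower_whisker, upper_whisker, outliers.tolist())
```

Here the value 30 lies above the upper fence, so it would be drawn as a diamond, and the upper whisker stops at 9 rather than at the true maximum.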

