
What is a box plot, and what is the condition for outliers?

A box plot is a data visualization that summarizes a distribution with five values: the minimum, first quartile, median, third quartile, and maximum. Each of these is explained briefly below. In pandas, all of these values can be obtained with dataframe.column_name.describe(), which also reports the count, mean, and standard deviation.
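As a minimal sketch of the describe() call, assuming pandas is installed: the data values and the column name 'Score' below are made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column name 'Score' is an assumption.
df = pd.DataFrame({"Score": [2, 4, 4, 5, 7, 9, 10]})

# describe() returns count, mean, std, min, 25%, 50%, 75% and max.
summary = df["Score"].describe()
print(summary)

# Pick out the five values that a box plot is drawn from.
five_numbers = summary[["min", "25%", "50%", "75%", "max"]]
print(five_numbers)
```

The labels "25%", "50%", and "75%" are the first quartile, median, and third quartile respectively.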

Aspects of a box plot

Here is a small, evenly spaced data set with no outliers.






import pandas as pd

# Evenly spaced sample data
data = [0, 1, 2, 3, 4, 5, 6]
df = pd.DataFrame(data, columns=['Num'])
df

Output:



Now plot the data frame as a box plot:




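import matplotlib.pyplot as plt

# df.boxplot() draws on the current figure, so set the size first
plt.figure(figsize=(10, 7))
df.boxplot()
plt.show()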

Explanation of the different parts of the box plot

The maximum and minimum are the largest and smallest values in the data set. The 50th percentile is the median. The first quartile (Q1, the 25th percentile) is the median of the lower half of the data, from the minimum to the median, and the third quartile (Q3, the 75th percentile) is the median of the upper half, from the median to the maximum. The interquartile range (IQR) is Q3 - Q1. Outliers are the values that lie more than 1.5 * IQR below Q1 or more than 1.5 * IQR above Q3.

Methods of finding the values
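One possible way to find these values is sketched below, assuming pandas is installed. The data is hypothetical: the values 0 to 6 plus one extreme value (20) added to trigger the outlier rule. Note that pandas' quantile() uses linear interpolation by default, so its quartiles can differ slightly from the "median of each half" description.

```python
import pandas as pd

# Hypothetical data: 0..6 plus one extreme value to illustrate an outlier.
values = pd.Series([0, 1, 2, 3, 4, 5, 6, 20])

# Quartiles (linear interpolation is the pandas default).
q1 = values.quantile(0.25)
median = values.quantile(0.50)
q3 = values.quantile(0.75)

# Interquartile range and the 1.5 * IQR limits on each side of the box.
iqr = q3 - q1
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Values outside the limits are flagged as outliers.
outliers = values[(values < lower_limit) | (values > upper_limit)]

print("Q1:", q1, "median:", median, "Q3:", q3, "IQR:", iqr)
print("limits:", lower_limit, upper_limit)
print("outliers:", outliers.tolist())
```

The names lower_limit and upper_limit are only illustrative; they mark the boundaries beyond which a point is drawn as an individual outlier instead of being covered by a whisker.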

Different Cases of Box Plot

Let us look at different cases of box plots through examples and try to understand each one.

Description

The box plot is best known for detecting outliers, but it has several other uses too. Box plots take up little space, which makes them particularly useful for comparing distributions across several groups or data sets. A box plot is not a direct picture of the probability density function; it is a compact summary of the distribution that shows its center, spread, and skewness.
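As a rough sketch of comparing several groups side by side, assuming pandas and matplotlib are installed: the group names A, B, and C and their values are invented for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical measurements for three groups; names and values are made up.
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6, 7],
    "B": [3, 4, 5, 6, 7, 8, 9],
    "C": [2, 2, 3, 3, 4, 4, 15],  # 15 lies above Q3 + 1.5 * IQR for this group
})

# One box per column, drawn on a shared axis so the groups can be compared.
fig, ax = plt.subplots(figsize=(10, 7))
df.boxplot(column=["A", "B", "C"], ax=ax)
ax.set_ylabel("Value")
```

In a notebook the figure is displayed automatically; in a script, call plt.show() afterwards. Group B sits higher than group A with the same spread, and group C shows a narrow box with a single point plotted as an outlier.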

