Box Plot using Plotly in Python

Plotly is a Python library for building interactive graphs. It can draw many kinds of charts, such as histograms, bar plots, box plots and many more, and it is widely used in data analysis and financial analysis.

Box Plot

A box plot is a graphical representation of numerical data through its quartiles. The lower and upper quartiles form the ends of the box, while the median (second quartile) is marked by a line inside the box. plotly.express is a convenient, high-level interface to Plotly that operates on a variety of data types and produces easy-to-style figures. Box plots are especially useful for comparing groups of data. A box plot divides the data into sections that each contain approximately 25% of the values, which helps in quickly identifying the central values, the dispersion of the data set, and signs of skewness.

Syntax: plotly.express.box(data_frame=None, x=None, y=None, color=None, facet_row=None, facet_col=None, facet_col_wrap=0, hover_name=None, hover_data=None, custom_data=None, animation_frame=None, animation_group=None, category_orders={}, labels={}, color_discrete_sequence=None, color_discrete_map={}, orientation=None, boxmode=None, log_x=False, log_y=False, range_x=None, range_y=None, points=None, notched=False, title=None, template=None, width=None, height=None)

Parameters:

Name Description
data_frame This argument needs to be passed for column names (and not keyword names) to be used. Array-like and dict are transformed internally to a pandas DataFrame. Optional: if missing, a DataFrame gets constructed under the hood using the other arguments.
x  Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to position marks along the x axis in cartesian coordinates. Either x or y can optionally be a list of column references or array_likes, in which case the data will be treated as if it were ‘wide’ rather than ‘long’.
y  Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to position marks along the y axis in cartesian coordinates. Either x or y can optionally be a list of column references or array_likes, in which case the data will be treated as if it were ‘wide’ rather than ‘long’.
color Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to assign color to marks.
facet_row Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to assign marks to facetted subplots in the vertical direction.
facet_col Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to assign marks to facetted subplots in the horizontal direction.
facet_col_wrap  Maximum number of facet columns. Wraps the column variable at this width, so that the column facets span multiple rows. Ignored if 0, and forced to 0 if facet_row or a marginal is set.
hover_name Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like appear in bold in the hover tooltip.
hover_data Either a list of names of columns in data_frame, or pandas Series, or array_like objects, or a dict with column names as keys and values that are True (for default formatting), False (to remove this column from the hover information), a formatting string such as ‘:.3f’ or ‘|%a’, list-like data to appear in the hover tooltip, or tuples with a bool or formatting string as first element and list-like data to appear in hover as second element. Values from these columns appear as extra data in the hover tooltip.
custom_data  Either names of columns in data_frame, or pandas Series, or array_like objects. Values from these columns are extra data, to be used in widgets or Dash callbacks for example. This data is not user-visible but is included in events emitted by the figure (lasso selection etc.).
animation_frame Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to assign marks to animation frames.
animation_group Either a name of a column in data_frame, or a pandas Series or array_like object. Values from this column or array_like are used to provide object-constancy across animation frames: rows with matching `animation_group`s will be treated as if they describe the same object in each frame.
category_orders  By default, in Python 3.6+, the order of categorical values in axes, legends and facets depends on the order in which these values are first encountered in data_frame (and no order is guaranteed by default in Python below 3.6). This parameter is used to force a specific ordering of values per column. The keys of this dict should correspond to column names, and the values should be lists of strings corresponding to the specific display order desired.
labels By default, column names are used in the figure for axis titles, legend entries and hovers. This parameter allows this to be overridden. The keys of this dict should correspond to column names, and the values should correspond to the desired label to be displayed.
color_discrete_sequence   Strings should define valid CSS-colors. When color is set and the values in the corresponding column are not numeric, values in that column are assigned colors by cycling through color_discrete_sequence in the order described in category_orders, unless the value of color is a key in color_discrete_map. Various useful color sequences are available in the plotly.express.colors submodules, specifically plotly.express.colors.qualitative.
color_discrete_map  String values should define valid CSS-colors. Used to override color_discrete_sequence to assign specific colors to marks corresponding with specific values. Keys in color_discrete_map should be values in the column denoted by color. Alternatively, if the values of color are valid colors, the string ‘identity’ may be passed to cause them to be used directly.
orientation One of ‘h’ for horizontal or ‘v’ for vertical. Defaults to ‘v’ if x and y are both provided and are both continuous or both categorical. Otherwise defaults to ‘v’ if x is categorical and y is continuous, and to ‘h’ if y is categorical and x is continuous. When only one of x or y is provided, the orientation is inferred from which one is given.
boxmode One of ‘group’ or ‘overlay’. In ‘overlay’ mode, boxes are drawn on top of one another. In ‘group’ mode, boxes are placed beside each other.
log_x If True, the x-axis is log-scaled in cartesian coordinates.
log_y  If True, the y-axis is log-scaled in cartesian coordinates.
range_x If provided, overrides auto-scaling on the x-axis in cartesian coordinates.
range_y  If provided, overrides auto-scaling on the y-axis in cartesian coordinates.
points One of ‘outliers’, ‘suspectedoutliers’, ‘all’, or False. If ‘outliers’, only the sample points lying outside the whiskers are shown. If ‘suspectedoutliers’, all outlier points are shown and those less than 4*Q1-3*Q3 or greater than 4*Q3-3*Q1 are highlighted with the marker’s ‘outliercolor’. If ‘all’, all sample points are shown. If False, no sample points are shown and the whiskers extend to the full range of the sample.
notched If True, boxes are drawn with notches.
title The figure title.
template The figure template name (must be a key in plotly.io.templates) or definition.
width The figure width in pixels.
height The figure height in pixels.

Example 1: Using Iris Dataset



Python3

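import plotly.express as px

# Load the Iris sample dataset bundled with Plotly
df = px.data.iris()

# Draw one box of sepal_length values for each distinct sepal_width
fig = px.box(df, x="sepal_width", y="sepal_length")

fig.show()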

Output:

Example 2: Using Tips Dataset

Python3

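import plotly.express as px

# Load the tips sample dataset bundled with Plotly
df = px.data.tips()

# Draw one box of total_bill values for each value of sex
fig = px.box(df, x="sex", y="total_bill")
fig.show()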

Output:

In the above examples, take the first box of the figure and read its statistics from bottom to top:

  • The bottom horizontal line (end of the lower whisker) is the smallest value that is not an outlier; it equals the minimum when there are no low outliers.
  • The lower edge of the box is the first quartile (25%).
  • The line inside the box is the second quartile (50%), i.e. the median.
  • The upper edge of the box is the third quartile (75%).
  • The top horizontal line (end of the upper whisker) is the largest value that is not an outlier; it equals the maximum when there are no high outliers.
  • Individual markers beyond the whiskers are outliers, by default points more than 1.5 times the interquartile range away from the box. They are unusual values, not necessarily erroneous ones.
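The statistics listed above can be reproduced by hand. The sketch below uses a made-up sample and NumPy's default percentile interpolation, which is an assumption of this example; Plotly's own quartile methods may give slightly different quartile values for the same data.

```python
import numpy as np

# Hypothetical sample with one unusually large value
data = [2, 4, 5, 6, 7, 8, 9, 30]

# Quartiles via NumPy's default linear interpolation
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1  # interquartile range: the height of the box

# Fences: points outside them are treated as outliers
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

inside = [v for v in data if low_fence <= v <= high_fence]
outliers = [v for v in data if v < low_fence or v > high_fence]

# Whiskers end at the most extreme points still inside the fences
whisker_low, whisker_high = min(inside), max(inside)

print(f"Q1={q1}, median={median}, Q3={q3}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```

Here the value 30 falls above the upper fence, so it would be drawn as a separate marker while the upper whisker stops at 9.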

Changing Algorithm for Quartiles

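Plotly also lets you choose the algorithm used to compute the quartiles. By default it uses the linear method, which interpolates linearly between data points to find the 25th and 75th percentiles. Two other methods, inclusive and exclusive, are available through the quartilemethod trace attribute. Both split the sorted data at the median and take the median of each half as Q1 and Q3; when the sample size is odd, the exclusive method leaves the median out of both halves, while the inclusive method includes it in both.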
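To make the difference between the inclusive and exclusive methods concrete, here is a small pure-Python sketch of the median-of-halves idea. The function name quartiles and the sample are hypothetical, and this is an illustration of the two definitions rather than Plotly's actual implementation.

```python
from statistics import median

def quartiles(values, method="exclusive"):
    """Return (Q1, median, Q3) using the median-of-halves approach."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 0:
        # Even sample size: both methods split the data cleanly in two
        lower, upper = data[:half], data[half:]
    elif method == "inclusive":
        # Odd sample size: the median belongs to both halves
        lower, upper = data[:half + 1], data[half:]
    else:
        # Odd sample size: the median belongs to neither half
        lower, upper = data[:half], data[half + 1:]
    return median(lower), median(data), median(upper)

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
exclusive = quartiles(sample, "exclusive")
inclusive = quartiles(sample, "inclusive")

print("exclusive:", exclusive)  # (2.5, 5, 7.5)
print("inclusive:", inclusive)  # (3, 5, 7)
```

With an odd sample size the exclusive method produces a wider box than the inclusive one; with an even sample size the two agree.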

Example 1: Using inclusive algorithm

Python3

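import plotly.express as px

# Load the tips sample dataset bundled with Plotly
df = px.data.tips()

# Show every point next to its box
fig = px.box(df, x="sex", y="total_bill", points="all")

# Compute the quartiles with the inclusive method
fig.update_traces(quartilemethod="inclusive")

fig.show()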

Output:

Example 2: Using exclusive algorithm

Python3

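import plotly.express as px

# Load the tips sample dataset bundled with Plotly
df = px.data.tips()

# Show every point next to its box
fig = px.box(df, x="sex", y="total_bill", points="all")

# Compute the quartiles with the exclusive method
fig.update_traces(quartilemethod="exclusive")

fig.show()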

Output:

Showing the underlying data

The underlying data can be shown using the points argument, which accepts one of four values:

  • all to show all points
  • outliers to show only the points outside the whiskers
  • suspectedoutliers to show the outliers and highlight the most extreme ones
  • False to show no points at all

Example 1: Passing all as argument

Python3

import plotly.express as px

df = px.data.tips()

# points="all" draws every observation beside its box
fig = px.box(df, x="sex", y="total_bill", points="all")
fig.show()

Output:

Example 2: Passing outliers as argument

Python3

import plotly.express as px

df = px.data.tips()

# points="outliers" draws only the observations outside the whiskers
fig = px.box(df, x="sex", y="total_bill", points="outliers")
fig.show()

Output:

Styling box plot

Box plots come with various styling options. The example below uses one of them, notched=True, which draws the boxes with notches around the median.

Example:

Python3

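import plotly.express as px

df = px.data.tips()

# notched=True draws each box with a notch around the median
fig = px.box(df, x="sex", y="total_bill", points="all", notched=True)

# Compute the quartiles with the exclusive method
fig.update_traces(quartilemethod="exclusive")

fig.show()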

Output:



