
How To Make Stripplot with Jitter in Altair Python?

Prerequisites: Altair

Altair is a declarative statistical visualization library for Python, built on the Vega and Vega-Lite visualization grammars. A stripplot (strip plot) is used for graphical data analysis: it draws each observed value as a point along a single value axis. When the data are split into categories, the categories are laid out along the second axis, so the plot uses two axes (X and Y). Strip plots are an alternative to histograms and other density-based plots, and they are most often used with small datasets.

A simple strip plot draws every point of a category at the same position across the category axis, so points with equal or similar values overlap and hide each other. To make the simple stripplot more informative, we add random jitter. Jitter is a small amount of random variability (horizontal or vertical) added to each point's position along the axis perpendicular to the values, so that all data points remain visible.
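The effect of jitter can be illustrated without any plotting library. The sketch below uses a small hypothetical set of values that all belong to one category: without jitter many points share the same (x, y) position, while a small uniform horizontal offset (the range of plus or minus 0.2 is an arbitrary choice for illustration) gives every point its own position and leaves the values themselves untouched.

```python
import random

# Hypothetical sample: five values of 2.0 and three of 3.5, all in one category.
values = [2.0, 2.0, 2.0, 2.0, 2.0, 3.5, 3.5, 3.5]

# Without jitter every point sits at x = 0, so identical values are drawn
# on top of each other.
plain = [(0.0, v) for v in values]

# Add a small uniform horizontal offset in [-0.2, 0.2]; y (the value) is unchanged.
random.seed(42)
jittered = [(random.uniform(-0.2, 0.2), v) for v in values]

print(len(set(plain)))     # 2
print(len(set(jittered)))  # distinct drawn positions with jitter
```

Only two distinct positions are drawn without jitter, even though there are eight observations.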

Approach:

  • Import Libraries
  • Import or create data
  • Create a simple Stripplot using Altair
  • Compute a jitter field and map it to the axis perpendicular to the values
  • Modify the values of different attributes for better visualization (optional).
  • Display plot

Function Used

transform_calculate() lets you define new fields in the dataset, computed from other fields with a Vega expression.

Syntax:

chart.transform_calculate(new_field='<vega_expression>')

Various implementations of the above approach are given below.

Example 1: 

In this program, we use seaborn's tips dataset to compare the tips paid at lunch and at dinner.

Python3




# import libraries
import seaborn
import altair as alt
  
  
# Getting data
tip = seaborn.load_dataset('tips')
  
# plotting the stripplot
stripplot = alt.Chart(tip).mark_circle(size=14).encode(
    # jitter on the X-axis spreads each vertical strip of points horizontally
    x=alt.X(
        'jitter:Q',
        title=None,
        axis=alt.Axis(ticks=True, grid=False, labels=False),
        scale=alt.Scale(),
    ),
    y=alt.Y('tip:Q',
            scale=alt.Scale()),
    color=alt.Color('time:N', legend=None),
    column=alt.Column(
        'time:N',
    ),
).transform_calculate(
    # Generate Gaussian jitter with a Box-Muller transform
    jitter='sqrt(-2*log(random()))*cos(2*PI*random())')
stripplot


Output:
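All three examples compute `jitter` with the Vega expression `sqrt(-2*log(random()))*cos(2*PI*random())`, a Box-Muller transform that turns two uniform random numbers into one standard-normal sample. The plain-Python sketch below mirrors that formula (the function name `box_muller_jitter` is hypothetical) and checks that the samples have a mean near 0 and a standard deviation near 1. Unlike the Vega expression, it draws the first uniform number from (0, 1] so that `log` never receives 0.

```python
import math
import random


def box_muller_jitter(rng: random.Random) -> float:
    """Return one standard-normal sample using the Box-Muller transform,
    mirroring sqrt(-2*log(random()))*cos(2*PI*random())."""
    u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is always finite
    u2 = rng.random()        # in [0, 1)
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


rng = random.Random(0)
samples = [box_muller_jitter(rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(f"mean={mean:.3f} std={std:.3f}")  # close to 0 and 1
```

Gaussian jitter packs most points near the centre of each strip, while uniform jitter (`random()`) spreads them evenly across a fixed band.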

Example 2:

This program uses a stripplot to study the maximum daily temperature in Seattle under different weather conditions.

Python3




# import libraries
import altair as alt
from vega_datasets import data
  
# Getting data
weather = data.seattle_weather()
  
# plotting the stripplot
stripplot = alt.Chart(weather).mark_circle(size=14).encode(
    x=alt.X(
        'jitter:Q',
        title=None,
        axis=alt.Axis(ticks=True, grid=False, labels=False),
        scale=alt.Scale(),
    ),
    y=alt.Y('temp_max:Q',
            scale=alt.Scale(
                domain=(-1, 40))),
    color=alt.Color('weather:N', legend=None),
    column=alt.Column(
        'weather:N',
        header=alt.Header(
            labelFontSize=16,
            labelAngle=0,
            titleOrient='top',
            labelOrient='bottom',
            labelAlign='center',
            labelPadding=25,
        ),
    ),
).transform_calculate(
    # Generate Gaussian jitter with a Box-Muller transform
    jitter='sqrt(-2*log(random()))*cos(2*PI*random())'
).configure_facet(
    spacing=0
).configure_view(
    stroke=None
).configure_axis(
    labelFontSize=16,
    titleFontSize=16
).properties(height=400, width=100)
stripplot


Output:

Example 3:

This program plots the ages of people in a small hand-made dataset, split by gender, as a horizontal stripplot (the jitter is mapped to the Y-axis and each gender gets its own row).

Python3




# import libraries
import altair as alt
import pandas as pd
  
# Creating our own data
data = [['Tom', 10, 'Male'], ['Nick', 25, 'Male'], ['Juli', 14, 'Female'],
        ['Sarah', 30, 'Male'], ['Pulkit', 20, 'Male'], ['Ritika', 20, 'Female'],
        ['Sayantan', 60, 'Male'], ['Pam', 39, 'Female'], ['Peter', 42, 'Male'],
        ['Jenefer', 24, 'Female'], ['Tony', 29, 'Female'], ['Myler', 22, 'Female']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Gender'])
  
# plotting a horizontal stripplot
horizontal_stripplot = alt.Chart(df, width=600, height=100).mark_circle(size=40).encode(
    y=alt.Y(
        'jitter:Q',
        title=None,
        axis=alt.Axis(ticks=True, grid=False, labels=False),
        scale=alt.Scale(),
    ),
    x=alt.X('Age:Q', scale=alt.Scale()),
    color=alt.Color('Gender:N', legend=None),
    row=alt.Row(
        'Gender:N',
        header=alt.Header(
            labelAngle=0,
            labelFontSize=16,
            titleOrient='top',
            labelOrient='left',
            labelAlign='left',
        ),
    ),
).transform_calculate(
    # Generate Gaussian jitter with a Box-Muller transform
    jitter='sqrt(-2*log(random()))*cos(2*PI*random())'
).configure_facet(
    spacing=0
).configure_view(
    stroke=None
).configure_axis(
    labelFontSize=16,
    titleFontSize=16
)
horizontal_stripplot


Output:



Last Updated : 13 Jan, 2021