
An Introduction to Grammar of Graphics for Python


A grammar of graphics is a framework for describing the components that make up a given graphic. Instead of thinking only in terms of named chart types (a scatter plot, for example), it lets us see the structure underneath: the data, how that data is mapped to visual properties, and the geometric objects used to draw it. The grammar of graphics was introduced by Leland Wilkinson in the 1990s and was popularized by Hadley Wickham with ggplot (and later ggplot2) for R.

Components of the Grammar of Graphics

To build or describe a visualization with one or more dimensions, we typically use the following components:

  • Data 
    Data is an essential component of the grammar of graphics. It contains all the information we want to visualize, so it is important to know the format of the data and what information it holds.
  • Layer 
    A layer is like a transparent sheet with a graphic drawn on it. Layers can be stacked and combined in a variety of ways to build up the final plot.
  • Geom 
    A geom (short for "geometric object") is the visual mark used to represent the data, such as a point, a line, or a bar. We can display a lot of information by "layering" geoms.
  • Scaling data 
    Scales control how data values are translated into what we see. One example is displaying an axis on a logarithmic scale. Rescaling does not change the data itself; it only changes how we view the dataset.

The grammar of graphics became widely used in R through ggplot and ggplot2. Given its success there, it has also been brought to Python as plotnine.

Python binding

plotnine is a Python implementation of the grammar of graphics, based on ggplot2. If you are familiar with R and ggplot2, you will likely pick up plotnine very quickly. There are two noticeable differences between ggplot2 and plotnine:

  • In R, a plus sign at the end of a line tells the interpreter that the expression continues on the next line. In Python, a line that ends with a plus sign raises a SyntaxError. To work around this, plotnine code wraps the whole expression in parentheses, inside which it can span several lines.
  • Column names must be strings. R lets you pass a column name as a function argument without enclosing it in quotes. In Python, however, a word that is not enclosed in single or double quotes is treated as a variable name. Column names in plotnine must therefore be quoted, e.g. aes(x="area_0").

Installation

plotnine is not part of the Python standard library. To install it, run the following command in a terminal:

pip install plotnine 

Example 1:  

Python3




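import pandas as pd
from plotnine import ggplot, aes, geom_point

# load the dataset (assumed to contain numeric columns "area_0" and "area_1")
dataset = pd.read_csv("dataset.csv")

# ggplot() receives the data; aes() maps the "area_0" column to the x-axis
# and the "area_1" column to the y-axis
# geom_point() draws each row of the data as a point
# (in a Jupyter notebook, the plot is rendered when this is the cell's last expression)
(ggplot(dataset, aes(x="area_0", y="area_1"))
    + geom_point()
)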


Output: a scatter plot with area_0 on the x-axis and area_1 on the y-axis.

Example 2: 

Python3




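import pandas as pd
from plotnine import ggplot, aes, geom_point

# load the dataset (assumed to contain "area_0", "area_1" and a "label" column)
dataset = pd.read_csv("dataset.csv")

# the colour is mapped from the "label" column, so it goes inside aes();
# alpha (transparency) and size are fixed values applied to every point
(ggplot(dataset, aes(x="area_0", y="area_1"))
    + geom_point(aes(color="label"), alpha=0.7, size=0.5)
)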


Output: the same scatter plot, with points colored according to label.

 



Last Updated : 20 Mar, 2024