
# Categorical bubble plot in R

A scatter plot is a graphical representation of two related numeric variables on a Cartesian coordinate system: each observation is drawn as a point at the intersection of its X and Y values. One variable is plotted along the horizontal X-axis and the other along the vertical Y-axis. A bubble plot enhances the scatter plot by replacing the points with circles (bubbles) whose size is mapped to a third numeric variable. A bubble plot therefore shows three numeric variables in a single plot.

A categorical bubble plot extends the bubble plot by representing a categorical variable in addition to the three numeric variables. The position of each bubble is determined by the two numeric variables mapped to the X and Y axes, and its size is determined by the third numeric variable. The color of the bubble indicates the category to which it belongs, which is how the categorical variable appears in the same plot. In this article, we explore two approaches to drawing categorical bubble plots in R:

• using the ggplot2 library
• using the plotly library
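Both approaches expect the same kind of input: a data frame with three numeric columns and one categorical column. The following is a minimal sketch of such a data frame using base R only. The name `bubble_data`, the fixed seed, and the value ranges are assumptions made for illustration, not part of the programs later in this article.

```r
# Illustrative input for a categorical bubble plot (names and values are assumptions)
set.seed(1)  # fixed seed so the random values are reproducible

n <- 12
bubble_data <- data.frame(
  weight     = rnorm(n, mean = 72, sd = 10),            # numeric, mapped to the X axis
  height     = rnorm(n, mean = 175, sd = 10),           # numeric, mapped to the Y axis
  population = round(runif(n, min = 500, max = 2500)),  # numeric, mapped to bubble size
  cities     = factor(rep(c("Delhi", "Mumbai", "Chennai"), each = 4))  # categorical, mapped to color
)

# Inspect the column types and the number of rows per category
str(bubble_data)
table(bubble_data$cities)
```

Storing the categorical column as a factor is optional here; a character column is also treated as discrete when mapped to color.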

Method 1: Using ggplot2

In this approach, we use the ggplot2 library, a comprehensive package for rendering many types of charts and graphs. We pass two additional arguments to the aes() function: size, whose value is a numeric variable, and color, which represents the categorical variable. The syntax and parameters required for the bubble plot using ggplot2 are given below.

Syntax: ggplot(name of data frame, aes(x = column name of first numeric column, y = column name of second numeric variable, size = column name of the variable to specify the size, color = column name of the categorical variable))

To specify the range of sizes of the bubbles, we use :

scale_size(range = the range of sizes of the bubbles, name = name to be displayed on top of the size legend)
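As a rough illustration of how the size scale behaves, the sketch below builds the same plot twice: once with `scale_size()` and an explicit `range`, and once with `scale_size_area()`, a related ggplot2 scale that maps a value of 0 to a size of 0. The data frame `demo`, the seed, and the chosen sizes are assumptions for the example only.

```r
library(ggplot2)

# Small illustrative data set (names and values are assumptions)
set.seed(1)
demo <- data.frame(
  weight     = rnorm(12, mean = 72, sd = 10),
  height     = rnorm(12, mean = 175, sd = 10),
  population = round(runif(12, min = 500, max = 2500)),
  cities     = rep(c("Delhi", "Mumbai", "Chennai"), each = 4)
)

# Shared base: position from weight/height, size from population, color from cities
base_plot <- ggplot(demo, aes(x = weight, y = height,
                              size = population, color = cities)) +
  geom_point(alpha = 0.7)

# scale_size(): the smallest population gets size 2, the largest gets size 12
p_range <- base_plot + scale_size(range = c(2, 12), name = "Population")

# scale_size_area(): a value of 0 would get size 0, the largest gets max_size
p_area <- base_plot + scale_size_area(max_size = 12, name = "Population")

print(p_range)
print(p_area)
```

With `scale_size()`, the smallest observed value is still drawn at the lower end of `range`, so bubble sizes are relative to the data. `scale_size_area()` may be preferable when bubble area should be proportional to the value itself.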

Approach

• Create data set
• Import module
• Plot data frame
• Display plot

Program :

```r
# creating data set columns
height <- abs(rnorm(25, 175, 10))
weight <- abs(rnorm(25, 72, 10))
population <- floor(abs(rnorm(25, 1500, 500)))
cities <- c(rep("Delhi", 7), rep("Mumbai", 6),
            rep("Chennai", 6), rep("Bengaluru", 6))

# creating the dataframe from the above columns
dataframe <- data.frame(height, weight,
                        population, cities)

# importing the ggplot2 library
library(ggplot2)

# calling the ggplot function
# the value of the first parameter is the
# name of the dataframe
# the size of the bubbles is proportional to
# the population
# each of the 4 cities is identified by a separate
# color
ggplot(dataframe, aes(x = weight, y = height,
                      size = population,
                      color = cities)) +
  # specifying the transparency of the bubbles
  # where a value closer to 1 is more opaque and
  # a value closer to 0 is more transparent
  geom_point(alpha = 0.7) +
  # setting the scale of sizes of the bubbles using the
  # range parameter where the smallest size is 0.1
  # and the largest one is 10
  # name of the size legend is Population
  scale_size(range = c(0.1, 10), name = "Population") +
  # specifying the title for the plot
  ggtitle("Height and Weight Data of 4 Cities") +
  # code to center the title, which is left-aligned
  # by default
  theme(plot.title = element_text(hjust = 0.5))
```

Output: Categorical Bubble Plot using ggplot

Method 2: Using Plotly

Plotly is an R package used to design and render interactive graphs. It is built on top of the open-source JavaScript library plotly.js and can be used to generate publication-ready graphs and charts. If plotly is not installed in your R environment, install it first with install.packages("plotly").

The syntax and parameters required for rendering categorical bubble plot using plotly are mentioned below:

Syntax :

We use the plot_ly() function to generate plots using plotly.

Parameters:

• the data set ( data frame here)
• x = ~column name of the variable to be displayed on the X-axis
• y = ~column name of the variable to be displayed on the Y-axis
• text = ~name of column to be displayed when the cursor hovers above the bubble
• color = ~column name based on which the color is assigned (the categorical variable)
• size = ~column name to determine the size of the bubble
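• sizes = c(minimum, maximum) to specify the range of bubble sizes
• marker = list() to specify marker options such as opacity and sizemode, where sizemode ("diameter" or "area") controls whether the size value scales the diameter or the area of the bubble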
• We can also specify the layout which comprises title, the grid structure and so on using the layout() function.
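The parameters above can be combined as in the following sketch, which also builds a multi-line hover label with `paste0()` and restricts the tooltip to that label via `hoverinfo = "text"`. The data frame `demo`, the seed, the axis titles, and the label wording are assumptions for illustration.

```r
library(plotly)

# Small illustrative data set (names and values are assumptions)
set.seed(1)
demo <- data.frame(
  weight     = rnorm(12, mean = 72, sd = 10),
  height     = rnorm(12, mean = 175, sd = 10),
  population = round(runif(12, min = 500, max = 2500)),
  cities     = rep(c("Delhi", "Mumbai", "Chennai"), each = 4)
)

fig <- plot_ly(
  demo,
  x = ~weight, y = ~height,
  type = "scatter", mode = "markers",   # draw markers only
  color = ~cities,                      # categorical variable
  size = ~population,                   # third numeric variable
  sizes = c(10, 50),                    # range of bubble sizes
  # hover label assembled from two columns; <br> starts a new line
  text = ~paste0("City: ", cities, "<br>Population: ", population),
  hoverinfo = "text",                   # show only the custom label on hover
  marker = list(opacity = 0.7, sizemode = "diameter")
)

# Title and axis labels are set through layout()
fig <- fig %>% layout(
  title = "Illustrative categorical bubble plot",
  xaxis = list(title = "Weight", showgrid = TRUE),
  yaxis = list(title = "Height", showgrid = TRUE)
)

fig
```

Setting `type` and `mode` explicitly avoids relying on plotly to infer the trace type from the data.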

Approach

• Create data frame
• Import module
• Create plot
• Display plot

Program :

```r
# create data set columns
height <- abs(rnorm(25, 175, 10))
weight <- abs(rnorm(25, 72, 10))
population <- floor(abs(rnorm(25, 1500, 500)))
cities <- c(rep("Delhi", 7), rep("Mumbai", 6),
            rep("Chennai", 6), rep("Bengaluru", 6))

# creating the dataframe
dataframe <- data.frame(height, weight,
                        population, cities)

# calling the plotly library
library(plotly)

# declaring the bubbleplot variable
# to render the plot
# the first parameter is the dataframe
# x and y hold the column names of the
# variables to be plotted on the axes
# type and mode request a scatter trace drawn with markers
# size holds the column name used to determine the
# size of the bubbles
# color holds the categorical cities variable
# sizes specifies the range of sizes from 10 to 50
# marker specifies the opacity and the sizemode
bubbleplot <- plot_ly(dataframe, x = ~weight, y = ~height,
                      type = "scatter", mode = "markers",
                      text = ~cities, size = ~population,
                      color = ~cities, sizes = c(10, 50),
                      marker =
                        list(opacity = 0.7,
                             sizemode = "diameter"))

# code to generate the layout
# specifying the title
# the grids are displayed
bubbleplot <- bubbleplot %>% layout(title = "Height and Weight Data of 4 Cities",
                                    xaxis = list(showgrid = TRUE),
                                    yaxis = list(showgrid = TRUE))

# calling the bubbleplot variable to render
# the plot
bubbleplot
```

Output : Categorical Bubble Plot generated using Plotly