How to Save Time with Data Visualization using Stack in R with ggplot2

Last Updated : 21 Aug, 2023

ggplot2 is a widely used R package for producing clear and attractive data visualisations. Here are some pointers for speeding up data visualisation using the “stack” feature of ggplot2:

Select the pertinent information: Make sure the data you plan to use in your visualisation is appropriate. To compare two or more groups with a stacked chart, the data should be in long format, with one column for the category on the x-axis, one for the group that fills each segment, and one for the value.

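Use the “stack” position: In ggplot2, the “position_stack()” function stacks bars that share the same x value on top of one another, so the height of each bar is the sum of its segments’ y values. It is the default position for geom_bar() and geom_col(), so mapping a variable to fill is usually enough to get stacked bars.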

Adapt your visualisation: Using ggplot2’s additional functions, you can adapt your visualisation once your data has been stacked. For instance, you can alter the bars’ colour, add labels, and adjust the axis labels.

Save time with templates: To save time, you can create templates for your visualisations, for example a function or a saved theme that holds the shared styling. This way, you can quickly create new visualisations using the same style and format.
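
As a rough sketch of the template tip, a small helper function can hold the shared styling so that each new chart needs only a data frame and three column names. The function name stacked_bar_template() and the toy sales data frame are illustrative assumptions, not part of ggplot2. position_stack() is written out explicitly here, although it is already the default position for geom_col().

```r
library(ggplot2)

# Hypothetical helper (not part of ggplot2): a reusable stacked bar template.
# `data` is a long-format data frame; x, y and fill are column names as strings.
stacked_bar_template <- function(data, x, y, fill, title = NULL) {
  ggplot(data, aes(x = .data[[x]], y = .data[[y]], fill = .data[[fill]])) +
    # position_stack() is the default for geom_col(); shown here for clarity
    geom_col(position = position_stack()) +
    labs(title = title, x = x, y = y, fill = fill) +
    theme_minimal() +
    theme(legend.position = "bottom")
}

# Toy data, made up for illustration only
sales <- data.frame(
  region  = c("North", "North", "South", "South"),
  product = c("A", "B", "A", "B"),
  units   = c(10, 5, 7, 12)
)

p <- stacked_bar_template(sales, "region", "units", "product",
                          title = "Units by region and product")
print(p)
```

Any other long-format data frame can be passed to the same helper, which keeps the style consistent across charts.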

Here is an example code snippet to create a stacked bar chart with ggplot2:

R

library(ggplot2)
library(dplyr)
library(tidyr)
 
# load the mtcars dataset
data(mtcars)
 
# create a new variable with the average of mpg and wt columns
mtcars$avg_mpg_wt <- rowMeans(mtcars[c("mpg", "wt")])
 
# stack the data, total each variable per cylinder count,
# then convert the totals to proportions within each cylinder count
mtcars_stack <- mtcars %>%
  pivot_longer(cols = c("mpg", "wt", "avg_mpg_wt"), names_to = "variable", values_to = "value") %>%
  group_by(cyl, variable) %>%
  summarise(total = sum(value), .groups = "drop") %>%
  group_by(cyl) %>%
  mutate(proportion = total / sum(total)) %>%
  ungroup()
 
# create a stacked bar chart (cyl is converted to a factor so the x-axis is discrete)
ggplot(mtcars_stack, aes(x = factor(cyl), y = proportion, fill = variable)) +
  # linewidth requires ggplot2 >= 3.4.0; older versions use size instead
  geom_bar(stat = "identity", color = "black", linewidth = 0.25) +
  scale_fill_manual(values = c("#E69F00", "#56B4E9", "#009E73")) +
  labs(title = "Stacked Bar Chart: Average mpg and wt by Cylinder Count",
       x = "Cylinder Count",
       y = "Proportion",
       fill = "Variable") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", size = 16, hjust = 0.5),
        axis.title = element_text(face = "bold", size = 12),
        legend.position = "bottom",
        legend.title = element_text(face = "bold", size = 12),
        legend.text = element_text(size = 10)) +
  guides(fill = guide_legend(nrow = 1))

The code first loads mtcars, a built-in R dataset that provides details on various automobile models. It then creates avg_mpg_wt, a new variable that holds the average of the mpg and wt columns for each row in the dataset.
The pivot_longer() function from the tidyr package is used to stack the data. The cols argument defines which columns should be stacked, and the names_to and values_to arguments give the names of the new columns that are formed during the stacking operation. The stacked columns are replaced by two new ones, variable and value, while the remaining columns, including cyl, are kept.
The data is then grouped by cyl and variable using the group_by() function from the dplyr package, and summarise() adds up the values of each group into total. A second group_by() on cyl followed by mutate() computes proportion by dividing each total by the sum of the totals for that cylinder count, so the segments of every bar add up to 1. The grouping is finally removed using ungroup().
Finally, the code uses the ggplot() function from the ggplot2 package to generate a stacked bar chart. The variables for the x and y axes as well as the fill colour are specified using the aes() function, with cyl converted to a factor so that each cylinder count gets its own bar. The bars are produced by the geom_bar() function with stat = "identity", which stacks the segments by default. The labs() function is used to add a title and axis labels to the chart, the scale_fill_manual() function is used to define the colours for the various variables, and theme() and guides() adjust the text styling and the legend layout.
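
When only the proportions matter, the manual proportion step can possibly be skipped. The sketch below is an alternative, not the method used above: it keeps the per-group totals and lets ggplot2's position_fill() rescale each bar to a height of 1. The object names totals and p_fill are assumptions chosen for this illustration.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Totals per cylinder count and variable; no proportion column is computed
totals <- mtcars %>%
  mutate(avg_mpg_wt = (mpg + wt) / 2) %>%
  pivot_longer(cols = c(mpg, wt, avg_mpg_wt),
               names_to = "variable", values_to = "value") %>%
  group_by(cyl, variable) %>%
  summarise(total = sum(value), .groups = "drop")

# position_fill() stacks the segments and normalises each bar to height 1
p_fill <- ggplot(totals, aes(x = factor(cyl), y = total, fill = variable)) +
  geom_col(position = position_fill()) +
  labs(x = "Cylinder Count", y = "Proportion", fill = "Variable") +
  theme_minimal()
print(p_fill)
```

The trade-off is that the proportions exist only inside the plot, so computing them explicitly is still the better choice when they are needed for labels or tables.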
Output:

Example 2:

R

library(ggplot2)
library(dplyr)
library(tidyr)
 
# Load iris dataset
data(iris)
 
# Calculate the mean values of each variable for each species
iris_means <- iris %>%
  group_by(Species) %>%
  summarise_if(is.numeric, mean) %>%
  pivot_longer(cols = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width), 
               names_to = "variable", values_to = "value")
 
# Convert the means to proportions within each species
iris_stack <- iris_means %>%
  group_by(Species) %>%
  mutate(total = sum(value),
         proportion = value / total) %>%
  ungroup()
 
# Create stacked bar chart
ggplot(iris_stack, aes(x = Species, y = proportion, fill = variable)) +
  geom_bar(stat = "identity") +
  labs(title = "Average Measurements of Iris Flowers by Species",
       x = "Species",
       y = "Proportion",
       fill = "Variable") +
  theme_minimal() +
  theme(legend.position = "bottom") +
  scale_fill_manual(values = c("#619CFF", "#F8766D", "#00BA38", "#00BFC4"))

The iris dataset, which is a part of the standard R installation, is loaded by data(iris).
The mean of each numeric column is computed per species, and the tidyr package’s pivot_longer() function is then used to stack the result. The cols input specifies the columns that this function pivots to a long format. The names_to and values_to parameters specify the columns that will be created.
After grouping the stacked data by Species, mutate() computes total, the sum of the four mean measurements for that species, and proportion, each mean divided by that total. Grouping by both Species and variable would leave one row per group and make every proportion equal to 1, so the grouping is by Species only.
The resulting stacked data is then passed to ggplot() to make a stacked bar chart.
Species is mapped to the x-axis, proportion to the y-axis, and variable to the fill using the aes() function. The stacked bars are made using geom_bar(stat = "identity"), and each variable’s colour is specified using scale_fill_manual().
The plot’s title, x-axis label, y-axis label, and fill legend title are all added using labs().
For the iris dataset, the code generates a stacked bar chart that displays the proportion contributed by each variable for each species.
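
The earlier tip about adding labels can be sketched for the iris data as follows. This is a minimal example under a few assumptions: the names iris_share and p_labels are made up for illustration, and the label format (whole percentages) is one choice among many. geom_text() with position_stack(vjust = 0.5) places each label in the middle of its segment.

```r
library(ggplot2)
library(dplyr)
library(tidyr)

# Mean measurement per species, reshaped to long format,
# then converted to a share of each species' total
iris_share <- iris %>%
  group_by(Species) %>%
  summarise(across(where(is.numeric), mean)) %>%
  pivot_longer(cols = -Species, names_to = "variable", values_to = "value") %>%
  group_by(Species) %>%
  mutate(proportion = value / sum(value)) %>%
  ungroup()

p_labels <- ggplot(iris_share, aes(x = Species, y = proportion, fill = variable)) +
  geom_col() +
  # vjust = 0.5 centres each label vertically inside its stacked segment
  geom_text(aes(label = sprintf("%.0f%%", 100 * proportion)),
            position = position_stack(vjust = 0.5), size = 3) +
  labs(x = "Species", y = "Proportion", fill = "Variable") +
  theme_minimal()
print(p_labels)
```

Because the text layer uses the same stacking as the bars, the labels stay aligned with their segments even if the data or the fill order changes.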

Output: