Grouped Bar Graphs and Facet_Wrap in R
Last Updated: 20 Apr, 2023
In this article, we will learn how to define data when using ggsignif with grouped bar graphs and facet_wrap() in the R programming language.
ggplot2 is a popular R package for data visualization. It can create a wide range of plots, including bar graphs, but adding statistical significance bars to those graphs can be tricky. This is where ggsignif, another R package, comes in handy: it adds significance brackets and labels to plots built with ggplot2.
Installing required packages
Run the following commands to install the dplyr, ggplot2, and ggsignif packages.
install.packages("dplyr")
install.packages("ggplot2")
install.packages("ggsignif")
Prepare the Data
We will start with some sample data. We use mtcars, a dataset built into R that describes 32 car models, including the number of cylinders (cyl), the number of forward gears (gear), horsepower (hp), and miles per gallon (mpg).
R
library(dplyr)
data(mtcars)
# Mean mpg for every combination of cylinder count and gear count
cyl_gear_data <- mtcars %>%
  group_by(cyl, gear) %>%
  summarize(mean_mpg = mean(mpg), .groups = "drop")
Explanation: In the code above, we first load the dplyr package, then group the mtcars dataset by number of cylinders and number of gears and compute the mean mpg for each group. The result, cyl_gear_data, has one row per cyl/gear combination.
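If you want to check the summary without dplyr, or see how many cars sit behind each bar, the base-R sketch below computes the same per-group means with aggregate() and the group sizes with table(). It is an alternative for inspection, not what the article's code uses; the object names cyl_gear_base and cyl_gear_n are illustrative.

```r
# Base-R equivalent of the dplyr summary: mean mpg for every cyl/gear pair
data(mtcars)
cyl_gear_base <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = mean)
names(cyl_gear_base)[names(cyl_gear_base) == "mpg"] <- "mean_mpg"

# Number of cars behind each mean; a 0 means that bar will be missing
cyl_gear_n <- table(cyl = mtcars$cyl, gear = mtcars$gear)

print(cyl_gear_base)
print(cyl_gear_n)
```

In mtcars there are no 8-cylinder cars with 4 gears, so the summary has 8 rows rather than 9, and some bars rest on only one or two cars. Each bar is therefore a single number, which matters once significance tests come into play.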
Create a Grouped Bar Graph
Next, we create a grouped bar graph with the ggplot2 package. The geom_bar() function draws the bars, and facet_wrap() creates a separate panel for each number of cylinders.
R
library(ggplot2)

ggplot(cyl_gear_data,
       aes(x = factor(gear), y = mean_mpg, fill = factor(cyl))) +
  geom_bar(stat = "identity", position = "dodge") +  # bar height = mean_mpg
  labs(x = "Gear", y = "Mean MPG") +
  ggtitle("Mean MPG by Gear and Number of Cylinders") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5)) +     # center the title
  facet_wrap(~cyl, ncol = 2)                          # one panel per cylinder count
Explanation: In the code above, we load ggplot2 and build the bar graph. aes() maps gear to the x-axis, mean_mpg to the y-axis, and cyl to the fill color. geom_bar() with stat = "identity" uses mean_mpg directly as the bar height, and position = "dodge" places bars that share an x position side by side. labs() and ggtitle() add the axis labels and the plot title, theme_bw() applies a black-and-white theme, and facet_wrap() draws one panel per number of cylinders. Because each panel holds a single cylinder group, the dodge has no visible effect here.
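Because fill and facet_wrap() both encode cyl in the plot above, each panel contains a single cylinder group and position = "dodge" has nothing to separate. If you want the bars genuinely grouped side by side, one option, sketched below under the assumption that you drop the facets, is to keep fill = factor(cyl) and let position_dodge() place the cylinder bars next to each other within each gear. The block rebuilds the summary with base aggregate() so it runs on its own; geom_col() is ggplot2's shorthand for geom_bar(stat = "identity"), and the name p_grouped is illustrative.

```r
library(ggplot2)

# Summary data: one row per cyl/gear pair (same values as cyl_gear_data)
cyl_gear <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = mean)
names(cyl_gear)[names(cyl_gear) == "mpg"] <- "mean_mpg"

# Grouped (dodged) bars without facets: cylinders side by side within each gear
p_grouped <- ggplot(cyl_gear,
                    aes(x = factor(gear), y = mean_mpg, fill = factor(cyl))) +
  # preserve = "single" keeps bar widths equal where a gear has fewer cylinder groups
  geom_col(position = position_dodge(preserve = "single")) +
  labs(x = "Gear", y = "Mean MPG", fill = "Cylinders") +
  theme_bw()

print(p_grouped)
```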
Add Significance Bars with ggsignif
Finally, we add significance bars to the grouped bar graph with the geom_signif() function from the ggsignif package.
Significance bars (brackets) indicate whether the difference between two groups within a panel is statistically significant. geom_signif() offers two ways to place them: pass comparisons, a list of pairs of x-axis levels, or give explicit positions with xmin, xmax, and y_position. The label on each bracket either comes from a test that geom_signif() runs on the data (a Wilcoxon test by default, changeable with the test argument) or is supplied directly through annotations. The appearance of the brackets and labels can be customized with arguments such as textsize, vjust, and tip_length. Note that cyl_gear_data holds a single mean per bar, so there are no observations left to test; in the examples below the stars are supplied by hand through annotations rather than computed.
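For geom_signif() to compute the p-value itself, it needs the raw observations for each x-axis level, not one pre-computed mean per bar. A minimal sketch, assuming a simple unfaceted box plot of the raw mtcars data, is shown below: comparisons names the two gear levels to compare, test = "t.test" replaces the default Wilcoxon test, and map_signif_level = TRUE turns the p-value into stars. The plot name p_test is illustrative.

```r
library(ggplot2)
library(ggsignif)

# Raw data: one row per car, so each gear level has many mpg values to test
p_test <- ggplot(mtcars, aes(x = factor(gear), y = mpg)) +
  geom_boxplot() +
  geom_signif(comparisons = list(c("3", "4")),  # gear 3 vs gear 4
              test = "t.test",                  # Welch t-test instead of the default Wilcoxon
              map_signif_level = TRUE) +        # show stars instead of the raw p-value
  labs(x = "Gear", y = "MPG") +
  theme_bw()

print(p_test)
```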
Example 1:
In this example, we add geom_signif() to the existing plot. The comparisons argument lists the pairs of x-axis levels to bracket within each cylinder panel: gear 4 versus gear 3, and gear 4 versus gear 5. The annotations argument supplies the labels in the same order: "*" for the first bracket and an empty string (no label) for the second. textsize and vjust set the label size and its vertical offset. mtcars has no 8-cylinder cars with 4 gears, so the gear-4 brackets may not appear in the 8-cylinder panel.
R
library(ggplot2)
library(ggsignif)
# fill is set inside geom_bar() so that geom_signif() sees one group per
# x-axis level (gear); it expects groups to correspond to x positions
ggplot(cyl_gear_data, aes(x = factor(gear), y = mean_mpg)) +
  geom_bar(aes(fill = factor(cyl)), stat = "identity", position = "dodge") +
  labs(x = "Gear", y = "Mean MPG") +
  ggtitle("Mean MPG by Gear and Number of Cylinders") +
  theme_bw() +
  facet_wrap(~cyl, ncol = 2) +
  # Labels are supplied by hand; no test is run on the summarized means
  geom_signif(comparisons = list(c("4", "3"), c("4", "5")),
              annotations = c("*", ""),
              textsize = 5, vjust = -0.5)
Example 2:
This example adds a third comparison, gear 3 versus gear 5, so that every pair of gear levels in a panel gets a bracket, labelled "***", "**", and "*" in the order of the comparisons. As in the first example, the stars are typed in by hand and do not come from a statistical test; textsize and vjust again set the size and vertical offset of the labels.
R
library(ggplot2)
library(ggsignif)
# fill inside geom_bar() keeps geom_signif() grouped by gear only
ggplot(cyl_gear_data, aes(x = factor(gear), y = mean_mpg)) +
  geom_bar(aes(fill = factor(cyl)), stat = "identity", position = "dodge") +
  labs(x = "Gear", y = "Mean MPG") +
  ggtitle("Mean MPG by Gear and Number of Cylinders") +
  theme_bw() +
  facet_wrap(~cyl, ncol = 2) +
  geom_signif(comparisons = list(c("4", "3"), c("4", "5"), c("3", "5")),
              annotations = c("***", "**", "*"),  # manual labels, one per comparison
              step_increase = 0.1,                # stack brackets so they do not overlap
              textsize = 5, vjust = -0.5)
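The stars in these examples are typed in by hand. If you have p-values from your own tests, a small helper such as the hypothetical p_to_stars() below converts them into labels for the annotations argument. The cut-offs (0.001, 0.01, 0.05) are a common convention, not a requirement; geom_signif() can also do a similar mapping itself with map_signif_level = TRUE when it runs the test.

```r
# Hypothetical helper: map p-values to significance stars
p_to_stars <- function(p) {
  stars <- cut(p,
               breaks = c(-Inf, 0.001, 0.01, 0.05, Inf),
               labels = c("***", "**", "*", "ns"),
               right = TRUE)   # intervals are (lower, upper]
  as.character(stars)
}

# Illustrative p-values, not computed from mtcars
example_labels <- p_to_stars(c(0.0004, 0.003, 0.02, 0.3))
print(example_labels)
```

Pass the result to geom_signif(annotations = ...) in the same order as the pairs in comparisons.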
Add Significance Levels and Stars to a Box Plot
Significance levels and stars can also be added to a box plot to show which differences between groups are statistically significant. In the example below, the ggpubr package adds them to a box plot of petal length by species from the iris dataset. Install ggpubr with the following command:
install.packages("ggpubr")
The stat_compare_means() function runs the chosen statistical test on the raw data (here a t-test, selected with the method argument) for each pair of groups listed in comparisons, and draws a bracket labelled with the result. With label = "p.format" the label is the formatted p-value; label = "p.signif" shows significance stars instead. The appearance of the labels and brackets is controlled by arguments such as size, vjust, and tip.length.
R
library(ggplot2)
library(ggpubr)

data(iris)

# Box plot of petal length for each species
p <- ggplot(iris, aes(x = Species, y = Petal.Length)) +
  geom_boxplot(fill = "lightblue", color = "black") +
  ylab("Petal Length")

# Pairwise t-tests, labelled with the formatted p-value
p + stat_compare_means(comparisons = list(c("versicolor", "setosa"),
                                          c("virginica", "versicolor")),
                       label = "p.format",
                       method = "t.test",
                       size = 8,
                       vjust = -1.5,
                       tip.length = 0.01)
Explanation: In the code above, we load the ggplot2 and ggpubr packages and the built-in iris dataset, then draw a box plot of Petal.Length for each Species. stat_compare_means() from ggpubr adds the significance brackets: the comparisons argument lists the pairs to test (versicolor vs setosa, and virginica vs versicolor), method = "t.test" selects the t-test, and label = "p.format" prints each p-value on its bracket. Finally, size, vjust, and tip.length set the label size, the label's vertical offset, and the length of the bracket tips.
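To see where the numbers on the box plot come from, you can run the same two comparisons directly with base R's t.test(). stat_compare_means() with method = "t.test" should be using t.test() with its default (Welch) settings, so the p-values printed below are expected to agree with the plot labels up to rounding. The helper name compare_species() is hypothetical.

```r
data(iris)

# Hypothetical helper: Welch t-test of Petal.Length between two species
compare_species <- function(a, b) {
  sub <- droplevels(subset(iris, Species %in% c(a, b)))  # keep exactly two levels
  t.test(Petal.Length ~ Species, data = sub)$p.value
}

p_vers_set <- compare_species("versicolor", "setosa")
p_virg_vers <- compare_species("virginica", "versicolor")

# format.pval() gives a compact, plot-style representation
print(format.pval(c(p_vers_set, p_virg_vers), digits = 2))
```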