Add line for average per group using ggplot2 package in R

  • Last Updated : 03 Dec, 2021

In this article, we will discuss how to add a horizontal line at the average of each group to a scatter plot in the R Programming Language.

In R, we first compute the mean of each group with the group_by() and summarise() functions of the dplyr package, which gives a small data frame of group means. We then pass that data frame to the geom_hline() function of the ggplot2 package to draw one horizontal line per group at its mean, colored by group.

To create the data frame of group means from the original data frame:

Syntax:

mean_df <- df %>% 
group_by( <categorical-variable> ) %>% 
summarise( mean_val = mean( <quantitative-variable> ) )

Arguments:

  • df: the data frame to be summarised.
  • <categorical-variable>: the variable used to divide the data into groups.
  • <quantitative-variable>: the variable whose mean is computed for each group.
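
As a small check of the syntax above, the following sketch applies it to a made-up data frame whose group means are easy to verify by hand. The names toy_df, grp, val and toy_means are illustrative only and do not appear in the article's examples; the sketch assumes the dplyr package is installed.

```r
# load dplyr for %>%, group_by() and summarise()
library(dplyr)

# hypothetical toy data: group "a" holds 1, 2, 3 and group "b" holds 4, 5, 6
toy_df <- data.frame(
  grp = c("a", "a", "a", "b", "b", "b"),
  val = c(1, 2, 3, 4, 5, 6)
)

# one row per group, with the group mean stored in mean_val
toy_means <- toy_df %>%
  group_by(grp) %>%
  summarise(mean_val = mean(val))

print(toy_means)
```

The result has one row per group: a mean of 2 for group a and 5 for group b.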

This expression returns a data frame (tibble) with two columns: <categorical-variable> and mean_val, which stores the mean for each category. We then pass this data frame to the geom_hline() function to add a horizontal line at the mean of each group, colored by the categorical variable.

Syntax:

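plot + geom_hline( data = mean_df, aes( yintercept = mean_val, col = <categorical-variable> ) )

Arguments:

  • data: the data frame that contains the group means (here mean_df). It must be passed by name, because the first positional argument of geom_hline() is the mapping, not the data.
  • yintercept: the column of mean_df that holds the mean of each group.
  • col: the categorical variable by which the lines are colored.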

Example 1:

In this example, we create a scatter plot colored by a categorical variable. We then add one horizontal line per category, colored by the same variable, at the mean of that category's y values.

R

# load the tidyverse (provides ggplot2 and dplyr)
library(tidyverse)

# create a data frame with three groups of 100 points each
df <- data.frame(
    group = factor(rep(c("category1", "category2", "category3"),
                       each = 100)),
    y = round(c(rnorm(100, mean = 65, sd = 5),
                rnorm(100, mean = 85, sd = 5),
                rnorm(100, mean = 105, sd = 5))),
    x = rnorm(300))

# compute the mean of y for each group
# (named mean_df so that it does not mask the base function mean())
mean_df <- df %>%
    group_by(group) %>%
    summarise(mean_val = mean(y))

# create the scatter plot and overlay a horizontal line
# at each group mean using geom_hline()
ggplot(data = df, aes(x = x, y = y)) +
    geom_point(aes(colour = group)) +
    geom_hline(data = mean_df, aes(yintercept = mean_val, col = group))

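Output:

[Figure: scatter plot of y against x with points colored by group and one horizontal line per group at the group mean.]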
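
One way to confirm that the lines really sit at the group means is to inspect the data ggplot2 computes for the geom_hline() layer. The sketch below does this with layer_data() from ggplot2 on a smaller, made-up data set; the names check_df, check_means, p and hline_data, as well as the seed value, are assumptions chosen for illustration.

```r
library(ggplot2)
library(dplyr)

set.seed(1)  # arbitrary seed, used only to make this sketch reproducible

# hypothetical data set with two groups of 50 points each
check_df <- data.frame(
  group = factor(rep(c("category1", "category2"), each = 50)),
  y = c(rnorm(50, mean = 65, sd = 5), rnorm(50, mean = 85, sd = 5)),
  x = rnorm(100)
)

# mean of y for each group
check_means <- check_df %>%
  group_by(group) %>%
  summarise(mean_val = mean(y))

# scatter plot (layer 1) plus mean lines (layer 2)
p <- ggplot(check_df, aes(x = x, y = y)) +
  geom_point(aes(colour = group)) +
  geom_hline(data = check_means, aes(yintercept = mean_val, colour = group))

# data computed for the geom_hline() layer; yintercept holds the line positions
hline_data <- layer_data(p, 2)

# compare the line positions with the group means
print(all.equal(sort(hline_data$yintercept), sort(check_means$mean_val)))
```

If the comparison prints TRUE, each line is drawn exactly at the mean of its group.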

Example 2:

In this example, we again create a scatter plot colored by a categorical variable and add one horizontal line per category at the mean of that category's y values. We also add facet_grid() to split the plot into one panel per category, which makes the groups easier to compare when their values overlap.

R

# load the tidyverse (provides ggplot2 and dplyr)
library(tidyverse)

# create a data frame with three groups of 100 points each
df <- data.frame(
    group = factor(rep(c("category1", "category2", "category3"),
                       each = 100)),
    y = round(c(rnorm(100, mean = 65, sd = 5),
                rnorm(100, mean = 55, sd = 5),
                rnorm(100, mean = 60, sd = 5))),
    x = rnorm(300))

# compute the mean of y for each group
# (named mean_df so that it does not mask the base function mean())
mean_df <- df %>%
    group_by(group) %>%
    summarise(mean_val = mean(y))

# create the scatter plot, overlay a horizontal line at each
# group mean using geom_hline(), and split the plot into
# one panel per group using facet_grid()
ggplot(data = df, aes(x = x, y = y)) +
    geom_point(aes(colour = group)) +
    geom_hline(data = mean_df, aes(yintercept = mean_val, col = group)) +
    facet_grid(~group)

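Output:

[Figure: scatter plot of y against x split into one facet per group, with points colored by group and each facet showing a horizontal line at its own group mean.]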
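
In the faceted plot, each panel shows only its own mean line because the data frame of means contains the faceting column (group). The following sketch checks this on a smaller, made-up data set by counting the lines per panel with layer_data(); the names facet_df, facet_means, p_facet, hline_facets and lines_per_panel, and the seed value, are illustrative assumptions.

```r
library(ggplot2)
library(dplyr)

set.seed(2)  # arbitrary seed, used only to make this sketch reproducible

# hypothetical data set with three groups of 30 points each
facet_df <- data.frame(
  group = factor(rep(c("category1", "category2", "category3"), each = 30)),
  y = c(rnorm(30, mean = 65, sd = 5),
        rnorm(30, mean = 55, sd = 5),
        rnorm(30, mean = 60, sd = 5)),
  x = rnorm(90)
)

# mean of y for each group; keeps the group column needed for faceting
facet_means <- facet_df %>%
  group_by(group) %>%
  summarise(mean_val = mean(y))

p_facet <- ggplot(facet_df, aes(x = x, y = y)) +
  geom_point(aes(colour = group)) +
  geom_hline(data = facet_means, aes(yintercept = mean_val, colour = group)) +
  facet_grid(~group)

# data for the geom_hline() layer; PANEL identifies the facet of each line
hline_facets <- layer_data(p_facet, 2)

# number of mean lines drawn in each panel
lines_per_panel <- table(hline_facets$PANEL)
print(lines_per_panel)
```

Each of the three panels should report exactly one line. If the group column were dropped from the data frame of means, ggplot2 would have no way to assign a line to a panel and would repeat every line in every panel.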