Joining Points on Scatter plot using Smooth Lines in R


A smooth line (also called a smoothed line) is drawn through a set of data points so that it represents the overall trend of the data while minimizing the effect of random fluctuations, or noise. In other words, it shows the general pattern in a dataset while reducing the influence of individual points that deviate from that pattern.

There are several methods that can be used to create a smooth line, such as linear regression, loess, and splines. Each method has its own pros and cons, and the choice of method will depend on the specific characteristics of the data and the goals of the analysis.
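
As a rough illustration of how these methods differ, the sketch below fits a straight line, a loess curve, and a smoothing spline to the same simulated data using only base R. The sine-shaped data, the noise level, and the span of 0.3 are assumptions chosen for the example, not recommendations.

```r
# Simulated noisy data (the sine trend and noise level are illustrative assumptions)
set.seed(42)
x <- seq(0, 10, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.3)

# Three ways to smooth the same points
fit_lm     <- lm(y ~ x)                 # linear regression: a straight line
fit_loess  <- loess(y ~ x, span = 0.3)  # loess: local regression
fit_spline <- smooth.spline(x, y)       # smoothing spline, smoothness chosen automatically

# Smoothed values at the observed x positions
smoothed <- data.frame(
  x      = x,
  lm     = predict(fit_lm),
  loess  = predict(fit_loess),
  spline = predict(fit_spline, x)$y
)

# Draw the points and overlay the three fits
plot(x, y, pch = 16, col = "grey60", main = "Three smoothing methods")
lines(smoothed$x, smoothed$lm,     col = "blue",      lwd = 2)
lines(smoothed$x, smoothed$loess,  col = "red",       lwd = 2)
lines(smoothed$x, smoothed$spline, col = "darkgreen", lwd = 2)
legend("topright", legend = c("lm", "loess", "spline"),
       col = c("blue", "red", "darkgreen"), lwd = 2)
```

On curved data like this, the straight line misses the oscillation that the two flexible methods follow, which is the kind of trade-off to weigh when picking a method.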

Plotting a smooth line on a scatter plot helps us identify the underlying pattern in the data and, with care, make predictions based on that pattern. It also makes points that lie far from the trend easier to spot as possible outliers and gives a general idea of how widely the data are spread around the trend.

It is also useful for exploring relationships between two or more variables, especially when the data points are dense or overlapping.

How to install the ggplot2 library?

You can install the ggplot2 library by running the following command in the R console:

install.packages("ggplot2")

Then, you can load the library by running the following command:

library(ggplot2)

Make sure that you are connected to the internet while installing the package. Once the package is installed and loaded successfully, you can run the code in this article without hitting a “there is no package called ‘ggplot2’” error.
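
If you run the same script more than once, it can be convenient to install the package only when it is missing. The following is a minimal sketch of that pattern; the CRAN mirror URL is an assumption, and any mirror you normally use would work as well.

```r
# Install ggplot2 only if it is not already available
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  # The repos value is an assumed CRAN mirror; replace it with your preferred one
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}

# Load the package and report which version is in use
library(ggplot2)
print(packageVersion("ggplot2"))
```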

The geom_smooth() function is used to plot a smooth line with ggplot2 in the R programming language. It is a geom, a kind of plotting layer in ggplot2, and is added to a plot with the + operator.

Syntax

geom_smooth(mapping = NULL, data = NULL, stat = "smooth", position = "identity",
           ..., method = "auto", formula = y ~ x, se = TRUE, n = 80,
           fullrange = FALSE, level = 0.95, span = NULL, method.args = list(),
           show.legend = NA, inherit.aes = TRUE)

Parameters

  • mapping: Aesthetic mapping, usually constructed with aes().
  • data: Data frame containing the data to be plotted.
  • stat: The statistical transformation to use on the data for this layer.
  • position: The position adjustment to use for overlapping points on this layer.
  • … : Other arguments passed on to the layer, typically fixed aesthetics such as color = “red” or linetype = “dashed”.
  • method: The smoothing method to use. The default is “auto”, which uses “loess” for datasets with fewer than 1,000 observations and “gam” for larger ones. Recent ggplot2 releases show this default as method = NULL, with the same behavior.
  • formula: A formula that specifies the relationship between x and y, for example y ~ x or y ~ poly(x, 2).
  • se: Whether to draw the confidence band around the smooth line.
  • n: Number of points at which the smoother is evaluated.
  • fullrange: If TRUE, the smooth is computed over the full range of the plot rather than only the range of the data.
  • level: Confidence level of the band drawn around the line (0.95 by default).
  • span: The amount of smoothing for the loess smoother, as the fraction of points used in each local fit. Smaller values give wigglier lines; larger values give smoother lines.
  • method.args: A list of additional arguments passed to the fitting function named in method.
  • show.legend: Whether to show a legend for this layer.
  • inherit.aes: If TRUE, the aesthetic mappings of the layer are inherited from the plot defaults.
     

You can set these arguments to customize the appearance and behavior of the smooth line. Most importantly, the method, se and span arguments control the smoothing method, whether the confidence band is drawn, and how much the loess smoother smooths.
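
To make the effect of span and se concrete, here is a small sketch that draws two loess curves over the same simulated points. The data, the two span values, and the colors are assumptions made for illustration; layer_data() is used only to inspect what each smooth layer computed.

```r
library(ggplot2)

# Simulated noisy sine data (illustrative assumption)
set.seed(1)
df <- data.frame(x = seq(0, 10, length.out = 200))
df$y <- sin(df$x) + rnorm(200, sd = 0.3)

p <- ggplot(df, aes(x, y)) +
  geom_point(colour = "grey60") +
  # Small span: each local fit uses 20% of the points, so the curve follows local wiggles
  geom_smooth(method = "loess", formula = y ~ x, span = 0.2,
              se = FALSE, colour = "red") +
  # Large span: each local fit uses 90% of the points, giving a flatter curve;
  # se = TRUE adds a 95% confidence band
  geom_smooth(method = "loess", formula = y ~ x, span = 0.9,
              se = TRUE, level = 0.95, colour = "blue")
print(p)

# Inspect the values computed for each smooth layer (layers 2 and 3 of the plot)
wiggly <- layer_data(p, 2)
flat   <- layer_data(p, 3)
nrow(wiggly)  # number of evaluation points, controlled by the n argument
```

Each smooth is evaluated on a grid of n points (80 by default) rather than at every observation, which is why the computed data has fewer rows than the input.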

Creating a Simple Smooth Line 

library(ggplot2)

# Create some example data
x <- 1:100
y <- sin(x)
df <- data.frame(x, y)

# Create the plot
ggplot(df, aes(x, y)) +
  # Add points to the plot
  geom_point() +
  # Add a loess smooth line without the confidence band
  # (linewidth needs ggplot2 >= 3.4.0; use size on older versions)
  geom_smooth(method = "loess", se = FALSE,
              linewidth = 1.2, color = "red",
              linetype = "dashed") +
  ggtitle("Smooth Line Plot") +
  xlab("X-axis") +
  ylab("Y-axis")


Output:

 

This code plots sin(x) at the integers 1 to 100 and adds a smooth line fitted with the “loess” method, drawn as a dashed red line of width 1.2 without the confidence band. It also adds a title and axis labels to the plot.

You can also use your own data in place of the example data, and you can adjust the line type, color, and other properties to customize the plot as per your requirements.

Different Methods of plotting

The geom_smooth() function in ggplot2 provides several methods for plotting a smooth line through a set of data points. These methods include:

  • “loess”: locally weighted regression. A non-parametric method that fits a low-degree polynomial to the points near each x value, giving nearby points more weight. It is useful when the relationship is non-linear and the dataset is not too large.
  • “lm”: linear regression. It fits a linear model to the data and is useful when the relationship between x and y is roughly linear.
  • “glm”: generalized linear model. It extends linear regression to responses with non-normal distributions (for example, binary or count data) by relating the mean response to the predictors through a link function.
  • “gam”: generalized additive model, fitted with the mgcv package. It is a flexible framework for fitting non-linear relationships between predictor and response variables.
  • “rlm”: robust linear model, from the MASS package (load it with library(MASS)). It is an extension of the linear model that is resistant to outliers.
  • “auto”: automatically selects “loess” for datasets with fewer than 1,000 observations and “gam” for larger ones.
  • “rq”: quantile regression, from the quantreg package (load it with library(quantreg)). It estimates quantiles of the conditional distribution of the response, such as the median, rather than its mean. ggplot2 also provides geom_quantile() as a dedicated layer for this.

You can specify the method by passing the method argument to geom_smooth(). For example, to use the “loess” method:

library(ggplot2)

# Create some example data
x <- 1:100
y <- sin(x)
df <- data.frame(x, y)

# Create the plot
ggplot(df, aes(x, y)) +
  # Add points to the plot
  geom_point() +
  # Smooth with local regression
  # (linewidth needs ggplot2 >= 3.4.0; use size on older versions)
  geom_smooth(method = "loess", se = FALSE,
              linewidth = 1.2, color = "red",
              linetype = "dashed") +
  ggtitle("Smooth Line Plot") +
  xlab("X-axis") +
  ylab("Y-axis")


and to use the “lm” method:

library(ggplot2)

# Create some example data
x <- 1:100
y <- sin(x)
df <- data.frame(x, y)

# Create the plot
ggplot(df, aes(x, y)) +
  # Add points to the plot
  geom_point() +
  # Smooth with a straight-line (linear regression) fit
  # (linewidth needs ggplot2 >= 3.4.0; use size on older versions)
  geom_smooth(method = "lm", se = FALSE,
              linewidth = 1.2, color = "red",
              linetype = "dashed") +
  ggtitle("Smooth Line Plot") +
  xlab("X-axis") +
  ylab("Y-axis")


Output Differences in both the methods:

Comparison between the lines drawn by using “loess” and “lm” method

You should choose the method that best represents the underlying pattern of your data and that is consistent with the goals of your analysis. Similarly, you can try all the available methods mentioned above. 
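
As one example of trying another method, the sketch below uses “glm” with a binomial family to draw a logistic curve through a binary response. The simulated dose and response variables and the coefficients used to generate them are hypothetical; the point is only to show how method.args passes extra arguments to the fitting function.

```r
library(ggplot2)

# Simulated binary outcome whose probability rises with dose (hypothetical data)
set.seed(123)
n <- 200
dose <- runif(n, 0, 10)
response <- rbinom(n, size = 1, prob = plogis(-3 + 0.6 * dose))
df <- data.frame(dose, response)

p <- ggplot(df, aes(dose, response)) +
  geom_point(alpha = 0.4) +
  # family = binomial is forwarded to glm() through method.args
  geom_smooth(method = "glm", formula = y ~ x,
              method.args = list(family = binomial),
              se = TRUE, colour = "darkgreen") +
  ggtitle("Logistic Smooth Line") +
  xlab("Dose") +
  ylab("Probability of response")
print(p)

# Values computed for the smooth layer (layer 2 of the plot)
fit <- layer_data(p, 2)
range(fit$y)
```

Because the fit is a logistic regression, the curve stays between 0 and 1, which a plain “lm” line through the same points would not guarantee.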

Different types of Line available

The geom_smooth() function in ggplot2 allows you to change the line type of the smooth line with the linetype argument. The possible values for the linetype argument include:

  • “solid” (default): a solid line.
  • “dashed”: a line composed of dashes.
  • “dotted”: a line composed of dots.
  • “dotdash”: a line composed of alternating dots and dashes.
  • “longdash”: a line composed of long dashes.
  • “twodash”: a line composed of alternating short and long dashes.

To change the line style, change the value of the linetype argument in the code above. The earlier examples used “dashed”; for example, to use a “solid” linetype instead:

library(ggplot2)

# Create some example data
x <- 1:100
y <- sin(x)
df <- data.frame(x, y)

# Create the plot
ggplot(df, aes(x, y)) +
  # Add points to the plot
  geom_point() +
  # Same loess smooth as before, drawn as a solid line
  # (linewidth needs ggplot2 >= 3.4.0; use size on older versions)
  geom_smooth(method = "loess", se = FALSE,
              linewidth = 1.2, color = "red",
              linetype = "solid") +
  ggtitle("Smooth Line Plot") +
  xlab("X-axis") +
  ylab("Y-axis")


Output Differences in both Line Types:

Comparison between the lines drawn by using “dashed” and “solid” linetype
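
To see all six named line types side by side, a small reference chart can help. This is a sketch only: the demo data frame and its name and pos columns are made up for the illustration, and each line type is drawn as a horizontal segment rather than as a smooth line.

```r
library(ggplot2)

# The six named line types listed above
linetypes <- c("solid", "dashed", "dotted", "dotdash", "longdash", "twodash")

# One row per line type; pos is just the vertical position of each segment
demo <- data.frame(
  name = linetypes,
  pos  = seq_along(linetypes),
  stringsAsFactors = FALSE
)

p <- ggplot(demo) +
  # Draw one horizontal segment per line type
  geom_segment(aes(x = 0, xend = 1, y = pos, yend = pos, linetype = name)) +
  # Use the values in name directly as line types instead of mapping them to a scale
  scale_linetype_identity() +
  # Label each segment with the name of its line type
  scale_y_continuous(breaks = demo$pos, labels = demo$name) +
  labs(x = NULL, y = NULL, title = "Named line types") +
  theme_minimal()
print(p)
```

Any of these names can be passed as the linetype argument of geom_smooth() in the examples in this article.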

Example 1:

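library(ggplot2)

# Create some example data
set.seed(42)  # make the random data reproducible
x <- rnorm(100)
y <- x + rnorm(100)
df <- data.frame(x, y)

# Create the plot using geom_smooth
ggplot(df, aes(x, y)) +
  # Add points to the plot
  geom_point() +
  # Loess smooth line without the confidence band
  # (linewidth needs ggplot2 >= 3.4.0; use size on older versions)
  geom_smooth(method = "loess", se = FALSE,
              linewidth = 1.2, color = "red",
              linetype = "dashed") +
  ggtitle("Smooth Line Plot") +
  xlab("X-axis") +
  ylab("Y-axis")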


Output:

 

This code creates a scatter plot of the data and adds a smooth line fitted with the “loess” method, drawn as a dashed red line of width 1.2. It also adds a title and axis labels to the plot.

Example 2:

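library(ggplot2)

# Create some example data
set.seed(42)  # make the random data reproducible
x <- rnorm(100)
y <- x + rnorm(100)
df <- data.frame(x, y)

# Create the plot using geom_smooth
ggplot(df, aes(x, y)) +
  # Add points to the plot
  geom_point() +
  # GAM smooth line (fitted with the mgcv package) without the confidence band
  # (linewidth needs ggplot2 >= 3.4.0; use size on older versions)
  geom_smooth(method = "gam", se = FALSE,
              linewidth = 1.2, color = "purple",
              linetype = "dotted") +
  ggtitle("Smooth Line Plot") +
  xlab("X-axis") +
  ylab("Y-axis")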


Output:

 

This code creates a scatter plot of the data and adds a smooth line fitted with the “gam” method, drawn as a dotted purple line of width 1.2. It also adds a title and axis labels to the plot.



Last Updated : 05 Feb, 2023