
How to use qqplot() instead of qqPlot() in car package?

Last Updated : 11 Jul, 2023

In this article, we explain how to use the base R function `qqplot()` instead of the `qqPlot()` function from the `car` package to check the normality of a variable or a set of residuals. We also show how to customize the plot and how confidence envelopes can be added.

What is a QQ plot?

A QQ plot (quantile-quantile plot) is a graphical tool that compares the empirical quantiles of a variable (or residuals) with the theoretical quantiles of a reference distribution (usually normal). It helps us assess whether the variable (or residuals) follows that distribution by checking whether the points fall approximately on a straight line. Deviations from linearity, such as curvature or outliers, suggest that the variable (or residuals) does not follow the reference distribution.
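
To make the definition concrete, here is a minimal sketch that builds the two sets of quantiles by hand. The sample and the variable names are illustrative, and `ppoints()` is only one common choice of plotting positions.

```r
set.seed(123)
x <- rnorm(100, mean = 10, sd = 2)   # illustrative sample, not from the article

n <- length(x)
empirical   <- sort(x)               # empirical quantiles: the ordered sample
theoretical <- qnorm(ppoints(n))     # standard normal quantiles at matching probabilities

# For normal data the points should fall near a line with
# intercept = mean(x) and slope = sd(x)
plot(theoretical, empirical,
     xlab = "Theoretical quantiles", ylab = "Empirical quantiles")
abline(mean(x), sd(x))
```

The straight line here uses the sample mean and standard deviation, which is one simple choice; other reference lines, such as a line through the quartiles, are also common.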

How to use qqplot()?

The base R function `qqplot(x, y)` takes two numeric vectors and plots their sorted values against each other (if the lengths differ, the longer vector is interpolated to match the shorter one). This lets us visually compare whether `x` and `y` come from similar distributions. For example, we can compare two random samples from different distributions:

R




set.seed(123)
x <- rnorm(100)
y <- rt(100, df = 5)
qqplot(x, y)
abline(0, 1)


Output:

qqplot in R


We can see that the points deviate from the reference line, especially at the tails, which suggests that `x` and `y` have different distributions. However, if we want to check whether a single variable (or a set of residuals) follows a normal distribution, we need to compare it with a vector of quantiles from a normal distribution. This is what `qqPlot()` from the `car` package does automatically. For example:

R




library(car)
x <- rnorm(100)
qqPlot(x)


Output:

qqPlot using car package in R


We can see that the points are close to the reference line and within the shaded area, which is consistent with `x` following a normal distribution. To get a similar plot with `qqplot()`, we need to generate a vector of quantiles from a normal distribution ourselves using `qnorm()` and `ppoints()`. For example:

R




# sample from normal distribution
x <- rnorm(100)
 
# vector of quantiles from standard normal distribution
z <- qnorm(ppoints(x))
qqplot(z, x)
 
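# add reference line y = x (appropriate here because x is standard normal;
# for data on another scale, qqline(x) draws a line through the quartiles)
abline(0, 1)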


Output:

qqplot against normal quantiles in R

 

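We can see that we get essentially the same points as with `qqPlot()`. The remaining differences are in labels and aesthetics, and `qqPlot()` additionally draws a confidence envelope and labels the most extreme points by default.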
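
The reference line `abline(0, 1)` fits the example above only because `x` was drawn from a standard normal distribution. As a hedged aside, base R also provides `qqnorm()` and `qqline()`, which build the normal quantiles and a quartile-based reference line for data on any scale. The sketch below uses an illustrative sample with a different mean and standard deviation and checks that `qqnorm()` uses the same theoretical quantiles as the manual approach.

```r
set.seed(123)
x <- rnorm(100, mean = 50, sd = 5)   # illustrative data, not standard normal

# qqnorm() puts theoretical quantiles on the x-axis and the sample on the y-axis,
# and invisibly returns the plotted coordinates
qq <- qqnorm(x, main = "Normal QQ plot")
qqline(x, col = "red")               # line through the first and third quartiles

# Compare with the manual construction used with qqplot()
z <- qnorm(ppoints(length(x)))
same_quantiles <- isTRUE(all.equal(sort(qq$x), z))
```

With `qqplot(z, x)` the same `qqline(x)` call can be used, because the theoretical quantiles are also on the horizontal axis.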

How to customize qqplot()?

Because `qqplot()` is a base graphics function, we can customize the plot with the usual graphical arguments accepted by `plot()` and `par()`, such as colors (`col`), symbols (`pch`), sizes (`cex`), labels (`xlab`, `ylab`, `main`), axis limits (`xlim`, `ylim`) and so on. For example:

R




x <- rnorm(100)
z <- qnorm(ppoints(x))
 
# customize plot
qqplot(z, x,
       col = "blue",
       pch = 16,
       cex = 1.5,
       xlab = "Normal Quantiles",
       ylab = "Sample Quantiles",
       main = "QQ Plot",
       xlim = c(-3.5, 3.5),
       ylim = c(-3.5, 3.5))
abline(0, 1,
       col = "red",
       lwd = 2,
       lty = 2)


Output:

qqplot with plot customization in R


We can see that we have changed several aspects of our plot according to our preferences.

How to add confidence envelopes?

One way to enhance a QQ plot is to add confidence envelopes around the reference line, which indicate the expected variability of the empirical quantiles under the null hypothesis that the data or residuals come from the reference distribution. If some points fall outside the confidence envelopes, it suggests that they are unlikely to be generated by the reference distribution, and thus indicate departures from normality or other assumptions.

There are different methods to construct confidence envelopes for QQ plots, depending on how the variability of the empirical quantiles is estimated and how the confidence level is specified. One common method is based on pointwise standard errors of order statistics, which are derived from asymptotic theory and assume independent and identically distributed observations. Another method is based on simulation or bootstrap techniques, which can account for dependence and heteroscedasticity among observations.
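
As a rough illustration of the first method, the sketch below adds a pointwise envelope to a `qqplot()` using the large-sample standard error of an order statistic. It is a minimal sketch under the i.i.d. assumption stated above, not the exact code of any package. The variable names and the quartile-based reference line are choices made for this example.

```r
set.seed(123)
x <- rnorm(100)
n <- length(x)
level <- 0.95

p <- ppoints(n)
z <- qnorm(p)                        # theoretical normal quantiles
x_sorted <- sort(x)

# Reference line through the first and third quartiles (same idea as qqline())
q_x <- quantile(x, c(0.25, 0.75), names = FALSE)
q_z <- qnorm(c(0.25, 0.75))
slope <- diff(q_x) / diff(q_z)
intercept <- q_x[1] - slope * q_z[1]

# Asymptotic standard error of the order statistic at probability p
se <- (slope / dnorm(z)) * sqrt(p * (1 - p) / n)
crit <- qnorm(1 - (1 - level) / 2)

fit <- intercept + slope * z
lower <- fit - crit * se
upper <- fit + crit * se

qqplot(z, x, xlab = "Normal Quantiles", ylab = "Sample Quantiles")
abline(intercept, slope)
lines(z, lower, lty = 2)
lines(z, upper, lty = 2)

# Number of points falling outside the pointwise envelope
outside <- sum(x_sorted < lower | x_sorted > upper)
```

Because the envelope is pointwise, a few points outside it are not unusual even for normal data. The band is also wider in the tails, where order statistics vary more.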

In R, one function that can produce QQ plots with confidence envelopes is `qqPlot()` from the `car` package. It can plot the empirical quantiles of a variable, or the studentized residuals from a linear model, against the theoretical quantiles of a comparison distribution (such as normal or t), and it adds a pointwise confidence envelope by default. The `envelope` argument accepts a logical value (`TRUE` or `FALSE`), a numeric confidence level (such as 0.95), or a list that controls the level and appearance of the envelope (such as `level` and `style`). For linear model objects, the separate `simulate` and `reps` arguments control a simulation-based (parametric bootstrap) envelope. For example:
 

R




# Load car package
library(car)
 
# Generate some data
set.seed(123)
x <- rnorm(100)
 
# Plot QQ plot with 95% confidence envelope (default)
qqPlot(x)


Output:

qqPlot with the default 95% confidence envelope in R

 

R




# Plot QQ plot with 90% confidence envelope
qqPlot(x, envelope = 0.9)


Output:

qqPlot with a 90% confidence envelope in R

 

R




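# Plot QQ plot with simulation-based confidence envelope
# (simulate and reps are arguments of qqPlot() for lm objects, not elements of envelope)
fit <- lm(x ~ 1)  # intercept-only model, so the residuals are x minus its mean
qqPlot(fit, simulate = TRUE, reps = 1000)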


Output:

qqPlot with a simulation-based confidence envelope in R
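
A simulation-based envelope can also be sketched in base R alongside `qqplot()`. The example below is a minimal sketch, not a reproduction of what `car` does: it simulates normal samples with the estimated mean and standard deviation, sorts each one, and takes pointwise quantiles of the simulated order statistics. The number of replications and all variable names are arbitrary choices for illustration.

```r
set.seed(123)
x <- rnorm(100)
n <- length(x)
reps <- 1000                         # number of simulated samples (arbitrary)
level <- 0.95

z <- qnorm(ppoints(n))               # theoretical normal quantiles

# Each column is one sorted simulated sample, so row i holds
# simulated values of the i-th order statistic
sims <- replicate(reps, sort(rnorm(n, mean = mean(x), sd = sd(x))))

alpha <- 1 - level
lower <- apply(sims, 1, quantile, probs = alpha / 2)
upper <- apply(sims, 1, quantile, probs = 1 - alpha / 2)

qqplot(z, x, xlab = "Normal Quantiles", ylab = "Sample Quantiles")
lines(z, lower, lty = 2, col = "red")
lines(z, upper, lty = 2, col = "red")
```

Increasing `reps` gives smoother bands at the cost of more computation.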

 


