
Generalized Additive Models Using R

A generalized additive model (GAM) is a flexible statistical modeling method that extends linear regression to capture non-linear relationships between predictors and a response variable. GAMs are especially helpful for data with non-linear patterns, such as time series and spatial data, or whenever the relationship between the predictors and the response is hard to describe with simple linear functions. This guide covers the basics of GAMs and shows how to fit them in the R programming language.

Generalized Additive Models (GAMs)

Traditional linear regression models assume a linear relationship between predictors and the response variable. However, many real-world phenomena exhibit non-linear, complex relationships. GAMs address this limitation by allowing for flexible modeling of these relationships through the use of smoothing functions. This makes GAMs a valuable tool for capturing patterns in data that linear models might miss.
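To make the contrast concrete, the sketch below fits a straight-line model and a GAM to simulated data with a known sine-shaped signal and compares them by AIC. The simulated data and the object names (sim, lin_fit, gam_fit) are illustrative assumptions and are not part of the mtcars example in this guide; the mgcv package must be installed.

```r
library(mgcv)

set.seed(42)  # make the simulation reproducible

# Simulated data (an assumption for illustration): a sine signal plus noise
n <- 200
sim <- data.frame(x = seq(0, 2 * pi, length.out = n))
sim$y <- sin(sim$x) + rnorm(n, sd = 0.3)

# Straight-line fit versus a GAM with a smooth term in x
lin_fit <- lm(y ~ x, data = sim)
gam_fit <- gam(y ~ s(x), data = sim)

# Lower AIC indicates a better trade-off between fit and complexity
AIC(lin_fit, gam_fit)

# Effective degrees of freedom of the smooth; values well above 1 signal curvature
summary(gam_fit)$edf
```

Because the true signal is curved, the smooth term should follow it more closely than the straight line, which is the kind of pattern a linear model might miss.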



A generalized additive model (GAM) is a generalized linear model in which the linear predictor depends linearly on unknown smooth functions of some predictor variables, and interest focuses on inference about these smooth functions.

Basic Components of a GAM

Understanding GAMs

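It has been known since the 1950s, via the Kolmogorov-Arnold representation theorem, that any multivariate continuous function can be represented as sums and compositions of univariate functions:

$$f(\vec{x}) = \sum_{q=0}^{2n} \Phi_q\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right)$$

However, such representations require highly complicated functions and are therefore not suitable as a modelling approach. GAMs drop the outer sum and require instead that the function belong to a simpler class:

$$f(\vec{x}) = \Phi\left(\sum_{p=1}^{n} \phi_p(x_p)\right)$$

where $\Phi$ is a smooth monotonic function. Writing $g$ for the inverse of $\Phi$, this is traditionally written as

$$g(f(\vec{x})) = \sum_{i} f_i(x_i)$$

When this function is approximating the expectation of some observed quantity $Y$, it can be written as

$$g(\operatorname{E}(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots + f_m(x_m)$$

This is the standard formulation of a GAM.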
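The role of the link function g and the additive smooth terms can be checked numerically. The following is a minimal sketch on simulated count data, assuming a Poisson response with a log link; the data, the true signal, and the object names (sim, pois_fit, smooth_terms) are hypothetical and chosen only for illustration.

```r
library(mgcv)

set.seed(1)

# Simulated count data (illustrative assumption): log E(Y) is additive in x1 and x2
n <- 300
sim <- data.frame(x1 = runif(n), x2 = runif(n))
sim$y <- rpois(n, lambda = exp(0.5 + sin(2 * pi * sim$x1) + sim$x2^2))

# Poisson GAM with log link: g(E(Y)) = beta_0 + f_1(x1) + f_2(x2)
pois_fit <- gam(y ~ s(x1) + s(x2), family = poisson(link = "log"), data = sim)

# Additive pieces: the intercept beta_0 and one column per smooth f_j(x_j)
beta0 <- unname(coef(pois_fit)["(Intercept)"])
smooth_terms <- predict(pois_fit, type = "terms")

# The linear predictor (link scale) is the intercept plus the sum of the smooths
eta <- as.numeric(predict(pois_fit, type = "link"))
all.equal(eta, as.numeric(beta0 + rowSums(smooth_terms)))

# Applying the inverse link (exp for the log link) gives the fitted mean E(Y)
mu <- as.numeric(predict(pois_fit, type = "response"))
all.equal(mu, exp(eta))
```

Here type = "terms" returns each estimated smooth separately, so the two comparisons mirror the two sides of the GAM equation: the sum of smooths on the link scale, and the expectation after inverting the link.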

Generalized Additive Model on mtcars dataset

Pre-Requisites

To work with GAMs in R, install and load the mgcv package, a widely used package for fitting GAMs, along with ggplot2, which is used for data visualisation. You can install both with the following commands:

install.packages('mgcv')
install.packages('ggplot2')

Loading Packages

# Load required packages
library(mgcv)
library(ggplot2)

                    

Load the dataset

# Load the mtcars dataset
data(mtcars)
head(mtcars)

                    

Output:

                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Building Model

# Fit a GAM to model the relationship between mpg and hp
gam_model <- gam(mpg ~ s(hp), data = mtcars)
 
# Summary of the GAM
summary(gam_model)

                    

Output:

Family: gaussian
Link function: identity

Formula:
mpg ~ s(hp)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  20.0906     0.5487   36.62   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
        edf Ref.df     F  p-value
s(hp) 2.618  3.263 26.26 2.29e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.735   Deviance explained = 75.7%
GCV = 10.862  Scale est. = 9.6335    n = 32

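First we install and load the mgcv and ggplot2 packages: mgcv fits Generalized Additive Models (GAMs), and ggplot2 handles data visualization. We then load the built-in mtcars dataset, which contains information about various car models, including their miles per gallon (mpg) and horsepower (hp). The call gam(mpg ~ s(hp), data = mtcars) models mpg as a smooth function of hp, where s() marks hp as a smooth term whose flexibility is estimated from the data. In the summary, the effective degrees of freedom (edf) of 2.618 for s(hp) indicate a non-linear relationship (an edf close to 1 would correspond to a straight line), and the smooth term is highly significant (p = 2.29e-10). The model has an adjusted R-squared of 0.735 and explains 75.7% of the deviance.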
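The quantities printed by summary() can also be pulled out programmatically, which is convenient when comparing models. The sketch below refits the same model so that it runs on its own. The straight-line baseline lm_model is an addition for comparison and is not part of the original walkthrough.

```r
library(mgcv)

data(mtcars)
gam_model <- gam(mpg ~ s(hp), data = mtcars)  # same model as in this guide
gam_summary <- summary(gam_model)

gam_summary$s.table    # edf, Ref.df, F and p-value for each smooth term
gam_summary$r.sq       # adjusted R-squared
gam_summary$dev.expl   # proportion of deviance explained

# Straight-line baseline for comparison (an assumption added for illustration)
lm_model <- lm(mpg ~ hp, data = mtcars)
AIC(lm_model, gam_model)
```

A lower AIC for the GAM than for the linear baseline would support keeping the smooth term rather than a straight line.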

Visualize the results

# Build an evenly spaced grid of hp values for predictions
new_data <- data.frame(hp = seq(min(mtcars$hp), max(mtcars$hp),
                                length.out = 100))
predictions <- predict(gam_model, newdata = new_data, type = "response",
                       se.fit = TRUE)
 
# Collect fitted values and standard errors in one data frame
pred_df <- data.frame(hp = new_data$hp,
                      fit = as.numeric(predictions$fit),
                      se = as.numeric(predictions$se.fit))
 
# Plot the data, the GAM fit, and an approximate 95% confidence band
ggplot() +
  geom_ribbon(data = pred_df, aes(x = hp, ymin = fit - 1.96 * se,
                                  ymax = fit + 1.96 * se), alpha = 0.3) +
  geom_point(data = mtcars, aes(x = hp, y = mpg)) +
  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_line(data = pred_df, aes(x = hp, y = fit), color = "blue",
            linewidth = 1) +
  labs(title = "Generalized Additive Model (GAM) Fit for mpg vs. hp",
       x = "Horsepower", y = "Miles per Gallon") +
  theme_minimal()

                    

Output:

[Figure: Generalized Additive Model (GAM) fit for mpg vs. hp]


First, a new data frame new_data is created containing a sequence of 100 evenly spaced horsepower (hp) values spanning the range of hp in the mtcars dataset. The predict function then uses the fitted GAM to obtain predictions for these values. The type = "response" argument returns predictions on the original scale (miles per gallon) rather than on the link scale; for this Gaussian model with an identity link the two coincide. The se.fit = TRUE argument additionally returns standard errors for the predictions, which are used to draw the shaded band at fit ± 1.96 standard errors (an approximate 95% confidence interval).
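The same predict call works for individual values of interest. As a small sketch, the horsepower values below (100, 150, 200) are hypothetical query points chosen for illustration, and the model is refit so the example runs on its own.

```r
library(mgcv)

data(mtcars)
gam_model <- gam(mpg ~ s(hp), data = mtcars)

# Hypothetical horsepower values chosen for illustration
query <- data.frame(hp = c(100, 150, 200))
pred <- predict(gam_model, newdata = query, type = "response", se.fit = TRUE)

# Approximate 95% interval: fit +/- 1.96 standard errors
result <- data.frame(
  hp    = query$hp,
  fit   = as.numeric(pred$fit),
  lower = as.numeric(pred$fit - 1.96 * pred$se.fit),
  upper = as.numeric(pred$fit + 1.96 * pred$se.fit)
)
result
```

The interval describes uncertainty in the estimated mean mpg at each hp value, not the spread of individual cars around that mean.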

Finally, we visualize the fitted GAM alongside the original data. The ggplot call draws the individual data points, the smooth curve representing the estimated relationship between horsepower and miles per gallon, and a shaded ribbon for the approximate 95% confidence band.
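mgcv also provides its own plotting method for fitted GAMs, which can serve as a quick alternative to a hand-built plot. The sketch below uses plot() on the fitted model together with k.check() as a basic diagnostic. The axis labels and point styling are illustrative choices, and the model is refit so the example runs on its own.

```r
library(mgcv)

data(mtcars)
gam_model <- gam(mpg ~ s(hp), data = mtcars)

# Partial effect of s(hp) with a shaded confidence band and partial residuals
plot(gam_model, shade = TRUE, residuals = TRUE, pch = 19, cex = 0.6,
     xlab = "Horsepower", ylab = "s(hp)")

# Basis-dimension check: a low p-value would suggest that k may be too small
kc <- k.check(gam_model)
kc
```

Note that plot() on a gam object shows the centred smooth term rather than predicted mpg, so its vertical axis differs from the prediction plot by the model intercept.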

Conclusion

In conclusion, Generalized Additive Models (GAMs) offer a flexible and powerful approach to modeling complex, non-linear relationships in data. This guide introduced the GAM formulation, fitted a GAM to the mtcars dataset with the mgcv package in R, interpreted the model summary, and visualized the fitted smooth with ggplot2. To deepen your understanding, consider further reading and hands-on practice with real-world datasets.

