
Generalized Linear Models

  • Last Updated : 29 Sep, 2021

The following article discusses generalized linear models (GLMs), which explain how Linear Regression and Logistic Regression are members of a much broader class of models. GLMs can be used to construct models for regression and classification problems by using the type of distribution that best describes the data or labels given for training the model. Below are some types of data and the corresponding distributions that help in constructing the model for that type of data (the term data here refers to the output data, or the labels of the dataset).




  1. Binary classification data – Bernoulli distribution
  2. Real valued data – Gaussian distribution
  3. Count data – Poisson distribution

To understand GLMs we will begin by defining exponential families. Exponential families are a class of distributions whose probability density function (PDF) can be molded into the following form:

P(y;\eta) = b(y) * exp(\eta^T T(y) - a(\eta)) - Eq 1\\



where:

\eta - Natural parameter (can be a scalar or a vector quantity)\\ y - Label for the data\\ T(y) - Sufficient statistic (here, it will be equal to y)\\ a(\eta) - Log-partition function (it should be purely a function of \eta)\\ b(y) - It should be purely a function of y
Proof: The Bernoulli distribution is a member of the exponential family.

P(y;\phi) = \phi^y * (1-\phi)^{(1-y)}\\ \hspace{1cm}= exp(log(\phi^y * (1-\phi)^{(1-y)}))\\ \hspace{1cm}= exp(y * log(\phi) + (1-y) * log(1-\phi))\\ \hspace{1cm}= exp(y * log(\phi/(1-\phi)) + log(1-\phi)) - Eq 2

Therefore, on comparing Eq1 and Eq2 :
\eta = log(\phi/(1-\phi))\\
\phi = 1/(1+e^{-\eta}) - Eq 3\\
b(y) = 1\\ T(y) = y\\ a(\eta) = -log(1-\phi) = log(1+e^{\eta})
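As a quick numerical sanity check (an illustrative sketch, not part of the original derivation), the two forms of the Bernoulli PMF can be compared directly:

```python
import math

def bernoulli_pmf(y, phi):
    # Direct form: phi^y * (1 - phi)^(1 - y)
    return phi**y * (1 - phi)**(1 - y)

def bernoulli_exp_family(y, phi):
    # Exponential-family form: b(y) * exp(eta * T(y) - a(eta))
    eta = math.log(phi / (1 - phi))   # natural parameter
    a = -math.log(1 - phi)            # log-partition function, equals log(1 + e^eta)
    return 1.0 * math.exp(eta * y - a)  # b(y) = 1, T(y) = y

phi = 0.3
for y in (0, 1):
    print(y, bernoulli_pmf(y, phi), bernoulli_exp_family(y, phi))  # both forms agree
```

Both calls print identical probabilities for y = 0 and y = 1, confirming the comparison of Eq 1 and Eq 2.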

Note: The expression for \phi in Eq 3 (which is the same as the activation, or sigmoid, function used in Logistic Regression) is not a coincidence; it will be shown later in the article how the Logistic Regression model can be derived from the Bernoulli distribution.

Proof: The Gaussian distribution (taking unit variance for simplicity) is a member of the exponential family.

P(y;\mu) = 1/\sqrt{2\pi} * exp(-1/2*(y-\mu)^2)\\ \hspace{1.5cm} = 1/\sqrt{2\pi}*exp(-1/2*y^{2}) * exp(\mu*y - 1/2*\mu^{2}) - Eq 4\\

Therefore, on comparing Eq 1 and Eq 4:

b(y) = 1/\sqrt{2\pi}*exp(-1/2*y^{2})\\ \eta = \mu\\ T(y) = y\\ a(\eta) = 1/2*\eta^2\\
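The same kind of numerical check works for the unit-variance Gaussian (again an illustrative sketch using the quantities just derived):

```python
import math

def gaussian_pdf(y, mu):
    # Direct form: unit-variance Gaussian density
    return (1 / math.sqrt(2 * math.pi)) * math.exp(-0.5 * (y - mu) ** 2)

def gaussian_exp_family(y, mu):
    # Exponential-family form: b(y) * exp(eta * T(y) - a(eta))
    eta = mu                                                   # natural parameter
    b = (1 / math.sqrt(2 * math.pi)) * math.exp(-0.5 * y ** 2)
    a = 0.5 * eta ** 2                                         # log-partition function
    return b * math.exp(eta * y - a)                           # T(y) = y

print(gaussian_pdf(1.5, 0.5), gaussian_exp_family(1.5, 0.5))  # identical values
```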

Constructing GLMs:
To construct GLMs for a particular type of data, or more generally for regression and classification problems, the following three assumptions, or design choices, are to be considered:

  1. y|x;\theta \sim ExponentialFamily(\eta)
  2. Given x, our goal is to predict T(y), which is equal to y in our case; that is, h(x) = E[y|x]
  3. \eta = \theta^T x

The first assumption states that, for input x parameterized by theta, the resulting output y is a member of the exponential family; this means that, given labeled data, the goal is to find the right parameters theta that fit the model as closely as possible. The second assumption states that the hypothesis outputs the expected value of y given x. The third assumption, that the natural parameter depends linearly on the inputs, is the least justified and can be considered a design choice.

Linear Regression Model:
To show that Linear Regression is a special case of the GLMs, consider that the output labels are continuous values and therefore follow a Gaussian distribution. So, we have

y|x;\theta \sim \mathcal{N}(\mu, \sigma^2) \\ h_\theta(x) = E[y|x;\theta]\\ \hspace{0.9cm} = \mu\\ \hspace{0.9cm} = \eta\\ \hspace{0.9cm} = \theta^Tx

The first equation above corresponds to the first assumption, that the output labels (or target variables) are members of an exponential family; the second equation corresponds to the assumption that the hypothesis equals the expected value, or mean, of the distribution; and the third equation corresponds to the assumption that the natural parameter and the input features follow a linear relationship.
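A minimal sketch of this in code (the synthetic data and parameter values below are illustrative assumptions, not from the article): with Gaussian labels, the GLM hypothesis h(x) = \theta^T x reduces to ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])  # intercept + one feature
true_theta = np.array([1.0, 2.0])                          # assumed ground truth
y = X @ true_theta + rng.normal(scale=0.1, size=200)       # Gaussian-distributed labels

# h_theta(x) = eta = theta^T x; maximizing the Gaussian likelihood
# is equivalent to solving the least-squares problem
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)  # close to [1.0, 2.0]
```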

Logistic Regression Model:
To show that Logistic Regression is a special case of the GLMs, consider that the output labels are binary valued and therefore follow a Bernoulli distribution. So, we have

y|x;\theta \sim Bernoulli(\phi) \\ h_\theta(x) = E[y|x;\theta]\\ \hspace{0.9cm} = \phi\\ \hspace{0.9cm} = 1/(1+e^{-\eta})\\

From the third assumption, it follows that:

\eta = \theta^T x\\ h_\theta(x) = 1/(1+e^{-\theta^T x})
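A short gradient-ascent sketch shows the resulting sigmoid hypothesis in action (the synthetic data, step size, and iteration count below are arbitrary illustrative choices, not from the article):

```python
import numpy as np

def sigmoid(z):
    # h_theta(x) = 1 / (1 + e^{-theta^T x})
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
true_theta = np.array([-0.5, 2.0])                             # assumed ground truth
y = (rng.random(500) < sigmoid(X @ true_theta)).astype(float)  # Bernoulli labels

theta = np.zeros(2)
for _ in range(5000):
    grad = X.T @ (y - sigmoid(X @ theta)) / len(y)  # gradient of the log-likelihood
    theta += 0.5 * grad                             # gradient ascent step
print(theta)  # roughly recovers true_theta
```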

The function that maps the natural parameter to the mean of the distribution (here, the sigmoid function) is known as the canonical response function, and its inverse is known as the canonical link function.
Therefore, by using the three assumptions mentioned before, it can be shown that Logistic and Linear Regression belong to a much larger family of models known as GLMs.



Reference: http://cs229.stanford.edu/notes/cs229-notes1.pdf



