
Regularized Discriminant Analysis


Linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA) work well when the number of observations n is much larger than the number of predictors p (n ≫ p). In this setting, LDA offers several advantages, such as ease of application (since we don't have to estimate a separate covariance matrix for each class) and robustness to deviations from the model assumptions.

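However, LDA runs into serious difficulties when the number of observations is smaller than the number of predictors (n < p), as in microarray settings. There are two main challenges:

- The sample covariance matrix is singular, so it cannot be inverted as the discriminant functions require.
- The high dimensionality means a p × p covariance matrix has to be estimated from only a few observations, so the estimates are unstable and direct matrix operations become expensive.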
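To make the first problem concrete, here is a small base-R sketch on simulated data (the sizes n = 10 and p = 20 and the seed are arbitrary choices for illustration, not part of the original example). With fewer observations than predictors, the sample covariance matrix has rank at most n − 1, so several of its eigenvalues are zero up to rounding error and it has no usable inverse.

```r
# Simulated "wide" data: fewer observations (n) than predictors (p)
set.seed(42)
n <- 10
p <- 20
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# p x p sample covariance matrix; its rank is at most n - 1
S <- cov(X)
rank_S <- qr(S)$rank

# The smallest eigenvalues are zero up to floating-point rounding,
# so S cannot be meaningfully inverted
ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
cat("p =", p, "| rank(S) =", rank_S, "| min eigenvalue =", signif(min(ev), 3), "\n")
```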
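Therefore, we modify LDA and QDA by forming a new covariance matrix that combines the pooled covariance matrix of LDA ($\hat{\Sigma}$) with the class-specific covariance matrix of QDA ($\hat{\Sigma}_k$) using a tuning parameter $\lambda$:

$$\hat{\Sigma}_k(\lambda) = (1 - \lambda)\,\hat{\Sigma}_k + \lambda\,\hat{\Sigma}$$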

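As a sketch of this weighted combination $(1 - \lambda)\hat{\Sigma}_k + \lambda\hat{\Sigma}$, using the iris predictors as example data, the code below computes the class covariance matrices used by QDA, the pooled within-class covariance used by LDA, and mixes them. `blend_cov()` is a hypothetical helper written for this illustration, not a function from klaR or MASS.

```r
data(iris)
X <- as.matrix(iris[, 1:4])
y <- iris$Species
K <- nlevels(y)
n <- nrow(X)

# Class-specific covariance matrices (what QDA uses)
S_k <- lapply(split(as.data.frame(X), y), cov)

# Pooled within-class covariance (what LDA uses):
# sum over classes of (n_k - 1) * S_k, divided by (n - K)
n_k <- as.vector(table(y))
S_pooled <- Reduce(`+`, Map(function(S, m) (m - 1) * S, S_k, n_k)) / (n - K)

# Hypothetical helper: shrink one class covariance towards the pooled one
blend_cov <- function(S_class, S_pooled, lambda) {
  stopifnot(lambda >= 0, lambda <= 1)
  (1 - lambda) * S_class + lambda * S_pooled
}

S_setosa_half <- blend_cov(S_k$setosa, S_pooled, lambda = 0.5)
round(S_setosa_half, 3)
```

Setting `lambda = 0` returns the class covariance unchanged and `lambda = 1` returns the pooled matrix, which are the QDA and LDA ends of the scale.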
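Some versions of regularized discriminant analysis also use a second parameter ($\gamma$) that shrinks this matrix towards a multiple of the identity:

$$\hat{\Sigma}_k(\lambda, \gamma) = (1 - \gamma)\,\hat{\Sigma}_k(\lambda) + \gamma\,\frac{1}{p}\operatorname{tr}\big[\hat{\Sigma}_k(\lambda)\big]\,I$$

where $\hat{\Sigma}_k(\lambda)$ is the $\lambda$-weighted combination of the QDA and LDA covariance matrices, $p$ is the number of predictors, and $I$ is the $p \times p$ identity matrix.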

RDA shrinks the separate covariance matrices of QDA towards the common covariance matrix of LDA. This improves the estimates of the covariance matrices when the number of predictors is larger than the number of samples in the training data, which can lead to better classification accuracy.
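The sketch below, on simulated two-class data with 16 observations and 30 predictors (all sizes and the seed are arbitrary assumptions), illustrates why the extra parameter matters when n < p. `rda_cov()` is a hypothetical helper implementing the two-parameter form from Friedman (1989), $(1-\gamma)\hat{\Sigma}_k(\lambda) + \gamma\,\frac{1}{p}\operatorname{tr}[\hat{\Sigma}_k(\lambda)]\,I$; individual packages may parameterize it differently. Because the column space of each class covariance is contained in that of the pooled covariance, blending with $\lambda$ alone leaves the matrix singular, while any $\gamma > 0$ adds a positive multiple of the identity and makes it invertible.

```r
set.seed(1)
p <- 30            # predictors
n_k <- c(8, 8)     # two classes, 16 observations in total (n < p)
K <- length(n_k)

# Simulated class data (hypothetical, for illustration only)
X_list <- lapply(n_k, function(m) matrix(rnorm(m * p), nrow = m, ncol = p))
S_k <- lapply(X_list, cov)
S_pooled <- Reduce(`+`, Map(function(S, m) (m - 1) * S, S_k, n_k)) / (sum(n_k) - K)

# Hypothetical helper: two-parameter regularized covariance for one class
rda_cov <- function(S_class, S_pooled, lambda, gamma) {
  S_lambda <- (1 - lambda) * S_class + lambda * S_pooled
  p <- nrow(S_lambda)
  (1 - gamma) * S_lambda + gamma * (sum(diag(S_lambda)) / p) * diag(p)
}

min_eig <- function(S) min(eigen(S, symmetric = TRUE, only.values = TRUE)$values)

# lambda alone cannot fix the rank deficiency when n < p (prints ~0) ...
min_eig(rda_cov(S_k[[1]], S_pooled, lambda = 0.5, gamma = 0))
# ... but gamma > 0 makes the matrix positive definite (prints a positive value)
min_eig(rda_cov(S_k[[1]], S_pooled, lambda = 0.5, gamma = 0.1))
```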

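In the equations above, $\gamma$ and $\lambda$ both take values between 0 and 1. Each of the four boundary combinations gives a special case:

- $\lambda = 0, \gamma = 0$: each class keeps its own covariance matrix, which is QDA.
- $\lambda = 1, \gamma = 0$: every class uses the pooled covariance matrix, which is LDA.
- $\lambda = 0, \gamma = 1$: each class uses its own scaled identity matrix $\sigma_k^2 I$, i.e. a class-specific spherical covariance.
- $\lambda = 1, \gamma = 1$: every class shares the same scaled identity $\sigma^2 I$, so classification reduces to assigning each observation to the nearest class mean in Euclidean distance, adjusted by the class priors.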
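These boundary cases can be checked numerically. The sketch below defines a hypothetical `rda_cov()` helper (the two-parameter form from Friedman (1989), written here only for illustration) and applies it to the versicolor class of the iris data, confirming which matrix each corner of the $(\lambda, \gamma)$ square produces.

```r
data(iris)
X <- as.matrix(iris[, 1:4])
y <- iris$Species
n_k <- as.vector(table(y))

# Class covariances (QDA) and pooled within-class covariance (LDA)
S_k <- lapply(split(as.data.frame(X), y), cov)
S_pooled <- Reduce(`+`, Map(function(S, m) (m - 1) * S, S_k, n_k)) /
  (nrow(X) - nlevels(y))

# Hypothetical helper: two-parameter regularized covariance for one class
rda_cov <- function(S_class, S_pooled, lambda, gamma) {
  S_lambda <- (1 - lambda) * S_class + lambda * S_pooled
  p <- nrow(S_lambda)
  (1 - gamma) * S_lambda + gamma * (sum(diag(S_lambda)) / p) * diag(p)
}

S <- S_k$versicolor
p <- ncol(S)

# Compare matrices while ignoring attributes such as dimnames
same <- function(a, b) isTRUE(all.equal(a, b, check.attributes = FALSE))
stopifnot(
  same(rda_cov(S, S_pooled, lambda = 0, gamma = 0), S),                              # QDA
  same(rda_cov(S, S_pooled, lambda = 1, gamma = 0), S_pooled),                       # LDA
  same(rda_cov(S, S_pooled, lambda = 0, gamma = 1), mean(diag(S)) * diag(p)),        # sigma_k^2 I
  same(rda_cov(S, S_pooled, lambda = 1, gamma = 1), mean(diag(S_pooled)) * diag(p))  # sigma^2 I
)
```

If all four checks pass, `stopifnot()` returns silently.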

Implementation

# imports
library(tidyverse)   # provides the %>% pipe
library(caret)       # createDataPartition() and preProcess()
library(MASS)
library(klaR)        # rda()

data("iris")

# Divide the data into training (80%) and test (20%) sets, stratified by
# Species; the split is random, so the results vary between runs
train_test.samples <- iris$Species %>% createDataPartition(p = 0.8, list = FALSE)
train.data <- iris[train_test.samples, ]
test.data <- iris[-train_test.samples, ]

# Data preprocessing
# Center and scale the numeric predictors; preProcess() leaves the
# categorical Species column unchanged
preproc.param <- train.data %>%
  preProcess(method = c("center", "scale"))

# Transform both sets using the parameters estimated on the training set
train.transformed <- preproc.param %>% predict(train.data)
test.transformed <- preproc.param %>% predict(test.data)

# Fit the RDA model; when gamma and lambda are not supplied,
# rda() tunes them itself
model <- rda(Species ~ ., data = train.transformed)
model

# Predict the classes of the test data
predictions <- model %>% predict(test.transformed)
# Calculate the model accuracy
mean(predictions$class == test.transformed$Species)

Output:

Call: 
rda(formula = Species ~ ., data = train.transformed)

Regularization parameters: 
      gamma      lambda 
0.002619109 0.222244278 

Prior probabilities of groups: 
    setosa versicolor  virginica 
 0.3333333  0.3333333  0.3333333 

Misclassification rate: 
       apparent: 1.667 %
cross-validated: 1.667 %

### accuracy
0.9666667
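Both parameters can also be fixed through the `gamma` and `lambda` arguments of klaR's `rda()` instead of being tuned. As a rough sketch (the plain `sample()` split, the seed, and the two chosen settings are assumptions for illustration, and the accuracies depend on the split), the QDA and LDA corners can be compared directly:

```r
library(klaR)

data(iris)
set.seed(2024)
# Simple random 80/20 split (not stratified, unlike createDataPartition())
idx <- sample(nrow(iris), size = 0.8 * nrow(iris))
train <- iris[idx, ]
test <- iris[-idx, ]

# Fixed (gamma, lambda) pairs at two corners of the parameter square
settings <- list(
  QDA = c(gamma = 0, lambda = 0),
  LDA = c(gamma = 0, lambda = 1)
)

# Fit one model per setting and compute its test-set accuracy
accuracy <- sapply(settings, function(par) {
  fit <- rda(Species ~ ., data = train,
             gamma = par[["gamma"]], lambda = par[["lambda"]])
  mean(predict(fit, test)$class == test$Species)
})
accuracy
```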
