Regularized Discriminant Analysis
Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) work straightforwardly when the number of observations is far greater than the number of predictors (n > p). In this setting they offer clear advantages, such as ease of application (LDA does not require a separate covariance matrix per class) and robustness to deviations from the model assumptions.
However, applying LDA becomes a serious challenge when the number of observations is less than the number of predictors, as in microarray settings, for two reasons:
- The sample covariance matrix is singular and cannot be inverted.
- The high dimensionality makes direct matrix operations prohibitively expensive, hindering the applicability of the method.
Therefore, we modify LDA and QDA: we form a regularized covariance matrix that combines the common covariance matrix of LDA (Σ̂) with the per-class covariance matrices of QDA (Σ̂_k) using a tuning parameter λ:

Σ̂_k(λ) = (1 − λ) Σ̂_k + λ Σ̂

However, some versions of regularized discriminant analysis (including the one in the klaR package) use a second parameter γ that additionally shrinks this matrix towards a scaled identity:

Σ̂_k(λ, γ) = (1 − γ) Σ̂_k(λ) + (γ/p) tr(Σ̂_k(λ)) I

where p is the number of predictors and I is the identity matrix.
RDA shrinks the separate covariance matrices of QDA towards the common covariance matrix of LDA. This improves the estimate of the covariance matrix in situations where the number of predictors is larger than the number of samples in the training data, which in turn improves model accuracy.
In the equation above, both γ and λ take values between 0 and 1. Each of the four boundary-value combinations reduces RDA to a special case:
- (γ = 0, λ = 0): the covariance of QDA, i.e. the individual covariance matrix of each group.
- (γ = 0, λ = 1): the covariance of LDA, i.e. a common covariance matrix for all groups.
- (γ = 1, λ = 0): conditionally independent variables, with an equal variance within each group but a different variance across groups.
- (γ = 1, λ = 1): classification using Euclidean distance; similar to the previous case, but the variance is the same for all groups.
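The regularized covariance formula and its four boundary cases can be sketched numerically. This is a minimal illustration in Python/numpy, assuming the klaR-style parameterization given above; the random matrices stand in for one class's covariance and the pooled covariance:

```python
import numpy as np

def regularized_cov(S_k, S_pooled, lmbda, gamma):
    """RDA covariance: blend the class covariance (QDA) with the pooled
    covariance (LDA), then shrink towards a scaled identity matrix."""
    p = S_k.shape[0]
    S_lam = (1 - lmbda) * S_k + lmbda * S_pooled  # lambda-blend of QDA and LDA
    return (1 - gamma) * S_lam + gamma * (np.trace(S_lam) / p) * np.eye(p)

rng = np.random.default_rng(0)
S_k = np.cov(rng.standard_normal((50, 3)), rowvar=False)        # one class's covariance
S_pooled = np.cov(rng.standard_normal((200, 3)), rowvar=False)  # pooled stand-in

# The boundary cases from the list above:
assert np.allclose(regularized_cov(S_k, S_pooled, 0, 0), S_k)       # QDA
assert np.allclose(regularized_cov(S_k, S_pooled, 1, 0), S_pooled)  # LDA
sph = regularized_cov(S_k, S_pooled, 0, 1)                          # gamma = 1:
assert np.allclose(sph, sph[0, 0] * np.eye(3))                      # spherical covariance
```

With γ = 1 the covariance collapses to a multiple of the identity, which is why those two cases reduce to (scaled) Euclidean-distance classification.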
Implementation
In this implementation, we perform Regularized Discriminant Analysis on the iris dataset using the rda() function from the klaR library, with caret for data splitting and preprocessing.
R
library(tidyverse)
library(caret)   # createDataPartition(), preProcess()
library(klaR)    # rda()

data("iris")

# Split into 80% training and 20% test data
train_test.samples <- iris$Species %>%
  createDataPartition(p = 0.8, list = FALSE)
train.data <- iris[train_test.samples, ]
test.data  <- iris[-train_test.samples, ]

# Estimate centering/scaling parameters on the training data only
preproc.param <- train.data %>%
  preProcess(method = c("center", "scale"))
train.transformed <- preproc.param %>% predict(train.data)
test.transformed  <- preproc.param %>% predict(test.data)

# Fit the RDA model; gamma and lambda are estimated automatically
model <- rda(Species ~ ., data = train.transformed)
model

# Accuracy on the held-out test set
predictions <- model %>% predict(test.transformed)
mean(predictions$class == test.transformed$Species)
Output:
Call:
rda(formula = Species ~ ., data = train.transformed)

Regularization parameters:
      gamma      lambda
0.002619109 0.222244278

Prior probabilities of groups:
    setosa versicolor  virginica
 0.3333333  0.3333333  0.3333333

Misclassification rate:
       apparent: 1.667 %
cross-validated: 1.667 %

Accuracy:
0.9666667
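For readers working outside R, the same pipeline can be sketched end to end in Python with numpy and scikit-learn. This is a minimal hand-rolled RDA classifier, not klaR's implementation: it assumes the klaR-style covariance formula above, and the default lmbda/gamma values are illustrative round-offs of the parameters klaR estimated in the output:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def fit_rda(X, y, lmbda=0.22, gamma=0.003):
    """Fit per-class Gaussians with RDA-regularized covariance matrices."""
    classes = np.unique(y)
    p = X.shape[1]
    means = {c: X[y == c].mean(axis=0) for c in classes}
    covs = {c: np.cov(X[y == c], rowvar=False) for c in classes}
    priors = {c: np.mean(y == c) for c in classes}
    # pooled (LDA) covariance: weighted average of the class covariances
    pooled = sum((np.sum(y == c) - 1) * covs[c] for c in classes)
    pooled /= len(y) - len(classes)
    reg = {}
    for c in classes:
        S = (1 - lmbda) * covs[c] + lmbda * pooled       # blend QDA with LDA
        reg[c] = (1 - gamma) * S + gamma * (np.trace(S) / p) * np.eye(p)
    return classes, means, reg, priors

def predict_rda(model, X):
    classes, means, reg, priors = model
    scores = []
    for c in classes:
        Sinv = np.linalg.inv(reg[c])
        _, logdet = np.linalg.slogdet(reg[c])
        d = X - means[c]
        maha = np.einsum("ij,jk,ik->i", d, Sinv, d)       # squared Mahalanobis distance
        # quadratic discriminant score: log prior - 1/2 log|S| - 1/2 Mahalanobis^2
        scores.append(np.log(priors[c]) - 0.5 * logdet - 0.5 * maha)
    return classes[np.argmax(np.array(scores), axis=0)]

# Mirror the R workflow: stratified 80/20 split, then center and scale
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)
scaler = StandardScaler().fit(X_tr)
model = fit_rda(scaler.transform(X_tr), y_tr)
acc = (predict_rda(model, scaler.transform(X_te)) == y_te).mean()
print(acc)
```

Unlike klaR, this sketch does not tune gamma and lambda; in practice they would be chosen by cross-validation.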
Last Updated: 03 Aug, 2021