
ML | Linear Discriminant Analysis

  1. Linear Discriminant Analysis (LDA) is a supervised learning algorithm used for classification tasks in machine learning. It is a technique used to find a linear combination of features that best separates the classes in a dataset.
  2. LDA works by projecting the data onto a lower-dimensional space that maximizes the separation between the classes. It does this by finding a set of linear discriminants that maximize the ratio of between-class variance to within-class variance. In other words, it finds the directions in the feature space that best separate the different classes of data.
  3. LDA assumes that the features of each class follow a Gaussian distribution and that all classes share the same covariance matrix. It also works best when the classes are approximately linearly separable, meaning that a linear decision boundary can classify them accurately.

LDA has several advantages, including:

It is a simple and computationally efficient algorithm.
With shrinkage or another form of regularization, it can still be used when the number of features is larger than the number of training samples.
It can handle multicollinearity (correlation between features) in the data.

However, LDA also has some limitations, including:

It assumes that the data has a Gaussian distribution, which may not always be the case.
It assumes that the covariance matrices of the different classes are equal, which may not be true in some datasets.
It assumes that the data is linearly separable, which may not be the case for some datasets.
Without regularization, it may perform poorly in high-dimensional feature spaces, where the within-class scatter matrix becomes singular or ill-conditioned.

Linear Discriminant Analysis, also called Normal Discriminant Analysis or Discriminant Function Analysis, is a dimensionality reduction technique commonly used for supervised classification problems. It models the differences between groups, i.e., it separates two or more classes, by projecting features from a higher-dimensional space into a lower-dimensional space.
For example, suppose we have two classes that we need to separate efficiently. Each class can have multiple features. Using only a single feature to classify them may result in some overlap, as shown in the figure below, so we keep increasing the number of features until the classes can be separated properly.

Suppose we have two sets of data points belonging to two different classes that we want to classify. As shown in the given 2D graph, when the data points are plotted on the 2D plane, there is no straight line that can separate the two classes completely. In this case, LDA (Linear Discriminant Analysis) reduces the 2D graph to a 1D graph in a way that maximizes the separability between the two classes.

Here, Linear Discriminant Analysis uses both axes (X and Y) to create a new axis and projects the data onto it so that the separation between the two categories is maximized, reducing the 2D graph to a 1D graph.

Two criteria are used by LDA to create a new axis: 

  1. Maximize the distance between means of the two classes.
  2. Minimize the variation within each class.


In the above graph, it can be seen that a new axis (in red) is generated and plotted in the 2D graph such that it maximizes the distance between the means of the two classes and minimizes the variation within each class. In simple terms, this newly generated axis increases the separation between the data points of the two classes. After generating this new axis using the above-mentioned criteria, all the data points of the classes are plotted on this new axis and are shown in the figure given below. 
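The two criteria can be combined into a single score for any candidate axis: the squared distance between the projected class means divided by the summed within-class variation. The sketch below assumes made-up toy points and a hypothetical helper `fisher_ratio`; it is only meant to illustrate why one axis can be far better than another.

```python
import numpy as np

def fisher_ratio(X1, X2, v):
    """Squared gap between projected class means over summed projected scatter."""
    v = v / np.linalg.norm(v)            # work with a unit direction
    y1, y2 = X1 @ v, X2 @ v              # 1-D projections of each class
    gap = (y1.mean() - y2.mean()) ** 2
    scatter = ((y1 - y1.mean()) ** 2).sum() + ((y2 - y2.mean()) ** 2).sum()
    return gap / scatter

# Toy data (made up for illustration): two elongated, roughly parallel clusters.
X1 = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 4.0], [4.0, 6.0]])
X2 = np.array([[2.0, 0.0], [3.0, 2.0], [4.0, 2.0], [5.0, 4.0]])

ratio_x_axis = fisher_ratio(X1, X2, np.array([1.0, 0.0]))     # project onto X only
ratio_diagonal = fisher_ratio(X1, X2, np.array([1.0, -1.0]))  # project onto a mixed axis

print(ratio_x_axis, ratio_diagonal)
```

On this toy data the X axis alone scores about 0.1, while the diagonal axis scores about 4.5, because it pulls the class means apart and keeps each class tightly grouped.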

However, Linear Discriminant Analysis fails when the distributions share the same mean, because it becomes impossible for LDA to find a new axis that makes the two classes linearly separable. In such cases, we use non-linear discriminant analysis.
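A small numerical illustration of this failure mode, assuming made-up points where both classes are centred on the origin: whatever axis is tried, the projected class means coincide, so there is no mean separation for LDA to exploit.

```python
import numpy as np

# Toy data (made up): both classes have mean (0, 0);
# class 2 is simply more spread out than class 1.
X1 = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, -1.0], [0.0, 1.0]])
X2 = np.array([[-3.0, 0.0], [3.0, 0.0], [0.0, -3.0], [0.0, 3.0]])

mean_gap = X1.mean(axis=0) - X2.mean(axis=0)   # zero vector for this data

def projected_mean_gap(v):
    """Distance between the class means after projecting onto direction v."""
    v = v / np.linalg.norm(v)
    return abs((X1 @ v).mean() - (X2 @ v).mean())

# Try several directions between 0 and 180 degrees.
gaps = [projected_mean_gap(np.array([np.cos(t), np.sin(t)]))
        for t in np.linspace(0.0, np.pi, 7)]
print(max(gaps))
```

The classes differ only in spread here, which is information a method based on class means cannot use.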


Let’s suppose we have two classes and n d-dimensional samples x_1, x_2, …, x_n, where:

  • n1 samples coming from the class (c1) and n2 coming from the class (c2).

If x_i is a data point, then its projection onto the line represented by the unit vector v can be written as v^{T}x_i.

Let \mu_1 and \mu_2 be the means of the samples of classes c1 and c2 before projection, and let \widetilde{\mu_1} denote the mean of the samples of class c1 after projection. It can be calculated by:

\widetilde{\mu_1} = \frac{1}{n_1}\sum_{x_i \in c_1} v^{T}x_i = v^{T} \mu_1


\widetilde{\mu_2} = v^{T} \mu_2
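The identity above says that projecting the samples and then averaging gives the same number as averaging first and then projecting. A quick check, assuming three hypothetical class-1 samples and an arbitrary unit vector:

```python
import numpy as np

# Hypothetical class-1 samples (one per row) and an arbitrary unit direction v.
X1 = np.array([[2.0, 1.0], [4.0, 3.0], [6.0, 2.0]])
v = np.array([3.0, 4.0]) / 5.0

mean_of_projections = (X1 @ v).mean()       # (1/n1) * sum of v^T x_i
projection_of_mean = v @ X1.mean(axis=0)    # v^T mu_1

print(mean_of_projections, projection_of_mean)
```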

In LDA, we want to maximize the separation |\widetilde{\mu_1} - \widetilde{\mu_2}| between the projected means relative to the spread of each class. Let y_i = v^{T}x_i be the projected samples; then the scatter of the projected samples of c1 and c2 is:

\widetilde{s_1^{2}} = \sum_{y_i \in c_1} (y_i - \widetilde{\mu_1})^2


\widetilde{s_2^{2}} = \sum_{y_i \in c_2} (y_i - \widetilde{\mu_2})^2

Now, we need to project our data onto the line with direction v that maximizes

J(v) = \frac{(\widetilde{\mu_1} - \widetilde{\mu_2})^{2}}{\widetilde{s_1^{2}} + \widetilde{s_2^{2}}}

To maximize this ratio, we need a projection vector that increases the difference between the projected means while reducing the scatter of both classes. The scatter matrices s1 and s2 of classes c1 and c2 are:

s_1 = \sum_{x_i \in c_1} (x_i - \mu_1)(x_i - \mu_1)^{T}

and s2

s_2 = \sum_{x_i \in c_2} (x_i - \mu_2)(x_i - \mu_2)^{T}

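Substituting y_i = v^{T}x_i, the projected scatters can be written in terms of these matrices:

\widetilde{s_1^{2}} = \sum_{x_i \in c_1} (v^{T}x_i - v^{T}\mu_1)^2 = v^{T} s_1 v \\ \\ \widetilde{s_2^{2}} = v^{T} s_2 v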

Now, we define the scatter within the classes (s_w) and the scatter between the classes (s_b):

s_w = s_1 + s_2 \\ \\ s_b  = (\mu_1 - \mu_2) (\mu_1 - \mu_2 )^{T}

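Now, we rewrite J(v) in terms of s_b and s_w, using (\widetilde{\mu_1} - \widetilde{\mu_2})^{2} = v^{T}s_{b}v for the numerator and \widetilde{s_1^{2}} + \widetilde{s_2^{2}} = v^{T}s_{w}v for the denominator:

J(v) = \frac{(\widetilde{\mu_1} - \widetilde{\mu_2})^{2}}{\widetilde{s_1^{2}} + \widetilde{s_2^{2}}} = \frac{v^{T}s_{b}v}{v^{T}s_{w}v}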
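The matrix form of J(v) can be checked numerically. The sketch below assumes made-up toy points and a hypothetical helper `scatter_matrix`; it builds s_w and s_b and compares the quadratic forms with the scatter and mean gap computed directly from the projected samples.

```python
import numpy as np

def scatter_matrix(X):
    """Sum of outer products (x_i - mu)(x_i - mu)^T over the rows of X."""
    centered = X - X.mean(axis=0)
    return centered.T @ centered

# Toy data (made up for illustration).
X1 = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 4.0], [4.0, 6.0]])
X2 = np.array([[2.0, 0.0], [3.0, 2.0], [4.0, 2.0], [5.0, 4.0]])
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)

s_w = scatter_matrix(X1) + scatter_matrix(X2)   # within-class scatter
s_b = np.outer(mu1 - mu2, mu1 - mu2)            # between-class scatter

v = np.array([1.0, -1.0]) / np.sqrt(2.0)        # any unit direction can be used
y1, y2 = X1 @ v, X2 @ v                         # projected samples
projected_scatter = ((y1 - y1.mean()) ** 2).sum() + ((y2 - y2.mean()) ** 2).sum()
projected_gap_sq = (y1.mean() - y2.mean()) ** 2

print(v @ s_w @ v, projected_scatter)   # denominator of J(v), two ways
print(v @ s_b @ v, projected_gap_sq)    # numerator of J(v), two ways
```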

Now, to maximize the above expression, we differentiate J(v) with respect to v and set the derivative to zero:

\frac{d J(v)}{dv} = 0 \\ \\ \implies s_b v - \frac{v^{T}s_{b} v}{v^{T} s_w v} s_w v = 0 \\ \\ s_b v - \lambda s_w v = 0 \\ \\ s_b v = \lambda s_w v \\ \\ s_w^{-1} s_b v = \lambda v \\ \\ M v = \lambda v \\ \\ \text{where } \lambda = \frac{v^{T}s_{b} v}{v^{T} s_w v} \text{ and } M = s_w^{-1} s_b

Since \lambda = J(v), the maximum value of J(v) is the largest eigenvalue of M, and the optimal projection direction v is the eigenvector corresponding to that eigenvalue. This provides the best solution for LDA.
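As a rough sketch of this step, the eigenproblem can be solved with NumPy on made-up toy points. For two classes, the leading eigenvector should point along s_w^{-1}(\mu_1 - \mu_2), so the code computes both and compares them; the data and variable names are assumptions for illustration only.

```python
import numpy as np

# Toy data (made up for illustration).
X1 = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 4.0], [4.0, 6.0]])
X2 = np.array([[2.0, 0.0], [3.0, 2.0], [4.0, 2.0], [5.0, 4.0]])
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)

s_w = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)   # within-class scatter
s_b = np.outer(mu1 - mu2, mu1 - mu2)                          # between-class scatter

# M = s_w^{-1} s_b, computed with a linear solve instead of an explicit inverse.
M = np.linalg.solve(s_w, s_b)
eigvals, eigvecs = np.linalg.eig(M)
top = np.argmax(eigvals.real)            # index of the largest eigenvalue
v = eigvecs[:, top].real
v = v / np.linalg.norm(v)

# Two-class shortcut: the same direction, up to sign and scale.
w = np.linalg.solve(s_w, mu1 - mu2)
w = w / np.linalg.norm(w)

best_J = (v @ s_b @ v) / (v @ s_w @ v)   # equals the largest eigenvalue
print(eigvals.real[top], best_J, abs(v @ w))
```

Using `np.linalg.solve` rather than inverting s_w is a design choice for numerical stability; it still requires s_w to be non-singular.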

Extensions to LDA: 

  1. Quadratic Discriminant Analysis (QDA): Each class uses its own estimate of variance (or covariance when there are multiple input variables).
  2. Flexible Discriminant Analysis (FDA): Where non-linear combinations of inputs are used such as splines.
  3. Regularized Discriminant Analysis (RDA): Introduces regularization into the estimate of the variance (actually covariance), moderating the influence of different variables on LDA.
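Two of these ideas have close counterparts in Scikit-learn, which the sketch below compares on the Iris dataset. Note the assumptions: `load_iris`, 5-fold cross-validation, and `reg_param=0.1` are choices made here for illustration, and shrinkage LDA is a regularized covariance estimate in the spirit of RDA rather than an exact implementation of it. FDA is left out.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

models = {
    # Plain LDA: one covariance matrix shared by all classes.
    "LDA": LinearDiscriminantAnalysis(),
    # Regularized covariance estimate (shrinkage needs the lsqr or eigen solver).
    "LDA with shrinkage": LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto"),
    # QDA: a separate covariance matrix per class, lightly regularized.
    "QDA": QuadraticDiscriminantAnalysis(reg_param=0.1),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in models.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```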


  • In this implementation, we will perform linear discriminant analysis using the Scikit-learn library on the Iris dataset.


# necessary imports
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# read dataset from URL
# NOTE: the URL is not given in the text; the value below is a placeholder
# (assumption) and must point to a header-less CSV copy of the Iris dataset
url = "iris.data"
cls = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
dataset = pd.read_csv(url, names=cls)

# divide the dataset into features and target variable
X = dataset.iloc[:, 0:4].values
y = dataset.iloc[:, 4].values

# preprocess the dataset and divide into train and test
sc = StandardScaler()
X = sc.fit_transform(X)
le = LabelEncoder()
y = le.fit_transform(y)
# no random_state is set, so the split (and the accuracy) varies between runs
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# apply Linear Discriminant Analysis
lda = LinearDiscriminantAnalysis(n_components=2)
X_train = lda.fit_transform(X_train, y_train)
X_test = lda.transform(X_test)

# plot the scatterplot of the two discriminant components
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap='rainbow',
            alpha=0.7, edgecolors='b')
plt.show()

# classify using random forest classifier
classifier = RandomForestClassifier(max_depth=2, random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# print the accuracy and confusion matrix
print('Accuracy : ' + str(accuracy_score(y_test, y_pred)))
conf_m = confusion_matrix(y_test, y_pred)
print(conf_m)

LDA 2-variable plot

Accuracy : 0.9

[[10  0  0]
 [ 0  9  3]
 [ 0  0  8]]
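Since LDA is itself a classifier, the random forest step is optional. The sketch below is a variation on the implementation above, not a replacement for it: it assumes the copy of Iris bundled with Scikit-learn (`load_iris`) instead of the CSV, and fixes `random_state=0` with a stratified split so that the result is repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

lda = LinearDiscriminantAnalysis(n_components=2)
lda.fit(X_train, y_train)

test_accuracy = lda.score(X_test, y_test)        # LDA used directly as the classifier
variance_share = lda.explained_variance_ratio_   # share of variance per discriminant
X_test_2d = lda.transform(X_test)                # 2-D projection of the test samples

print(test_accuracy)
print(variance_share)
```

The `explained_variance_ratio_` attribute indicates how much of the class separation each discriminant captures, which helps when deciding how many components to keep.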


  1. Face Recognition: In the field of Computer Vision, face recognition is a very popular application in which each face is represented by a very large number of pixel values. Linear discriminant analysis (LDA) is used here to reduce the number of features to a more manageable number before classification. Each of the new dimensions is a linear combination of pixel values, which forms a template. The linear combinations obtained using Fisher’s linear discriminant are called Fisherfaces.
  2. Medical: In this field, Linear discriminant analysis (LDA) is used to classify a patient’s disease state as mild, moderate, or severe based on the patient’s various parameters and the medical treatment they are receiving. This helps doctors to intensify or reduce the pace of the treatment.
  3. Customer Identification: Suppose we want to identify the type of customers who are most likely to buy a particular product in a shopping mall. With a simple question-and-answer survey, we can gather the features of the customers. Linear discriminant analysis then helps us identify and select the features that describe the group of customers most likely to buy that product.

Last Updated : 13 Mar, 2023