
Normal and Shrinkage Linear Discriminant Analysis for Classification in Scikit Learn


In this article, we look at the difference between normal and shrinkage Linear Discriminant Analysis (LDA) for classification and implement both using the scikit-learn library in Python. But first, let's understand what LDA is.

What is Linear discriminant analysis (LDA)?

Linear discriminant analysis (LDA) is a supervised learning algorithm that projects the data onto a lower-dimensional space and separates the classes using a linear decision boundary. LDA is commonly used for classification tasks, where the goal is to predict the class label of a sample based on its features.
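As a small illustration of the projection idea, the sketch below fits LDA on the Iris dataset and projects its four features onto two discriminant axes. The dataset and the variable names are choices made for this illustration and are not part of the original article.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Iris: 150 samples, 4 features, 3 classes
X, y = load_iris(return_X_y=True)

# With 3 classes, LDA can keep at most 3 - 1 = 2 discriminant axes
lda = LinearDiscriminantAnalysis(n_components=2)
X_projected = lda.fit_transform(X, y)

print(X.shape, "->", X_projected.shape)  # (150, 4) -> (150, 2)

# The same fitted estimator also acts as a classifier
train_accuracy = lda.score(X, y)
print(train_accuracy)
```

The same object serves both purposes: transform() gives the low-dimensional projection, while predict() and score() use the linear decision rule.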

In LDA, the projection onto a lower-dimensional space is performed by finding a set of directions in the original feature space that maximizes the separation between the classes. These directions, known as the discriminant directions, are calculated using the class means and covariances.

Once the discriminant directions are found, the data is projected onto the space spanned by these directions and a linear decision boundary is constructed to separate the classes. With two classes there is a single discriminant direction, and the decision boundary is a hyperplane orthogonal to it; with K classes there are at most K - 1 discriminant directions.
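To make the two-class case concrete, here is a minimal NumPy sketch of Fisher's classical formula, where the discriminant direction is proportional to the inverse within-class scatter matrix times the difference of the class means. The synthetic data, the equal-prior midpoint threshold, and all variable names are assumptions made for this illustration; this is not scikit-learn's internal implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian classes that share one covariance matrix (the LDA assumption)
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
X0 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X1 = rng.multivariate_normal([2.0, 1.0], cov, size=200)

mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)

# Within-class scatter: sum of the scatter matrices of both classes
Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)

# Fisher direction: w is proportional to Sw^{-1} (mu1 - mu0)
w = np.linalg.solve(Sw, mu1 - mu0)
w /= np.linalg.norm(w)

# Project every sample onto w; each sample becomes a single number
z0, z1 = X0 @ w, X1 @ w

# With equal priors, split at the midpoint of the projected class means
threshold = 0.5 * (z0.mean() + z1.mean())
accuracy = 0.5 * ((z0 < threshold).mean() + (z1 >= threshold).mean())
print("direction:", w)
print("training accuracy:", accuracy)
```

The set of points whose projection equals the threshold is a line (a hyperplane in higher dimensions) orthogonal to w, which is the linear decision boundary described above.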

The LinearDiscriminantAnalysis class has two main modes of operation: normal and shrinkage. In normal mode, LDA assumes that all classes share one covariance matrix, which is estimated from the training data as the pooled within-class sample covariance. In shrinkage mode, enabled through the shrinkage parameter, LDA uses a shrinkage estimator to regularize the covariance matrix and improve the stability of the model.

Performing linear discriminant analysis (LDA) for classification in scikit-learn involves the following steps:

  1. Import the LinearDiscriminantAnalysis class from sklearn.discriminant_analysis module.
  2. Generate or load the data for the classification task. The data should be a 2D array of feature values and a 1D array of class labels.
  3. Split the data into training and test sets using the train_test_split() function from the sklearn.model_selection module.
  4. Create an instance of the LinearDiscriminantAnalysis class and specify any desired hyperparameters, such as the solver (the solver parameter) and the shrinkage value (the shrinkage parameter).
  5. Fit the LinearDiscriminantAnalysis estimator to the training data using the fit() method.
  6. Use the estimator to make predictions on the test set using the predict() method.
  7. Evaluate the performance of the model by calculating metrics such as classification accuracy or the confusion matrix.

Here is a complete example of how to use the LinearDiscriminantAnalysis class to perform LDA for classification in scikit-learn:

Python3




import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
  
# Generate random data
X = np.random.randn(100, 10)
y = np.random.randint(2, size=100)
  
# Split the data into training and test sets
X_train, X_test,\
    y_train, y_test = train_test_split(X, y,
                                       test_size=0.3)
  
# Create a LinearDiscriminantAnalysis estimator
# and fit it to the training data
estimator = LinearDiscriminantAnalysis(shrinkage=None)
estimator.fit(X_train, y_train)
  
# Obtain predictions for the test set
y_pred = estimator.predict(X_test)
  
# Print the classification accuracy
print(estimator.score(X_test, y_test))


Output:

0.5

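This code fits a LinearDiscriminantAnalysis estimator to the training data in normal mode and uses it to make predictions on the test set. The classification accuracy is then printed using the score() method of the estimator. Because the features and labels are generated independently at random and no seed is set, the accuracy stays close to chance level (0.5) and changes from run to run.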
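The evaluation step can also report a confusion matrix, and the classifier is easier to judge on data where the labels actually depend on the features. The sketch below assumes a synthetic dataset from make_classification with fixed seeds; the dataset parameters are illustrative choices, not values from the original article.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic two-class data whose labels depend on 4 of the 10 features
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           n_redundant=0, n_clusters_per_class=1,
                           random_state=0)

# Seeded, stratified split so the run is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Normal (non-shrinkage) LDA with the default solver
clf = LinearDiscriminantAnalysis().fit(X_train, y_train)
y_pred = clf.predict(X_test)

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class

print("accuracy:", acc)
print(cm)
```

The diagonal of the confusion matrix counts correct predictions, so its sum divided by the total number of test samples equals the accuracy.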

Shrinkage Linear Discriminant Analysis:

Shrinkage linear discriminant analysis (LDA) is a variant of LDA that uses a shrinkage estimator to regularize the estimated covariance matrix. In normal LDA, the covariance matrix is estimated directly from the training samples, which can be unstable and lead to overfitting when the number of samples is small compared with the number of features. Shrinkage LDA addresses this issue by using a shrinkage estimator, such as the Ledoit-Wolf estimator, to regularize the covariance matrix and improve its stability.
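A common form of shrinkage blends the empirical covariance with a scaled identity matrix. The sketch below writes that blend by hand and compares it with sklearn.covariance.shrunk_covariance; the random data and the value alpha = 0.5 are assumptions for illustration, and how LinearDiscriminantAnalysis applies shrinkage internally may differ in details such as feature scaling.

```python
import numpy as np
from sklearn.covariance import empirical_covariance, shrunk_covariance

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 10))  # few samples relative to the 10 features
alpha = 0.5                        # illustrative shrinkage intensity in [0, 1]

emp = empirical_covariance(X)
p = emp.shape[0]

# Shrink towards a scaled identity: (1 - alpha) * S + alpha * (trace(S) / p) * I
target = (np.trace(emp) / p) * np.eye(p)
manual = (1 - alpha) * emp + alpha * target

library = shrunk_covariance(emp, shrinkage=alpha)
print(np.allclose(manual, library))

# Shrinkage pulls the eigenvalues together, so the matrix is better conditioned
print(np.linalg.cond(emp), np.linalg.cond(manual))
```

A smaller condition number means the matrix is easier to invert reliably, which is why shrinkage tends to stabilize the discriminant directions.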

To use shrinkage mode, set the solver parameter of the LinearDiscriminantAnalysis estimator to 'lsqr' or 'eigen' (the default 'svd' solver does not support shrinkage) and specify a value for the shrinkage parameter: either 'auto' or a number between 0 and 1. For example:

# Create a LinearDiscriminantAnalysis estimator
# with shrinkage and fit it to the training data
estimator = LinearDiscriminantAnalysis(solver='lsqr', shrinkage=0.5)
estimator.fit(X_train, y_train)

This code creates a LinearDiscriminantAnalysis estimator that uses shrinkage mode with a fixed shrinkage value of 0.5 and fits it to the training data.
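The solver restriction can be checked directly. The following sketch tries the same shrinkage value with each solver on small random data; the data and the results dictionary are illustrative assumptions.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5))
y = np.arange(40) % 2  # alternating labels, so both classes are present

results = {}
for solver in ("svd", "lsqr", "eigen"):
    try:
        LinearDiscriminantAnalysis(solver=solver, shrinkage=0.5).fit(X, y)
        results[solver] = "ok"
    except NotImplementedError as exc:
        # Raised when the chosen solver cannot be combined with shrinkage
        results[solver] = f"rejected: {exc}"

for solver, outcome in results.items():
    print(solver, "->", outcome)
```

Only 'lsqr' and 'eigen' fit successfully here, so a shrinkage value should always be paired with one of those two solvers.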

Python3




import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
  
# Generate random data
X = np.random.randn(100, 10)
y = np.random.randint(2, size=100)
  
# Split the data into training and test sets
X_train, X_test,\
    y_train, y_test = train_test_split(X, y,
                                       test_size=0.3)
  
# Create a LinearDiscriminantAnalysis estimator
# with shrinkage and fit it to the training data
estimator = LinearDiscriminantAnalysis(solver='eigen',
                                       shrinkage='auto')
estimator.fit(X_train, y_train)
  
# Obtain predictions for the test set
y_pred = estimator.predict(X_test)
  
# Print the classification accuracy
print(estimator.score(X_test, y_test))


Output:

0.43333333333333335
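Since the labels in the examples above are random, neither mode can do better than chance, so the difference between them does not show. A hedged sketch of a fairer comparison is given below: it assumes synthetic data with one informative feature and many pure-noise features, and a training set with fewer samples than features. The helper make_data and all parameter values are hypothetical choices for this illustration.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.RandomState(0)

def make_data(n_samples, n_noise_features):
    """One informative feature (class means at -2 and +2) plus noise features."""
    y = np.arange(n_samples) % 2                      # balanced labels
    signal = rng.randn(n_samples, 1) + 4 * y[:, None] - 2
    noise = rng.randn(n_samples, n_noise_features)
    return np.hstack([signal, noise]), y

# Fewer training samples (20) than features (30): covariance is hard to estimate
X_train, y_train = make_data(20, 29)
X_test, y_test = make_data(500, 29)

normal = LinearDiscriminantAnalysis(solver="lsqr", shrinkage=None)
shrunk = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")

score_normal = normal.fit(X_train, y_train).score(X_test, y_test)
score_shrunk = shrunk.fit(X_train, y_train).score(X_test, y_test)

print("normal LDA   :", score_normal)
print("shrinkage LDA:", score_shrunk)
```

In settings like this, shrinkage is generally expected to help because the plain covariance estimate is poorly determined, but the exact scores depend on the data and the seed, so no particular numbers are claimed here.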


Last Updated : 02 Jan, 2023