
Advantages and Disadvantages of different Classification Models

Classification is a typical supervised learning task. It is used when we must predict a categorical outcome, that is, whether a particular example belongs to a category or not (unlike regression, which predicts continuous values). Examples include sentiment analysis, classifying an email as spam or not, and predicting whether a person will buy an SUV given a training set of salaries and SUV purchases.
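As a minimal illustration of the idea, here is a toy classifier (a one-feature threshold rule; all numbers below are made up for illustration) that predicts whether a person buys an SUV from their salary. A real task would use a proper model with many features:

```python
# Toy binary classifier: predict whether a person buys an SUV from salary.
# (salary in $1000s, bought_suv) training pairs -- hypothetical data
train = [(25, 0), (32, 0), (41, 0), (48, 0), (55, 1), (63, 1), (72, 1), (90, 1)]

def fit_threshold(data):
    """Pick the salary threshold that classifies the training data best."""
    best_t, best_correct = None, -1
    for t in sorted(x for x, _ in data):
        correct = sum((x >= t) == bool(y) for x, y in data)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

threshold = fit_threshold(train)

def predict(salary):
    """Predict 1 (buys an SUV) if the salary reaches the learned threshold."""
    return 1 if salary >= threshold else 0

print(threshold)                  # 55 for this toy data
print(predict(30), predict(80))   # 0 1
```

The categorical output (0 or 1) rather than a continuous number is what makes this a classification rather than a regression task.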

Types of Classification Models:



[Figure: Splitting the dataset using a Decision Tree]



| Classification Model | Advantages | Disadvantages |
|---|---|---|
| Logistic Regression | Probabilistic approach; gives information about the statistical significance of features. | Relies on the assumptions of logistic regression holding (e.g., a linear decision boundary). |
| K-Nearest Neighbours (K-NN) | Simple to understand; fast and efficient. | The number of neighbours k must be chosen manually. |
| Support Vector Machine (SVM) | Performant; not biased by outliers; not sensitive to overfitting. | Not appropriate for non-linear problems; not the best choice for a large number of features. |
| Kernel SVM | High performance on non-linear problems; not biased by outliers; not sensitive to overfitting. | Not the best choice for a large number of features; more complex. |
| Naive Bayes | Efficient; not biased by outliers; works on non-linear problems; probabilistic approach. | Based on the assumption that the features have the same statistical relevance (i.e., are conditionally independent). |
| Decision Tree Classification | Interpretable; no need for feature scaling; works on both linear and non-linear problems. | Poor results on very small datasets; overfitting can easily occur. |
| Random Forest Classification | Powerful and accurate; good performance on many problems, including non-linear ones. | Lacks interpretability; overfitting can easily occur; the number of trees must be chosen manually. |
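To make one table row concrete, here is a minimal pure-Python sketch of K-NN classification (toy 2-D points, made up for illustration), including the manual choice of k noted in its disadvantages column:

```python
import math
from collections import Counter

# Toy 2-D training points (feature1, feature2) with class labels -- made-up data
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (7.0, 7.5), (6.5, 6.0)]
train_y = ["A", "A", "A", "B", "B", "B"]

def knn_predict(x, k):
    """Classify point x by majority vote among its k nearest training points."""
    dists = sorted(
        (math.dist(x, p), label) for p, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

print(knn_predict((2.0, 2.0), k=3))   # A -- nearest neighbours are all class A
print(knn_predict((6.2, 6.4), k=3))   # B
```

An odd k is typically preferred for binary problems so that majority votes cannot tie; choosing k well usually requires validation on held-out data.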

How do we choose the right Classification Model for a given problem?

The accuracy of a classification model is measured in terms of the numbers of false positives and false negatives it produces.


[Figure: False positives and false negatives]


In the above figure, observations 1 and 4 have y = y̅ (actual value = predicted value). The error at 3 is a false positive, or type-1 error (we predicted a positive outcome that did not occur). The error at 2 is a false negative, or type-2 error (we predicted a negative outcome for something that actually happens; this is like predicting that a cancer patient does not have cancer, which is very dangerous for the patient's health). We use a confusion matrix to represent the numbers of false positives, false negatives, and correctly predicted outcomes.


[Figure: Calculating accuracy from the confusion matrix]

Suppose that initially, out of 10000 observations, the model correctly predicts 9700 negatives (true negatives) and 100 positives (true positives), makes 150 type-1 errors (false positives), and makes the remaining 50 type-2 errors (false negatives). Hence, the accuracy rate = (9800/10000)*100 = 98%.
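This arithmetic can be reproduced directly from the confusion-matrix counts:

```python
# Confusion-matrix counts from the example above
tn = 9700  # true negatives  (correctly predicted negatives)
tp = 100   # true positives  (correctly predicted positives)
fp = 150   # type-1 errors   (false positives)
fn = 50    # type-2 errors   (false negatives)

total = tn + tp + fp + fn            # 10000
accuracy = (tn + tp) / total         # 0.98
print(f"accuracy = {accuracy:.1%}")  # accuracy = 98.0%
```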

Now, let us stop the model from making real predictions and say that our prediction is always y̅ = 0. In this case, the 150 false positives become correct negative predictions, raising the true negatives to 9850, while the 100 previously correct positive predictions become false negatives, raising those to 150. Hence, the accuracy rate = (9850/10000)*100 = 98.5%, which is higher than before, even though the model is not trained at all and always predicts 0. This is known as the Accuracy Paradox.
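The same counts make the paradox explicit: the untrained always-zero predictor scores higher than the real model:

```python
# Counts for the original model (from the example above)
tn, tp, fp, fn = 9700, 100, 150, 50
accuracy_model = (tn + tp) / 10000       # 0.98

# Always predict y = 0: false positives become true negatives,
# and the former true positives become false negatives.
tn_zero = tn + fp                        # 9850
fn_zero = fn + tp                        # 150
accuracy_always_zero = tn_zero / 10000   # 0.985

print(accuracy_model, accuracy_always_zero)  # 0.98 0.985
```

Accuracy alone is therefore a poor metric on imbalanced data, which motivates the CAP curve analysis below.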

[Figure: The Accuracy Paradox]

Therefore, we need better methods than the raw accuracy rate to analyse our model. One such method is CAP (Cumulative Accuracy Profile) curve analysis, which yields an accuracy ratio for the model. The accuracy ratio is the ratio of the area enclosed between the model's CAP and the random CAP (aR) to the area enclosed between the perfect CAP and the random CAP (aP), i.e., AR = aR / aP. The closer the accuracy ratio is to 1, the better the model. A good model's CAP curve lies between the perfect CAP and the random CAP.
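A sketch of the accuracy-ratio computation (toy labels and a hypothetical model ranking, made up for illustration; areas via the trapezoidal rule):

```python
# CAP curve: sort observations by predicted score (best first), then plot the
# cumulative fraction of positives found vs. the fraction of the sample examined.
def cap_area(labels):
    """Area under the CAP curve of a ranked 0/1 label sequence (trapezoidal rule)."""
    n, total_pos = len(labels), sum(labels)
    ys, cum = [0.0], 0
    for y in labels:
        cum += y
        ys.append(cum / total_pos)
    # integrate over the unit interval in n equal steps of width 1/n
    return sum((ys[i] + ys[i + 1]) / 2 * (1 / n) for i in range(n))

# Toy example: 10 observations, 4 positives, ranked by a hypothetical model
model_order   = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]  # most positives ranked early
perfect_order = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # perfect model: all positives first
random_area   = 0.5                             # random CAP is the diagonal

aR = cap_area(model_order) - random_area    # area between model CAP and random CAP
aP = cap_area(perfect_order) - random_area  # area between perfect CAP and random CAP
accuracy_ratio = aR / aP
print(round(accuracy_ratio, 3))  # 0.75 for this toy ranking
```

An accuracy ratio near 0 means the model is no better than random ordering; near 1 means it ranks almost all positives first, like the perfect model.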


[Figure: CAP curve analysis]

By considering the type of relationship between the dependent and independent variables (linear or non-linear), the pros and cons of each classification model for the problem at hand, and the model's accuracy as assessed by the methods above, we choose the classification model best suited to the problem to be solved.

