Differentiate between Support Vector Machine and Logistic Regression

Logistic Regression:
It is a classification model used to predict the odds in favour of a particular event. The odds ratio represents the positive event we want to predict, for example, how likely a sample is to be malignant, or how likely an individual is to develop diabetes in the future. It uses the sigmoid function to map any real-valued input to a value between 0 and 1. The basic idea of logistic regression is to adapt linear regression so that it estimates the probability that a new entry falls in a class. The linear decision boundary is simply a consequence of the structure of the regression function and the use of a threshold in that function to classify. Because logistic regression maximizes the conditional likelihood of the training data, it is highly prone to outliers. Standardization (along with collinearity checks) is also fundamental, to make sure one feature's weight does not dominate the others.
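The sigmoid mapping and thresholding described above can be sketched in a few lines of NumPy (the weights and input here are hypothetical, not taken from a fitted model):

```python
import numpy as np

def sigmoid(z):
    # squashes any real-valued score into the (0, 1) interval
    return 1.0 / (1.0 + np.exp(-z))

# linear score w.x + b from a hypothetical fitted model
w, b = np.array([0.8, -0.4]), 0.1
x = np.array([2.0, 1.0])

p = sigmoid(w @ x + b)   # estimated probability of the positive class
label = int(p >= 0.5)    # thresholding at 0.5 gives the linear decision boundary
print(round(p, 3), label)  # → 0.786 1
```

Because the threshold is applied to a linear score, the resulting decision boundary is a straight line (hyperplane) even though the probability itself is non-linear in the inputs.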

Support Vector Machine (SVM):
It is a very powerful classification algorithm that maximizes the margin between classes. The margin is the distance between the decision boundary (separating hyperplane) and the closest training points on either side; those closest points are the support vectors. The reason to prefer decision boundaries with large margins is to separate the positive and negative hyperplanes with an adjustable bias-variance trade-off: negative samples should fall on the side of the negative hyperplane and positive samples on the side of the positive hyperplane. SVM is not as prone to outliers as logistic regression because it only cares about the points closest to the decision boundary, and the boundary changes only when new positive or negative points land near it. The decision boundary is much more important for linear SVMs – the whole goal is to place a linear boundary in a smart way. There isn't a probabilistic interpretation of individual classifications, at least not in the original formulation.
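A minimal illustration, assuming scikit-learn is available (the toy data and the large-C "hard margin" setting are chosen purely for illustration): the fitted weight vector gives the margin width as 2/||w||, and only the boundary points become support vectors.

```python
import numpy as np
from sklearn.svm import SVC

# tiny, linearly separable toy set (illustrative values)
X = np.array([[1, 1], [2, 2], [4, 4], [5, 5]], dtype=float)
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear", C=1e6)  # very large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]
margin = 2 / np.linalg.norm(w)     # distance between the two margin hyperplanes

print(clf.support_vectors_)        # only the points nearest the boundary
print(round(margin, 2))            # → 2.83
```

Note that the interior points [1, 1] and [5, 5] do not appear among the support vectors: moving them (short of crossing the margin) would not change the boundary at all, which is exactly why SVMs are robust to outliers far from the boundary.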

Hence, key points are:

• SVM tries to maximize the margin around the closest support vectors, whereas logistic regression maximizes the posterior class probability
• SVM is deterministic (although Platt scaling can be used to obtain probability scores), while LR is probabilistic.
• In kernel space, SVM is faster
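The second point can be demonstrated with scikit-learn, where passing probability=True fits Platt scaling on top of the SVM's decision scores (the dataset and parameters below are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# synthetic binary classification data, purely for illustration
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# probability=True trains an internal Platt-scaling calibrator
svm = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)

print(svm.predict(X[:1]))        # deterministic class label
print(svm.predict_proba(X[:1]))  # Platt-scaled probability estimates
```

Without probability=True, only predict and decision_function are available; the probabilities come from an extra calibration step, not from the SVM objective itself.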

Best uses of Logistic Regression vs. Support Vector Machine:

You can use either logistic regression or a support vector machine, depending on how many training examples and attributes you have.

Let’s consider this example:
n = number of attributes
m = number of training examples

• When n is high (up to 10,000) and m is modest (10–1,000), apply logistic regression or an SVM with a linear kernel.
• When n is modest (1–1,000) and m is intermediate (10–10,000), apply an SVM with a non-linear (Gaussian, polynomial, etc.) kernel.
• When n is modest (1–1,000) and m is high (50,000–1,000,000+), manually add additional attributes, then apply logistic regression or an SVM with a linear kernel.
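These guidelines can be captured as a small helper function (the thresholds are the illustrative ranges above, not hard rules):

```python
def choose_model(n_features: int, n_samples: int) -> str:
    """Heuristic model choice based on the n/m guidelines above (illustrative)."""
    if n_features >= 1_000 and n_samples <= 1_000:
        # many attributes, few examples: a complex kernel would overfit
        return "logistic regression or linear-kernel SVM"
    if n_samples <= 10_000:
        # modest data: a non-linear kernel is affordable and expressive
        return "SVM with a Gaussian/polynomial kernel"
    # very large data: kernel SVMs scale poorly; add attributes, stay linear
    return "add attributes, then logistic regression or linear-kernel SVM"

print(choose_model(5_000, 500))     # → logistic regression or linear-kernel SVM
print(choose_model(100, 5_000))     # → SVM with a Gaussian/polynomial kernel
print(choose_model(100, 200_000))   # → add attributes, then logistic regression or linear-kernel SVM
```

The underlying intuition: kernel SVMs pay a training cost that grows roughly quadratically in m, so they suit mid-sized datasets, while linear models remain tractable at any scale.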
