To train a Linear Regression model, we have to learn its parameters: the feature weights and the bias term. One way to do this is Gradient Descent, an iterative optimization algorithm that repeatedly adjusts the model parameters in the direction that reduces the cost function over the training data. When the cost function is convex, as the mean squared error of Linear Regression is, Gradient Descent is guaranteed to approach the global minimum (the optimal solution), provided it is given enough time and the learning rate is not too high. Two important variants of Gradient Descent, widely used in Linear Regression as well as in Neural Networks, are Batch Gradient Descent and Stochastic Gradient Descent (SGD).
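For reference, the quantities mentioned above can be written out for Linear Regression. The notation is an assumption for illustration and does not come from the original article: m is the number of training samples, X is the feature matrix with a leading column of ones for the bias, y is the target vector, θ is the parameter vector and η is the learning rate.

```latex
% Mean squared error cost over the training set
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( \theta^{\top} x^{(i)} - y^{(i)} \right)^{2}

% Gradient of the cost with respect to the parameters
\nabla_{\theta} J(\theta) = \frac{2}{m} X^{\top} \left( X\theta - y \right)

% One Gradient Descent step
\theta \leftarrow \theta - \eta \, \nabla_{\theta} J(\theta)
```

The two variants discussed next differ only in how many training samples are used to estimate the gradient at each step.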
Batch Gradient Descent: Batch Gradient Descent computes the gradient over the full training set at every step. As a result, each step is slow and computationally expensive when the training set is very large. On the other hand, it works well on convex or relatively smooth error surfaces, where it moves steadily toward the minimum. Batch GD also scales well with the number of features.
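The following is a minimal sketch of Batch Gradient Descent for Linear Regression with NumPy, not a reference implementation. The function name `batch_gradient_descent`, its default hyperparameters and the synthetic data are assumptions chosen for illustration.

```python
import numpy as np


def batch_gradient_descent(X, y, learning_rate=0.1, n_iterations=1000):
    """Fit linear regression parameters with full-batch gradient descent (MSE cost)."""
    m = X.shape[0]
    X_b = np.c_[np.ones((m, 1)), X]  # prepend a column of ones for the bias term
    theta = np.zeros(X_b.shape[1])   # start from all-zero parameters
    for _ in range(n_iterations):
        # The gradient is computed over the WHOLE training set at every step
        gradients = (2.0 / m) * X_b.T @ (X_b @ theta - y)
        theta -= learning_rate * gradients
    return theta


# Synthetic data (assumed for illustration): y = 4 + 3x + small Gaussian noise
rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + 0.1 * rng.standard_normal(100)

theta = batch_gradient_descent(X, y)
print("bias, weight:", theta)
```

Because every step touches all m samples, the cost of one update grows linearly with the size of the training set, which is the drawback described above.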
Stochastic Gradient Descent: SGD addresses the main problem of Batch Gradient Descent, which is that the whole training set is used to calculate the gradient at each step. SGD is stochastic in nature, i.e. it picks a random instance of the training data at each step and computes the gradient from that instance alone. This makes each step much faster than in Batch GD, since there is far less data to process at a time.
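A minimal sketch of SGD for the same problem is shown here, assuming a constant learning rate. The function name `stochastic_gradient_descent`, the hyperparameter values and the synthetic data are illustrative assumptions. Instances are visited in a freshly shuffled order in every epoch, which is one common way to realise the random selection described above.

```python
import numpy as np


def stochastic_gradient_descent(X, y, learning_rate=0.01, n_epochs=50, seed=0):
    """Fit linear regression parameters using one training instance per update."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    X_b = np.c_[np.ones((m, 1)), X]  # prepend a column of ones for the bias term
    theta = np.zeros(X_b.shape[1])
    for _ in range(n_epochs):
        # Visit the instances in a new random order each epoch
        for i in rng.permutation(m):
            xi = X_b[i]
            # Gradient of the squared error of a SINGLE instance
            gradient = 2.0 * xi * (xi @ theta - y[i])
            theta -= learning_rate * gradient
    return theta


# Synthetic data (assumed for illustration): y = 4 + 3x + small Gaussian noise
data_rng = np.random.default_rng(0)
X = 2 * data_rng.random((100, 1))
y = 4 + 3 * X[:, 0] + 0.1 * data_rng.standard_normal(100)

theta_sgd = stochastic_gradient_descent(X, y)
print("bias, weight:", theta_sgd)
```

Each update here costs the same regardless of how large the training set is, at the price of a noisier path toward the minimum.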
The stochastic nature of SGD has a downside: once it gets close to the minimum, it does not settle down but keeps bouncing around it. The resulting model parameters are good, but not optimal. This can be addressed by gradually reducing the learning rate at each step, which dampens the bouncing so that SGD may eventually settle at the global minimum.
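One way to reduce the learning rate over time is a simple decay schedule, sketched below. The schedule `t0 / (t + t1)` and the values of the hypothetical hyperparameters `t0` and `t1` are assumptions chosen for illustration; other decay schemes are equally valid.

```python
import numpy as np


def learning_schedule(t, t0=5.0, t1=50.0):
    """Learning rate that shrinks as the step counter t grows (illustrative choice)."""
    return t0 / (t + t1)


def sgd_with_decay(X, y, n_epochs=50, seed=0):
    """SGD for linear regression with a gradually decreasing learning rate."""
    rng = np.random.default_rng(seed)
    m = X.shape[0]
    X_b = np.c_[np.ones((m, 1)), X]  # prepend a column of ones for the bias term
    theta = np.zeros(X_b.shape[1])
    for epoch in range(n_epochs):
        for step, i in enumerate(rng.permutation(m)):
            eta = learning_schedule(epoch * m + step)  # smaller steps later on
            xi = X_b[i]
            gradient = 2.0 * xi * (xi @ theta - y[i])
            theta -= eta * gradient
    return theta


# Synthetic data (assumed for illustration): y = 4 + 3x + small Gaussian noise
data_rng = np.random.default_rng(0)
X = 2 * data_rng.random((100, 1))
y = 4 + 3 * X[:, 0] + 0.1 * data_rng.standard_normal(100)

theta_decay = sgd_with_decay(X, y)
print("bias, weight:", theta_decay)
```

If the learning rate is reduced too quickly, the algorithm may stall before reaching the minimum; if it is reduced too slowly, the bouncing persists for longer, so the schedule itself usually needs some tuning.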
Difference between Batch Gradient Descent and Stochastic Gradient Descent
| S.No. | Batch Gradient Descent | Stochastic Gradient Descent |
|---|---|---|
| 1. | Computes the gradient using the whole training set. | Computes the gradient using a single training sample. |
| 2. | Each step is slow and computationally expensive. | Each step is faster and less computationally expensive than in Batch GD. |
| 3. | Not suggested for huge training sets. | Can be used for large training sets. |
| 4. | Deterministic in nature. | Stochastic in nature. |
| 5. | Gives the optimal solution on a convex cost function, given sufficient time to converge. | Gives a good solution, but not the optimal one unless the learning rate is gradually reduced. |
| 6. | No random shuffling of points is required. | Samples should be visited in random order, which is why the training set is shuffled at every epoch. |
| 7. | Cannot escape shallow local minima easily. | Can escape shallow local minima more easily. |
| 8. | Convergence is slow. | Gets close to the minimum much faster. |