Prerequisite: Optimization techniques in Gradient Descent
Gradient Descent is applicable when the cost function is differentiable with respect to the parameters of the network. It is easier to minimize a continuous function than a discrete one. In batch gradient descent, the weights are updated once per epoch, where one epoch is a full pass over the training dataset. This gives satisfactory results on small datasets, but each update becomes more expensive as the training set grows, and convergence slows down. It may also fail to reach the global minimum when the cost function has multiple local minima.
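To make the "one update per epoch" behaviour concrete, here is a minimal sketch of batch gradient descent on a toy linear fit. The function name `batch_gradient_descent`, the toy data, and the learning rate are assumptions chosen for illustration only.

```python
def batch_gradient_descent(xs, ys, lr=0.1, epochs=500):
    """Fit y ~ w*x + b by minimizing mean squared error, one update per epoch."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Accumulate the gradient over the ENTIRE dataset before updating
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Exactly one weight update per epoch
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical toy data generated from y = 3x + 1 (no noise)
xs = [i / 10 for i in range(-10, 11)]
ys = [3.0 * x + 1.0 for x in xs]
w_gd, b_gd = batch_gradient_descent(xs, ys)
```

Every update touches all samples, so the cost of a single step grows linearly with the dataset size.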
Stochastic gradient descent overcomes this drawback by randomly selecting data samples and updating the parameters based on the cost function evaluated on those samples. It typically converges faster than regular gradient descent and saves memory, since each update needs only the selected samples rather than values accumulated over the entire dataset.
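The same toy problem can be sketched with stochastic updates, where the parameters change after every randomly chosen sample. The function name `stochastic_gradient_descent`, the seed, and the learning rate are illustrative assumptions.

```python
import random

def stochastic_gradient_descent(xs, ys, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w*x + b, updating the parameters after every single sample."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the samples in a random order
        for i in indices:
            err = w * xs[i] + b - ys[i]
            # Update immediately from this one sample's gradient
            w -= lr * 2.0 * err * xs[i]
            b -= lr * 2.0 * err
    return w, b

# Hypothetical toy data generated from y = 3x + 1 (no noise)
xs = [i / 10 for i in range(-10, 11)]
ys = [3.0 * x + 1.0 for x in xs]
w_sgd, b_sgd = stochastic_gradient_descent(xs, ys)
```

Each step here reads a single sample, so nothing has to be summed over the whole dataset before the weights move.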
Adaptive Moment Estimation (ADAM) computes an individual adaptive learning rate for each parameter from estimates of the first and second moments of the gradient.
ADAM is computationally efficient, has modest memory requirements, and performs well on large datasets. It requires p0, q0 and t to be initialized to 0, where p0 is the 1st moment vector (the mean), q0 is the 2nd moment vector (the uncentered variance), and t is the timestep.
With ƒ(w) as the stochastic objective function with parameters w, the proposed default values for the step size α, the decay rates m1 and m2 of the 1st and 2nd moment estimates, and the small stabilizing constant ϵ are as follows:
α = 0.001, m1 = 0.9, m2 = 0.999, ϵ = 10⁻⁸.
Another major advantage discussed in the study of ADAM is that the magnitude of the parameter updates is invariant to rescaling of the gradient, and the algorithm does not require a stationary objective, so it can be used even when the objective function changes with time. Note that ADAM does not compute second-order derivatives: the second moment is a moving average of squared first-order gradients. Its extra cost is the storage and update of the two moment vectors p and q, each with one entry per parameter.
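The rescaling invariance can be seen from a short derivation. The notation is an assumption that follows this article's symbols: g_t is the gradient at timestep t, and the hatted quantities are the bias-corrected moment estimates.

```latex
% Moment estimates for gradient g_t, with p_0 = q_0 = 0
p_t = m_1 p_{t-1} + (1 - m_1)\, g_t, \qquad
q_t = m_2 q_{t-1} + (1 - m_2)\, g_t^2

% Bias correction and parameter update
\hat{p}_t = \frac{p_t}{1 - m_1^t}, \qquad
\hat{q}_t = \frac{q_t}{1 - m_2^t}, \qquad
w_t = w_{t-1} - \alpha \, \frac{\hat{p}_t}{\sqrt{\hat{q}_t} + \epsilon}

% Rescaling every gradient by a constant c > 0
g_t \to c\, g_t
\;\Rightarrow\;
\hat{p}_t \to c\, \hat{p}_t, \quad \hat{q}_t \to c^2\, \hat{q}_t
\;\Rightarrow\;
\frac{c\, \hat{p}_t}{\sqrt{c^2 \hat{q}_t} + \epsilon}
= \frac{\hat{p}_t}{\sqrt{\hat{q}_t} + \epsilon / c}
\approx \frac{\hat{p}_t}{\sqrt{\hat{q}_t}}
```

The factor c cancels between numerator and denominator, so the step size is unchanged whenever ϵ is negligible compared with the square root of the second moment.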
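In brief, the ADAM algorithm repeats the following steps until w converges: increment t, compute the gradient of ƒ at the current w, update the moment estimates p and q, correct both for their bias toward zero, and update w.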
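The ADAM steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the names `adam_minimize` and `grad_f`, the quadratic objective, and the larger step size used in the example call are all hypothetical, while the parameter names p, q, m1, m2 follow this article.

```python
import math

def adam_minimize(grad_fn, w, alpha=0.001, m1=0.9, m2=0.999, eps=1e-8, steps=1000):
    """Minimize an objective given grad_fn(w), which returns its partial derivatives."""
    w = list(w)
    p = [0.0] * len(w)  # p0: 1st moment vector (mean), initialized to 0
    q = [0.0] * len(w)  # q0: 2nd moment vector (uncentered variance), initialized to 0
    # t is initialized to 0 and incremented before each update, so it runs 1..steps
    for t in range(1, steps + 1):
        g = grad_fn(w)
        for i in range(len(w)):
            # Exponential moving averages of the gradient and the squared gradient
            p[i] = m1 * p[i] + (1 - m1) * g[i]
            q[i] = m2 * q[i] + (1 - m2) * g[i] ** 2
            # Bias correction compensates for the zero initialization
            p_hat = p[i] / (1 - m1 ** t)
            q_hat = q[i] / (1 - m2 ** t)
            # Per-parameter step scaled by the second moment estimate
            w[i] -= alpha * p_hat / (math.sqrt(q_hat) + eps)
    return w

# Hypothetical objective: f(w) = (w0 - 3)^2 + (w1 + 1)^2
def grad_f(w):
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]

# A larger step size than the default is assumed here to keep the demo short
w_adam = adam_minimize(grad_f, [0.0, 0.0], alpha=0.05, steps=2000)
```

Only first-order gradients are used, and the optimizer state consists of the two lists p and q, which is the source of ADAM's memory overhead relative to plain stochastic gradient descent.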