Long Short-Term Memory (LSTM) is a kind of recurrent neural network (RNN). In an RNN, the output from the previous step is fed as input to the current step. LSTM was designed by Hochreiter & Schmidhuber. It tackles the long-term dependency problem of RNNs: a plain RNN can give accurate predictions from recent information, but it struggles to use information that appeared many steps earlier. As the gap between the relevant information and the point where it is needed grows, the performance of the RNN degrades. LSTM is designed to retain information for long periods of time. It is used for processing, predicting and classifying on the basis of time series data.
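The effect of a growing gap can be illustrated with a minimal sketch of a scalar RNN. Everything here is an assumption made for illustration and is not taken from the article: the function names `rnn_step` and `influence_of_first_state`, the weights `w_x` and `w_h`, and the constant input sequence. The sketch feeds the previous output back in at every step and tracks how strongly the final output still depends on the initial state.

```python
import math

def rnn_step(x_t, h_prev, w_x=1.0, w_h=0.5, b=0.0):
    """One step of a scalar RNN: the previous output h_prev is fed back in."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def influence_of_first_state(inputs, w_h=0.5):
    """Return |d h_T / d h_0|, the sensitivity of the last output to the start state."""
    h = 0.0
    grad = 1.0
    for x_t in inputs:
        h = rnn_step(x_t, h, w_h=w_h)
        # Chain rule: d tanh(z)/dz = 1 - tanh(z)^2, and dz/dh_prev = w_h.
        grad *= w_h * (1.0 - h * h)
    return abs(grad)

# Hypothetical input sequences that differ only in length (the "gap").
short_gap = influence_of_first_state([0.1] * 5)
long_gap = influence_of_first_state([0.1] * 50)
print("gap of 5 steps: ", short_gap)
print("gap of 50 steps:", long_gap)
```

With these assumed weights, each step multiplies the sensitivity by a factor of at most 0.5, so the influence of early information shrinks quickly as the gap grows. This is one common way to picture why a plain RNN has trouble with long-term dependencies.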
Structure Of LSTM:
LSTM has a chain structure. Each repeating unit of the chain contains four neural network layers (three sigmoid gate layers and one tanh layer) and a memory block called a cell.
Information is retained by the cells and the memory manipulations are done by the gates. There are three gates –
- Forget Gate: Information that is no longer useful in the cell state is removed by the forget gate. Two inputs, x_t (the input at the current time step) and h_t-1 (the previous cell's output), are fed to the gate, multiplied by weight matrices, and a bias is added. The result is passed through a sigmoid activation function, which gives an output between 0 and 1 for each value in the cell state. An output close to 0 means that piece of information is forgotten, and an output close to 1 means it is retained for future use.
- Input gate: The input gate adds useful information to the cell state. First, the information is regulated using a sigmoid function, which filters the values to be remembered in the same way as the forget gate, using the inputs h_t-1 and x_t. Then, a vector of candidate values is created from h_t-1 and x_t using the tanh function, which gives outputs from -1 to +1. Finally, the candidate vector and the regulated values are multiplied element-wise to obtain the useful information, which is added to the cell state.
- Output gate: The output gate extracts useful information from the current cell state to be presented as the output. First, a vector is generated by applying the tanh function to the cell state. Then, the information is regulated using a sigmoid function, which filters the values to be output using the inputs h_t-1 and x_t. Finally, the vector and the regulated values are multiplied element-wise, and the result is sent both as the output and as an input to the next cell.
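The three gates described above can be sketched as a single LSTM step in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the names `lstm_step`, `init_params`, `affine`, the parameter keys `W_f`, `W_i`, `W_g`, `W_o` and their biases, the random initialisation, and the sizes used are all hypothetical. It follows the commonly used formulation in which h_t-1 and x_t are concatenated before each weight matrix is applied.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Compute W @ v + b for a matrix W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]

def init_params(input_size, hidden_size, seed=0):
    """Hypothetical random initialisation: one weight matrix and bias per layer."""
    rng = random.Random(seed)
    n = hidden_size + input_size
    params = {}
    for name in ("f", "i", "g", "o"):  # forget, input, candidate, output
        params["W_" + name] = [[rng.uniform(-0.5, 0.5) for _ in range(n)]
                               for _ in range(hidden_size)]
        params["b_" + name] = [0.0] * hidden_size
    return params

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; returns the new output h_t and the new cell state c_t."""
    concat = h_prev + x_t  # list concatenation of h_t-1 and x_t
    # Forget gate: values in (0, 1) deciding how much of c_prev to keep.
    f = [sigmoid(z) for z in affine(params["W_f"], concat, params["b_f"])]
    # Input gate: sigmoid filter times tanh candidate values in (-1, 1).
    i = [sigmoid(z) for z in affine(params["W_i"], concat, params["b_i"])]
    g = [math.tanh(z) for z in affine(params["W_g"], concat, params["b_g"])]
    # Cell state update: forget old information, then add the new information.
    c_t = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    # Output gate: sigmoid filter applied to tanh of the new cell state.
    o = [sigmoid(z) for z in affine(params["W_o"], concat, params["b_o"])]
    h_t = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c_t)]
    return h_t, c_t

# Run two time steps with 3 input features and 2 hidden units (assumed sizes).
params = init_params(input_size=3, hidden_size=2)
h_t, c_t = [0.0, 0.0], [0.0, 0.0]
for x_t in ([1.0, 0.5, -0.5], [0.2, -0.1, 0.4]):
    h_t, c_t = lstm_step(x_t, h_t, c_t, params)
print("h_t:", h_t)
print("c_t:", c_t)
```

Plain lists are used only to keep the sketch dependency-free. In practice an array library or a deep learning framework would handle these matrix operations and learn the weights during training.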
Some of the famous applications of LSTM include:
- Language Modelling
- Machine Translation
- Image Captioning
- Handwriting generation
- Question Answering Chatbots