**LSTM (Long Short-Term Memory)** is a type of RNN (recurrent neural network), a well-known deep learning architecture suited to prediction and classification tasks with a temporal component. In this article, we will derive the backpropagation through time algorithm and find the gradient of the error with respect to all the weights at a particular timestamp.

As the name suggests, backpropagation through time is similar to backpropagation in a DNN (deep neural network), but because of the time dependency in an RNN or LSTM, we have to apply the chain rule across timesteps as well.

Let the input to the LSTM cell at time t be x_{t}, the cell states at times t-1 and t be c_{t-1} and c_{t}, and the outputs at times t-1 and t be h_{t-1} and h_{t}. The initial values of c_{t} and h_{t} at t = 0 are zero.

**Step 1:** Initialization of the weights.

The weights for the different gates are:

- Input gate: w_{xi}, w_{hi}, b_{i}, w_{xg}, w_{hg}, b_{g}
- Forget gate: w_{xf}, w_{hf}, b_{f}
- Output gate: w_{xo}, w_{ho}, b_{o}
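Step 1 can be sketched in code as follows. This is a minimal illustration, not part of the original article: the function name `init_lstm_weights`, the shapes, and the small-random-normal initialization scale are assumptions (any sensible random initialization works here).

```python
import numpy as np

def init_lstm_weights(n_x, n_h, seed=0):
    """Initialize the twelve LSTM weight arrays listed above.

    For each gate (i, g, f, o) there is an input-to-gate matrix w_x*,
    a hidden-to-gate matrix w_h*, and a bias b_*.
    """
    rng = np.random.default_rng(seed)
    W = {}
    for gate in ("i", "g", "f", "o"):
        W["wx" + gate] = rng.normal(0.0, 0.01, (n_h, n_x))  # input-to-gate weights
        W["wh" + gate] = rng.normal(0.0, 0.01, (n_h, n_h))  # hidden-to-gate weights
        W["b" + gate] = np.zeros(n_h)                       # gate bias
    return W

weights = init_lstm_weights(n_x=3, n_h=4)
```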

**Step 2:** Passing through the different gates.

Inputs x_{t}, h_{t-1}, and c_{t-1} are given to the LSTM cell.

Passing through the input gate:

Z_{g} = w_{xg} * x + w_{hg} * h_{t-1} + b_{g}
g = tanh(Z_{g})
Z_{i} = w_{xi} * x + w_{hi} * h_{t-1} + b_{i}
i = sigmoid(Z_{i})
Input_gate_out = g * i

Passing through the forget gate:

Z_{f} = w_{xf} * x + w_{hf} * h_{t-1} + b_{f}
f = sigmoid(Z_{f})
Forget_gate_out = f

Passing through the output gate:

Z_{o} = w_{xo} * x + w_{ho} * h_{t-1} + b_{o}
o = sigmoid(Z_{o})
Out_gate_out = o

**Step 3:** Calculating the output h_{t} and the current cell state c_{t}.

Calculating the current cell state c_{t}:

c_{t} = (c_{t-1} * forget_gate_out) + input_gate_out

Calculating the output h_{t}:

h_{t} = out_gate_out * tanh(c_{t})
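Steps 2 and 3 together form one forward step of the cell. Below is a minimal sketch of that forward step; the function name `lstm_forward_step` and the dictionary of weights are assumptions, but the equations follow the gate formulas above, with variable names (g, i, f, o, c_t, h_t) mirroring the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward_step(x, h_prev, c_prev, W):
    """One LSTM forward step: Steps 2 and 3 of the article."""
    # Step 2: pre-activations and gate outputs.
    z_g = W["wxg"] @ x + W["whg"] @ h_prev + W["bg"]
    z_i = W["wxi"] @ x + W["whi"] @ h_prev + W["bi"]
    z_f = W["wxf"] @ x + W["whf"] @ h_prev + W["bf"]
    z_o = W["wxo"] @ x + W["who"] @ h_prev + W["bo"]
    g = np.tanh(z_g)   # candidate cell update
    i = sigmoid(z_i)   # input gate
    f = sigmoid(z_f)   # forget gate
    o = sigmoid(z_o)   # output gate
    # Step 3: new cell state and output.
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t
```

With all weights zero, every gate outputs 0.5 and g is 0, so the new cell state is half the previous one; this is an easy sanity check for an implementation.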

**Step 4:** Calculating the gradients through backpropagation through time at timestamp t using the chain rule.

Let the gradient passed down to this cell from above be:

E_delta = dE/dh_{t}

If we use MSE (mean squared error) as the error, then E_delta = (y - h(x)), where y is the original value and h(x) is the predicted value.

Gradient with respect to the output gate:

dE/do = (dE/dh_{t}) * (dh_{t}/do) = E_delta * tanh(c_{t})

Gradient with respect to c_{t}:

dE/dc_{t} = (dE/dh_{t}) * (dh_{t}/dc_{t}) = E_delta * o * (1 - tanh^{2}(c_{t}))

Gradients with respect to the input gate, dE/di and dE/dg:

dE/di = (dE/dc_{t}) * (dc_{t}/di) = E_delta * o * (1 - tanh^{2}(c_{t})) * g

Similarly,

dE/dg = (dE/dc_{t}) * (dc_{t}/dg) = E_delta * o * (1 - tanh^{2}(c_{t})) * i

Gradient with respect to the forget gate:

dE/df = (dE/dc_{t}) * (dc_{t}/df) = E_delta * o * (1 - tanh^{2}(c_{t})) * c_{t-1}

Gradient with respect to c_{t-1}:

dE/dc_{t-1} = (dE/dc_{t}) * (dc_{t}/dc_{t-1}) = E_delta * o * (1 - tanh^{2}(c_{t})) * f

Gradients with respect to the output gate weights:

dE/dw_{xo} = (dE/do) * (do/dw_{xo}) = E_delta * tanh(c_{t}) * sigmoid(z_{o}) * (1 - sigmoid(z_{o})) * x_{t}
dE/dw_{ho} = (dE/do) * (do/dw_{ho}) = E_delta * tanh(c_{t}) * sigmoid(z_{o}) * (1 - sigmoid(z_{o})) * h_{t-1}
dE/db_{o} = (dE/do) * (do/db_{o}) = E_delta * tanh(c_{t}) * sigmoid(z_{o}) * (1 - sigmoid(z_{o}))

Gradients with respect to the forget gate weights:

dE/dw_{xf} = (dE/df) * (df/dw_{xf}) = E_delta * o * (1 - tanh^{2}(c_{t})) * c_{t-1} * sigmoid(z_{f}) * (1 - sigmoid(z_{f})) * x_{t}
dE/dw_{hf} = (dE/df) * (df/dw_{hf}) = E_delta * o * (1 - tanh^{2}(c_{t})) * c_{t-1} * sigmoid(z_{f}) * (1 - sigmoid(z_{f})) * h_{t-1}
dE/db_{f} = (dE/df) * (df/db_{f}) = E_delta * o * (1 - tanh^{2}(c_{t})) * c_{t-1} * sigmoid(z_{f}) * (1 - sigmoid(z_{f}))

Gradients with respect to the input gate weights:

dE/dw_{xi} = (dE/di) * (di/dw_{xi}) = E_delta * o * (1 - tanh^{2}(c_{t})) * g * sigmoid(z_{i}) * (1 - sigmoid(z_{i})) * x_{t}
dE/dw_{hi} = (dE/di) * (di/dw_{hi}) = E_delta * o * (1 - tanh^{2}(c_{t})) * g * sigmoid(z_{i}) * (1 - sigmoid(z_{i})) * h_{t-1}
dE/db_{i} = (dE/di) * (di/db_{i}) = E_delta * o * (1 - tanh^{2}(c_{t})) * g * sigmoid(z_{i}) * (1 - sigmoid(z_{i}))
dE/dw_{xg} = (dE/dg) * (dg/dw_{xg}) = E_delta * o * (1 - tanh^{2}(c_{t})) * i * (1 - tanh^{2}(z_{g})) * x_{t}
dE/dw_{hg} = (dE/dg) * (dg/dw_{hg}) = E_delta * o * (1 - tanh^{2}(c_{t})) * i * (1 - tanh^{2}(z_{g})) * h_{t-1}
dE/db_{g} = (dE/dg) * (dg/db_{g}) = E_delta * o * (1 - tanh^{2}(c_{t})) * i * (1 - tanh^{2}(z_{g}))
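Step 4 can be sketched directly from these chain-rule expressions. This is an illustrative implementation, not the article's own code: the function name `lstm_backward_step` and the `cache` tuple of saved forward-pass values are assumptions. Note that sigmoid(z) * (1 - sigmoid(z)) equals, for example, o * (1 - o), so the cached gate activations are all we need.

```python
import numpy as np

def lstm_backward_step(E_delta, x, h_prev, cache):
    """Gradients for one timestep; E_delta is dE/dh_t from above.

    cache holds the forward-pass values (c_prev, c_t, g, i, f, o).
    """
    c_prev, c_t, g, i, f, o = cache
    do = E_delta * np.tanh(c_t)                    # dE/do
    dc_t = E_delta * o * (1 - np.tanh(c_t) ** 2)   # dE/dc_t
    di = dc_t * g                                  # dE/di
    dg = dc_t * i                                  # dE/dg
    df = dc_t * c_prev                             # dE/df
    dc_prev = dc_t * f                             # dE/dc_{t-1}, passed to step t-1
    # Push through the gate nonlinearities to the pre-activations:
    # sigmoid'(z) = s(z)(1 - s(z)), tanh'(z) = 1 - tanh^2(z).
    dz_o = do * o * (1 - o)
    dz_i = di * i * (1 - i)
    dz_f = df * f * (1 - f)
    dz_g = dg * (1 - g ** 2)
    # Weight gradients: outer products with the corresponding inputs.
    grads = {
        "wxo": np.outer(dz_o, x), "who": np.outer(dz_o, h_prev), "bo": dz_o,
        "wxf": np.outer(dz_f, x), "whf": np.outer(dz_f, h_prev), "bf": dz_f,
        "wxi": np.outer(dz_i, x), "whi": np.outer(dz_i, h_prev), "bi": dz_i,
        "wxg": np.outer(dz_g, x), "whg": np.outer(dz_g, h_prev), "bg": dz_g,
    }
    return grads, dc_prev
```

Returning dc_prev alongside the weight gradients is what makes this "through time": it is the term fed into the same computation at timestep t-1.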

These are the gradients associated with all the weights at timestamp t. Using them, we can update the weights of the input gate, output gate, and forget gate.


