
# Q-learning Mathematical Background

• Last Updated : 18 Jun, 2019

Prerequisites: Q-Learning.

In the following derivations, the symbols are used as defined in the prerequisite article.
The Q-learning technique is based on the Bellman Equation:

$$v(s) = E\left[\,R_{t+1} + \gamma\, v(S_{t+1}) \mid S_t = s\,\right]$$

where,

• $E$ : expectation
• $t+1$ : next time step (so $S_{t+1}$ is the next state)
• $\gamma$ : discount factor
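When the transition model is known, the expectation in the Bellman equation can be computed exactly as a weighted sum over outcomes. The sketch below illustrates a single Bellman expectation backup; the two-state MDP (`model`, `gamma`) is a hypothetical example, not from the article:

```python
# One Bellman expectation backup V(s) = E[R_{t+1} + gamma * V(S_{t+1})]
# under a fixed policy, on a hypothetical 2-state MDP.

gamma = 0.9  # discount factor

# model[s] -> list of (probability, next_state, reward) under the policy
model = {
    0: [(0.8, 1, 5.0), (0.2, 0, 0.0)],
    1: [(1.0, 0, 1.0)],
}

def bellman_backup(V, s):
    """Expected one-step return from state s:
    sum over outcomes of p * (r + gamma * V(s'))."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in model[s])

V = {0: 0.0, 1: 0.0}
print(bellman_backup(V, 0))  # 0.8*(5.0 + 0) + 0.2*(0.0 + 0) = 4.0
```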

Rephrasing the above equation in the form of the Q-value:

$$Q^{\pi}(s, a) = E\left[\,R_{t+1} + \gamma\, Q^{\pi}(S_{t+1}, A_{t+1})\,\right]$$

The optimal Q-value is given by:

$$Q^{*}(s, a) = E\left[\,R_{t+1} + \gamma\, \max_{a'} Q^{*}(S_{t+1}, a')\,\right]$$

Policy Iteration: It is the process of determining the optimal policy for the model, and it consists of the following two steps:

1. Policy Evaluation: This process estimates the value of the long-term reward function with the greedy policy obtained from the last Policy Improvement step.
2. Policy Improvement: This process updates the policy with the action that maximizes V for each state. These two steps are repeated until convergence is achieved.
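Assuming a known transition model (which policy iteration requires), the two steps above can be sketched in Python. The toy MDP `P`, the discount `gamma`, and the threshold `theta` are hypothetical illustrations, not values from the article:

```python
# Policy iteration sketch: alternate policy evaluation and greedy
# policy improvement until the policy stops changing.

gamma, theta = 0.9, 1e-8

# P[s][a] -> list of (probability, next_state, reward); toy 2-state MDP
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
states = list(P)
actions = {s: list(P[s]) for s in P}

def evaluate(policy, V):
    # Iterative policy evaluation: sweep until the change Delta < theta.
    while True:
        delta = 0.0
        for s in states:
            v = V[s]
            V[s] = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
            delta = max(delta, abs(v - V[s]))
        if delta < theta:
            return V

def improve(policy, V):
    # Greedy improvement: pick the action maximizing the one-step lookahead.
    stable = True
    for s in states:
        old = policy[s]
        policy[s] = max(actions[s], key=lambda a: sum(
            p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
        if policy[s] != old:
            stable = False
    return stable

policy = {s: actions[s][0] for s in states}
V = {s: 0.0 for s in states}
while True:
    V = evaluate(policy, V)
    if improve(policy, V):
        break
print(policy)  # prints {0: 1, 1: 1}: action 1 is greedy-optimal in both states
```

On this toy MDP, staying in state 1 yields reward 2 forever, so the improved policy chooses action 1 everywhere and $V(1) = 2/(1-\gamma) = 20$.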

Steps Involved:

• Initialization: $V(s)$ = any random real number, $\pi(s)$ = any $a \in A(s)$, arbitrarily chosen, for all $s \in S$

• Policy Evaluation:

  $\Delta = \infty$
  while( $\Delta \geq \theta$ )
  {
      $\Delta = 0$
      for each s in S
      {
          $v = V(s)$
          $V(s) = \sum_{s', r} p(s', r \mid s, \pi(s))\,[\,r + \gamma\, V(s')\,]$
          $\Delta = \max(\Delta,\ |v - V(s)|)$
      }
  }

• Policy Improvement:

  while(true)
  {
      policy-stable = true
      for each s in S
      {
          old-action = $\pi(s)$
          $\pi(s) = \arg\max_{a} \sum_{s', r} p(s', r \mid s, a)\,[\,r + \gamma\, V(s')\,]$
          if( old-action $\neq \pi(s)$ )
              policy-stable = false
      }
      if( policy-stable == true )
          break
  }
  return V, $\pi$

• Value Iteration: This process updates the function V according to the Optimal Bellman Equation:

$$V^{*}(s) = \max_{a} E\left[\,R_{t+1} + \gamma\, V^{*}(S_{t+1})\,\right]$$

Working Steps:

• Initialization: Initialize the array V with arbitrary random real numbers.
• Computing the optimal value:

  $\Delta = \infty$
  while( $\Delta \geq \theta$ )
  {
      $\Delta = 0$
      for each s in S
      {
          $v = V(s)$
          $V(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)\,[\,r + \gamma\, V(s')\,]$
          $\Delta = \max(\Delta,\ |v - V(s)|)$
      }
  }
  return V
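The value-iteration loop above can be sketched as follows. The toy MDP `P`, the discount `gamma`, and the threshold `theta` are assumed for illustration and are not from the article:

```python
# Value iteration: repeatedly apply the optimal Bellman backup
# V(s) <- max_a sum_{s',r} p(s',r|s,a) * (r + gamma * V(s'))
# until the largest per-sweep change Delta drops below theta.

gamma, theta = 0.9, 1e-8

# P[s][a] -> list of (probability, next_state, reward); toy 2-state MDP
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}

def value_iteration():
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v = V[s]
            V[s] = max(sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                       for outcomes in P[s].values())
            delta = max(delta, abs(v - V[s]))
        if delta < theta:
            return V

V = value_iteration()
print(V)  # approximately {0: 19.0, 1: 20.0}
```

Unlike policy iteration, no explicit policy is maintained during the sweeps; a greedy policy can be read off from the converged V at the end.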
