Q-learning Mathematical Background
  • Last Updated : 18 Jun, 2019

Prerequisites: Q-Learning.

The derivations below use the symbols defined in the prerequisite article.
The Q-learning technique is based on the Bellman Equation.

v(s) = E(R_{t+1}+\lambda v(S_{t+1})|S_{t}=s)
where,
E : Expectation
S_{t+1} : the state at the next time step
\lambda : discount factor

Rewriting the above equation in terms of the Q-value:

Q^{\pi}(s,a) = E(r_{t+1}+\lambda r_{t+2}+\lambda ^{2}r_{t+3}+...|S_{t}=s,A_{t}=a)



= E_{s'}(r_{t+1}+\lambda Q^{\pi}(s',a')|S_{t}=s,A_{t}=a)

The optimal Q-value is given by

Q^{*}(s,a) = E_{s'}(r_{t+1}+\lambda max_{a'}Q^{*}(s',a')|S_{t}=s,A_{t}=a)
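The optimal Q-value equation above is what the Q-learning update rule approximates from samples. As a rough illustrative sketch (the two-state MDP, the learning rate alpha, and the uniform exploration below are hypothetical choices, not from the article):

```python
import random

random.seed(0)

# Hypothetical deterministic MDP, used only for illustration:
# transitions[(state, action)] = (next_state, reward)
transitions = {
    (0, 0): (0, 0.0), (0, 1): (1, 1.0),
    (1, 0): (0, 0.0), (1, 1): (1, 2.0),
}

gamma = 0.9   # discount factor (lambda in the article's notation)
alpha = 0.5   # learning rate
Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}

s = 0
for _ in range(5000):
    a = random.choice((0, 1))                 # explore uniformly at random
    s_next, r = transitions[(s, a)]
    # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
    target = r + gamma * max(Q[(s_next, 0)], Q[(s_next, 1)])
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    s = s_next
```

For this MDP the fixed point of the optimal equation gives Q(1,1) = 2/(1-0.9) = 20, and the learned table converges toward it.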

Policy Iteration: This is the process of determining the optimal policy for the model, and it alternates the following two steps:

  1. Policy Evaluation: This process estimates the value of the long-term reward function with the greedy policy obtained from the last Policy Improvement step.
  2. Policy Improvement: This process updates the policy with the action that maximizes V for each state. This process is repeated until convergence is achieved.

Steps Involved:

  • Initialization:

    V(s) = an arbitrary real number
    \pi(s) = any action in A(s), chosen arbitrarily

  • Policy Evaluation:
    repeat
    {
        \Delta = 0
        for each s in S
        {
            v = V(s)
            V(s) = \sum_{s',r}(p(s',r|s,\pi (s))(r+\lambda V(s')))
            \Delta = max(\Delta ,|v-V(s)|)
        }
    } until \Delta < \theta
    
    where \theta is a small positive convergence threshold (\theta \rightarrow 0^{+})
    
    
  • Policy Improvement:
    isPolicyStable = true
    for each s in S
    {
        a = \pi (s)
        \pi (s) = argmax_{a}\sum _{s',r}(p(s',r|s,a)(r+\lambda V(s')))
        if(a\neq \pi (s))
            isPolicyStable = false
    }
    if(isPolicyStable == true)
        return V,\pi
    else
        go back to Policy Evaluation
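The evaluation and improvement steps above can be sketched in Python. The two-state deterministic MDP below is a hypothetical example chosen for illustration; `P`, `gamma`, and `theta` are assumptions, not from the article:

```python
# Hypothetical deterministic MDP: P[s][a] = (next_state, reward)
P = {
    0: {0: (0, 0.0), 1: (1, 1.0)},
    1: {0: (0, 0.0), 1: (1, 2.0)},
}
gamma = 0.9     # discount factor (lambda in the article)
theta = 1e-8    # small positive convergence threshold

V = {s: 0.0 for s in P}
pi = {s: 0 for s in P}          # arbitrary initial policy

while True:
    # Policy Evaluation: sweep the Bellman expectation backup until stable
    while True:
        delta = 0.0
        for s in P:
            v = V[s]
            s_next, r = P[s][pi[s]]
            V[s] = r + gamma * V[s_next]
            delta = max(delta, abs(v - V[s]))
        if delta < theta:
            break
    # Policy Improvement: act greedily with respect to V
    stable = True
    for s in P:
        old_a = pi[s]
        pi[s] = max(P[s], key=lambda a: P[s][a][1] + gamma * V[P[s][a][0]])
        if old_a != pi[s]:
            stable = False
    if stable:
        break
```

On this MDP the loop terminates with the greedy policy that always takes action 1, with V(1) = 2/(1-0.9) = 20 and V(0) = 1 + 0.9 * 20 = 19.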
    
Value Iteration: This process updates the function V according to the Optimal Bellman Equation.

    v_{*}(s) = max_{a}E(R_{t+1}+\lambda v_{*}(S_{t+1})|S_{t}=s,A_{t}=a)

Working Steps:

  • Initialization: Initialize the array V with arbitrary real numbers.
  • Computing the optimal value:
    repeat
    {
        for each s in S
        {
            v = V(s)
            V(s) = max_{a}\sum_{s',r}(p(s',r|s,a)(r+\lambda V(s')))
            \Delta = max(\Delta ,|v-V(s)|)
        }
    } until \Delta < \theta
    
    \pi (s) = argmax_{a}\sum _{s',r}(p(s',r|s,a)(r+\lambda V(s')))
    return \pi
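These working steps can be sketched in Python as well. The deterministic MDP below is again a hypothetical illustration; note that, unlike policy evaluation, the backup maximizes over actions instead of following a fixed policy:

```python
# Hypothetical deterministic MDP: P[s][a] = (next_state, reward)
P = {
    0: {0: (0, 0.0), 1: (1, 1.0)},
    1: {0: (0, 0.0), 1: (1, 2.0)},
}
gamma = 0.9     # discount factor (lambda in the article)
theta = 1e-8    # small positive convergence threshold

V = {s: 0.0 for s in P}
while True:
    delta = 0.0
    for s in P:
        v = V[s]
        # Optimal Bellman backup: maximize over actions
        V[s] = max(r + gamma * V[s2] for a, (s2, r) in P[s].items())
        delta = max(delta, abs(v - V[s]))
    if delta < theta:
        break

# Extract the greedy policy from the converged values
pi = {s: max(P[s], key=lambda a: P[s][a][1] + gamma * V[P[s][a][0]])
      for s in P}
```

Value iteration reaches the same fixed point as policy iteration here, V(1) = 20 and V(0) = 19, but folds the max over actions directly into the value update.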
    
