q_ret update not used #1

mvindiola1 · 2022-03-10T21:16:22Z

I have enjoyed your really clean implementation of MPO. Thank you for making it available. While looking at the critic update, I think I may have spotted a bug. You update q_ret on line 163 according to Retrace, but as far as I can see that value is never used: the next loop iteration overwrites it on line 161. I think you may want to use it recursively on line 161 in place of q_retraces[step + 1].

MPO/mpo.py

Lines 160 to 163 in c84bf23

    for step in reversed(range(nsteps)):
        q_ret = reward_batch[step] + self.γ * q_retraces[step + 1] * (1 - done_batch[step + 1])
        q_retraces[step] = q_ret
        q_ret = (rho_i[step] * (q_retraces[step] - q_i[step])) + val[step]
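
For illustration, here is a minimal standalone sketch of the recursive form suggested above, where the corrected q_ret from the end of one iteration feeds the start of the next. It is not the repository's code: the function name `retrace_targets`, the `bootstrap_value` argument, the use of plain Python lists, and indexing `dones` by `step` (rather than `step + 1` as in the snippet) are all assumptions made to keep the example self-contained.

```python
def retrace_targets(rewards, dones, q_i, val, rho_i, bootstrap_value, gamma):
    """Hypothetical helper: backward Retrace-style targets for one trajectory.

    rewards, dones, q_i, val, rho_i are sequences of length nsteps.
    dones[step] == 1 is assumed to mean the episode ended at `step`.
    bootstrap_value is the value estimate for the state after the last step.
    """
    nsteps = len(rewards)
    q_retraces = [0.0] * nsteps
    q_ret = bootstrap_value  # running value carried backwards through time

    for step in reversed(range(nsteps)):
        # Use the carried q_ret here instead of q_retraces[step + 1].
        q_ret = rewards[step] + gamma * q_ret * (1 - dones[step])
        q_retraces[step] = q_ret
        # Off-policy correction; this value is consumed by the next iteration.
        q_ret = rho_i[step] * (q_ret - q_i[step]) + val[step]

    return q_retraces


targets = retrace_targets(
    rewards=[1.0, 1.0],
    dones=[0, 0],
    q_i=[0.5, 0.5],
    val=[0.4, 0.4],
    rho_i=[1.0, 1.0],
    bootstrap_value=0.0,
    gamma=0.5,
)
```

With this form, the last assignment in the loop is no longer dead code: changing `rho_i`, `q_i`, or `val` at a later step changes the target at every earlier step.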
