Add Retrace and a QNetwork abstraction #615

HenriDeh · 2022-04-05T09:48:55Z

Following our discussion yesterday in #613, I'm creating this draft PR to show how I went about implementing the Retrace algorithm. More precisely, I wanted to implement Retrace as a plug-in that an algorithm can optionally use (I mainly have #604 in mind), but that level of abstraction has not been implemented yet. So here's my attempt. I suspect this may not be what you have in mind, @findmyway, but it can be a useful discussion even if this is never merged.

The main idea is that Retrace is just a different way than TD(n) of computing update targets for a QNetwork. So I created a QNetwork abstraction that can be updated given a batch of states, actions and targets: update!(qnetwork::AbstractQNetwork, states, actions, targets).
The main goal here was to introduce the function q_targets, which is called by update!.
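To make the dispatch idea concrete, here is a minimal Python sketch (not the PR's Julia code). It mimics Julia's multiple dispatch with `functools.singledispatch`: `q_targets` picks its rule from the trajectory's type, and the default rule for a plain trajectory is n-step TD with a max-Q bootstrap. `Trajectory`, `TabularQNetwork`, `update_from_trajectory` and the array layout are hypothetical illustrations, and the Q-table stands in for a neural network.

```python
from functools import singledispatch
import numpy as np


class Trajectory:
    """Hypothetical minimal trajectory: one segment of T transitions."""

    def __init__(self, states, actions, rewards, terminals):
        self.states = np.asarray(states)                    # T + 1 states; the last one is only used to bootstrap
        self.actions = np.asarray(actions)                  # T actions
        self.rewards = np.asarray(rewards, dtype=float)     # T rewards
        self.terminals = np.asarray(terminals, dtype=bool)  # T flags: True if s_{t+1} is terminal


class TabularQNetwork:
    """Hypothetical stand-in for an AbstractQNetwork (a Q-table instead of a neural net)."""

    def __init__(self, n_states, n_actions, lr=0.5):
        self.q = np.zeros((n_states, n_actions))
        self.lr = lr

    def __call__(self, states):
        return self.q[states]  # Q(s, .) for every state in the batch

    def update(self, states, actions, targets):
        # Counterpart of update!(qnetwork, states, actions, targets): move Q(s, a) toward each target.
        for s, a, y in zip(states, actions, targets):
            self.q[s, a] += self.lr * (y - self.q[s, a])


@singledispatch
def q_targets(traj, qnet, gamma=0.99, n=1):
    """Default rule for a plain Trajectory: n-step TD targets bootstrapped with max_a Q."""
    T = len(traj.rewards)
    bootstrap = qnet(traj.states).max(axis=1)  # max_a Q(s_t, a) for t = 0..T
    targets = np.empty(T)
    for t in range(T):
        g, discount = 0.0, 1.0
        for k in range(t, min(t + n, T)):
            g += discount * traj.rewards[k]
            discount *= gamma
            if traj.terminals[k]:
                break  # episode ended: no bootstrap term
        else:
            g += discount * bootstrap[min(t + n, T)]
        targets[t] = g
    return targets


def update_from_trajectory(qnet, traj, **kwargs):
    # The algorithm only calls this; which target rule runs depends on type(traj).
    targets = q_targets(traj, qnet, **kwargs)
    qnet.update(traj.states[:-1], traj.actions, targets)
    return targets
```

A new trajectory type, for example a subclass of `Trajectory`, would register its own rule with `@q_targets.register(ThatSubclass)`, and `update_from_trajectory` would pick it up without any change to the algorithm.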

Now, why did I do this? Because it lets me implement Retrace in a way that is reusable by any algorithm that uses the QNetwork abstraction. RetraceTrajectory is an extension of Trajectory; using a new type makes it possible to overload q_targets. So an algorithm that uses a RetraceTrajectory will automatically use Retrace targets to update its AbstractQNetwork.
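The rule that a RetraceTrajectory's q_targets would compute can be sketched as below (Python, not the PR's Julia code). It follows the Retrace target of Munos et al. (2016) with truncated traces c_t = λ min(1, π(a_t|x_t) / μ(a_t|x_t)), written in its backward recursive form G_t = r_t + γ V(x_{t+1}) + γ c_{t+1} (G_{t+1} - Q(x_{t+1}, a_{t+1})), where V(x) = Σ_a π(a|x) Q(x, a). The function name `retrace_targets`, its argument layout, and the assumption that the behaviour probabilities μ(a_t|x_t) are stored in the trajectory are hypothetical.

```python
import numpy as np


def retrace_targets(q, pi, mu_taken, actions, rewards, terminals, gamma=0.99, lam=1.0):
    """Hypothetical Retrace target computation over one segment of T transitions.

    q:         (T+1, A) Q(x_t, .) for t = 0..T (e.g. from a target network)
    pi:        (T+1, A) target-policy probabilities pi(.|x_t)
    mu_taken:  (T,)     behaviour-policy probability mu(a_t|x_t) of each logged action
    actions:   (T,)     logged actions a_t
    rewards:   (T,)     rewards r_t
    terminals: (T,)     True if x_{t+1} is terminal
    """
    T = len(rewards)
    idx = np.arange(T)
    q_taken = q[idx, actions]                    # Q(x_t, a_t)
    v = (pi * q).sum(axis=1)                     # V(x_t) = sum_a pi(a|x_t) Q(x_t, a)
    ratio = pi[idx, actions] / mu_taken          # importance ratio pi(a_t|x_t) / mu(a_t|x_t)
    c = lam * np.minimum(1.0, ratio)             # truncated traces c_t
    targets = np.empty(T)
    correction = 0.0                             # c_{t+1} (G_{t+1} - Q(x_{t+1}, a_{t+1})); zero past the segment
    for t in reversed(range(T)):
        if terminals[t]:
            targets[t] = rewards[t]              # nothing to bootstrap from after a terminal state
        else:
            targets[t] = rewards[t] + gamma * (v[t + 1] + correction)
        correction = c[t] * (targets[t] - q_taken[t])  # used by step t - 1
    return targets
```

With `lam=0.0` every trace vanishes and the targets reduce to one-step expected-SARSA targets r_t + γ V(x_{t+1}); the truncation at 1 is what keeps the correction bounded when π and μ disagree.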

I'll leave this PR as a draft and only use it locally for now. When we move to 0.11, I'll adapt it so that Retrace works with it.

harwiltz · 2022-04-05T12:54:33Z

I like where this is going. Would it make sense to make something like QNetworkWithTarget <: AbstractQNetwork to abstract away the target network and soft updates (not necessarily in this PR, just in general)? Also, can you please add a reference to the Retrace paper in a comment somewhere? I believe I read this paper a while ago, but it would be nice to have a reference for convenience.

HenriDeh · 2022-04-05T13:01:13Z

"Would it make sense to make something like QNetworkWithTarget <: AbstractQNetwork to abstract away the target network and soft updates"

Yes, I think that would be a sensible choice. Now that I think of it, it would make even more sense to go this way and define two QNetworkWithTarget variants: one that uses Polyak averaging and one that copies the weights every k updates, since both approaches are common.
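The two QNetworkWithTarget variants can be sketched as follows (Python; the class names, the `after_update`/`sync` hooks and the dict-of-arrays parameter layout are hypothetical stand-ins for a real network). The target copy is what bootstrapped targets would be computed from; only the synchronisation schedule differs between the two subclasses.

```python
import numpy as np


class QNetworkWithTarget:
    """Hypothetical wrapper: online parameters plus a lagging target copy."""

    def __init__(self, params):
        self.online = {k: np.array(v, dtype=float) for k, v in params.items()}
        self.target = {k: w.copy() for k, w in self.online.items()}
        self.n_updates = 0

    def sync(self):
        raise NotImplementedError

    def after_update(self):
        # Call once per gradient step on the online parameters.
        self.n_updates += 1
        self.sync()


class PolyakQNetwork(QNetworkWithTarget):
    """Soft update after every step: target <- tau * online + (1 - tau) * target."""

    def __init__(self, params, tau=0.005):
        super().__init__(params)
        self.tau = tau

    def sync(self):
        for k, w in self.online.items():
            self.target[k] = self.tau * w + (1.0 - self.tau) * self.target[k]


class PeriodicCopyQNetwork(QNetworkWithTarget):
    """Hard update: copy the online weights into the target every k steps."""

    def __init__(self, params, k=100):
        super().__init__(params)
        self.k = k

    def sync(self):
        if self.n_updates % self.k == 0:
            self.target = {name: w.copy() for name, w in self.online.items()}
```

Both `tau` and `k` are tuning knobs and the defaults above are placeholders; an algorithm written against the wrapper would not need to know which schedule is in use.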

For the ref, you can find the paper here.
V-trace and IMPALA (https://arxiv.org/abs/1802.01561) would also be nice additions in the future. They are closely tied to distributed RL, which I think is about to undergo major changes in this package.

findmyway · 2022-04-05T14:33:47Z

Thanks! This is very helpful!

HenriDeh added 4 commits April 5, 2022 11:13

add a caller to NStepBatchSampler · 91c8e8c
Add a QNetwork abstraction · 603459b
add a Retrace trajectory · 8600ed3
comments · 23bcf66

HenriDeh closed this by deleting the head repository Mar 15, 2023

HenriDeh mentioned this pull request Aug 11, 2023 in Missing features in RLCore #961 (open)
