Skip to content

Conversation

HenriDeh
Copy link
Member

@HenriDeh HenriDeh commented Apr 5, 2022

Following our discussion of yesterday at #613, I'm creating this draft PR to show how I went on implementing the Retrace Algorithm. More precisely, I wanted to implement Retrace as a plug-in that can optionally be used by an algorithm (I have #604 in mind mainly) but such a level of abstraction was not implemented yet. So here's my attempt. I assume that this may not be what you have in mind @findmyway but it can be a useful discussion even if this is never merged.

The main idea is that Retrace is just a different way than TD(n) to compute update targets for a QNetwork. So I created a QNetwork abstraction that can be called to be updated given a batch of action, states and targets : update!(qnetwork::AbstractQNetwork, states, actions, targets).
The main goal here was the introduction of the function q_targets, called by update!.

Now, why did I do this? Because then I could implement retrace in way that is reusable by ab algorithm that uses the QNetwork abstraction. RetraceTrajectory is an extension of a Trajectory, using a new type allows to overload q_targets. So an algorithm that uses a RetraceTrajectory will automatically use this target to update its AbstractQNetwork.

I'll leave this PR as a draft and only use it locally for now. When we move to 0.11 I'll adapt to make Retrace work with it.

@harwiltz
Copy link
Contributor

harwiltz commented Apr 5, 2022

I like where this is going. Would it make sense to make something like QNetworkWithTarget <: AbstractQNetwork to abstract away the target network and soft updates (not necessarily in this PR, just in general)? Also, can you please add a reference to the Retrace paper in a comment somewhere? I believe I read this paper a while ago, but it would be nice to have a reference for convenience.

@HenriDeh
Copy link
Member Author

HenriDeh commented Apr 5, 2022

Would it make sense to make something like QNetworkWithTarget <: AbstractQNetwork to abstract away the target network and soft updates

Yes, I think it's a choice that would make sense. Now that I think of it, it would make more sense to go this way and even define two QNetworkWithTarget, one that uses polyak averaging and one that copies the weights every k updates, since both approaches are common.

For the ref, you can find the paper here.
Vtrace and Impala would be nice additions in the future (https://arxiv.org/abs/1802.01561) too. These pertain a lot to distributed RL that I think is about to undergo major changes in this package soon.

@findmyway
Copy link
Member

Thanks! This is very helpful!

@HenriDeh HenriDeh closed this by deleting the head repository Mar 15, 2023
@HenriDeh HenriDeh mentioned this pull request Aug 11, 2023
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants