Description
Feature request
It would be great to support dynamic weights for aggregating rewards, i.e. weights that change depending on how far training has progressed.
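To make the request concrete, here is a rough sketch of what scheduled reward weights could look like. Everything in it is hypothetical and for illustration only: the names `WeightSchedule`, `constant`, `linear_decay`, and `aggregate_rewards` are not part of any existing API, and it assumes training progress is measured as `step / max_steps`.

```python
from typing import Callable, List, Sequence

# Hypothetical type: maps training progress in [0, 1] to a reward weight.
WeightSchedule = Callable[[float], float]


def constant(value: float) -> WeightSchedule:
    """Weight that stays fixed for the whole run."""
    return lambda progress: value


def linear_decay(start: float, end: float) -> WeightSchedule:
    """Weight that moves linearly from `start` to `end` over training."""
    def schedule(progress: float) -> float:
        p = min(max(progress, 0.0), 1.0)  # clamp progress to [0, 1]
        return start + (end - start) * p
    return schedule


def aggregate_rewards(
    rewards_per_func: Sequence[Sequence[float]],
    schedules: Sequence[WeightSchedule],
    step: int,
    max_steps: int,
) -> List[float]:
    """Weighted sum of per-function rewards, using weights for the current step.

    rewards_per_func[k][i] is the reward from function k for rollout i.
    """
    if len(rewards_per_func) != len(schedules):
        raise ValueError("need exactly one schedule per reward function")
    # Assumption: progress is the fraction of training steps completed.
    progress = step / max_steps if max_steps > 0 else 1.0
    weights = [schedule(progress) for schedule in schedules]
    num_rollouts = len(rewards_per_func[0]) if rewards_per_func else 0
    return [
        sum(w * rewards[i] for w, rewards in zip(weights, rewards_per_func))
        for i in range(num_rollouts)
    ]


# Example: a "true" reward with a fixed weight plus a helper reward that fades out.
schedules = [constant(1.0), linear_decay(1.0, 0.0)]
rewards = [[1.0, 0.0], [0.5, 1.0]]  # two reward functions, two rollouts
start_total = aggregate_rewards(rewards, schedules, step=0, max_steps=100)
end_total = aggregate_rewards(rewards, schedules, step=100, max_steps=100)
```

With this shape, each reward function keeps logging its raw, unscaled value, and only the aggregation step depends on training progress.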
Motivation
There are often rewards that are useful for local updates but whose magnitude doesn't make sense globally.
For example, one potential reward is to rank the set of rollouts and assign the #1 ranking a max reward of 1 and the last ranking a min reward of 0. This is useful locally when true rewards are sparse, but becomes distracting in the limit of training (the #1 ranking always gets a reward of 1).
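As a sketch of the ranking reward described above (the function name `rank_rewards` is hypothetical, and the handling of ties and of a single rollout is an arbitrary choice made for this example):

```python
from typing import List, Sequence


def rank_rewards(scores: Sequence[float]) -> List[float]:
    """Map raw scores to rank-based rewards spaced evenly in [0, 1].

    The best-scoring rollout gets 1.0 and the worst gets 0.0.
    Assumption: ties are broken by original order (Python's sort is stable).
    """
    n = len(scores)
    if n == 0:
        return []
    if n == 1:
        return [1.0]  # arbitrary choice for a single rollout
    # Indices of rollouts sorted from best to worst score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    rewards = [0.0] * n
    for rank, idx in enumerate(order):
        rewards[idx] = 1.0 - rank / (n - 1)
    return rewards


# Widely separated scores and nearly identical scores give the same rewards,
# which is why this signal stops being informative late in training.
spread_out = rank_rewards([0.2, 0.9, 0.5])
nearly_equal = rank_rewards([0.001, 0.003, 0.002])
```

Both calls return the same rewards because only the ordering matters, not the size of the gaps between rollouts.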
It is possible to schedule the reward function itself, but that seems less clean. The logs for that reward function would also be misleading, since they would show the scaled values rather than the raw reward.
Your contribution
It should be easy for me to submit a PR for this, but I thought it was worth flagging explicitly here for feedback first.