GRPO Reward Weight Scheduler #36490

### Feature request

It would be great to support dynamic weights when aggregating rewards, i.e. weights that change depending on how far training has progressed.
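A minimal sketch of what such a schedule could look like. It assumes that each reward function gets a weight that is linearly interpolated from a start value to an end value over training, and that the per-function rewards are then combined by a weighted sum. The names `linear_weight_schedule` and `aggregate_rewards` are hypothetical and are not part of any existing trainer API.

```python
from typing import List, Sequence


def linear_weight_schedule(
    step: int,
    total_steps: int,
    start_weights: Sequence[float],
    end_weights: Sequence[float],
) -> List[float]:
    """Hypothetical scheduler: linearly interpolate each reward weight
    from its start value to its end value over the course of training."""
    if len(start_weights) != len(end_weights):
        raise ValueError("start_weights and end_weights must have the same length")
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return [s + (e - s) * progress for s, e in zip(start_weights, end_weights)]


def aggregate_rewards(
    per_func_rewards: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Hypothetical aggregation: weighted sum across reward functions.

    per_func_rewards[i][j] is the reward that function i assigns to completion j.
    """
    if not per_func_rewards or len(per_func_rewards) != len(weights):
        raise ValueError("need one weight per reward function")
    num_completions = len(per_func_rewards[0])
    return [
        sum(w * rewards[j] for w, rewards in zip(weights, per_func_rewards))
        for j in range(num_completions)
    ]
```

For example, with a sparse correctness reward and a rank-based reward, `linear_weight_schedule(500, 1000, [1.0, 1.0], [1.0, 0.0])` gives `[1.0, 0.5]` halfway through training, so the rank-based term fades out while the correctness term keeps its full weight. Because the weights are applied outside the reward functions, each function's raw output can still be logged unchanged.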

### Motivation

Some rewards are useful for local updates even though their magnitude is not meaningful globally.

For example, one possible reward ranks the set of rollouts and assigns the top-ranked rollout the maximum reward of 1 and the bottom-ranked rollout the minimum reward of 0. This is useful *locally* when true rewards are sparse, but becomes a distraction later in training, since the top-ranked rollout always receives a reward of 1 no matter how good it is in absolute terms.
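A sketch of the rank-based reward described above, assuming each rollout in the group already has some scalar score to rank by (how that score is obtained is left open here) and that the rewards in between are spaced linearly by rank. `rank_reward` is a hypothetical name used only for illustration.

```python
from typing import List, Sequence


def rank_reward(scores: Sequence[float]) -> List[float]:
    """Hypothetical rank-based reward for one group of rollouts.

    The highest-scoring rollout gets 1.0, the lowest gets 0.0, and the rest
    are spaced linearly by rank. Ties are broken by original order.
    """
    n = len(scores)
    if n == 0:
        return []
    if n == 1:
        # A single rollout is trivially the best in its group.
        return [1.0]
    # Indices sorted from lowest to highest score (stable sort keeps ties in input order).
    order = sorted(range(n), key=lambda i: scores[i])
    rewards = [0.0] * n
    for rank, idx in enumerate(order):
        rewards[idx] = rank / (n - 1)
    return rewards
```

`rank_reward([0.01, 0.02])` returns `[0.0, 1.0]`: the better of two poor rollouts still receives the maximum reward, which is why this term stops being informative once training has progressed.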

It is possible to schedule the reward function itself (i.e. scale its output inside the function), but that seems less clean, and the logged values for that reward function would then be misleading.

### Your contribution

It should be easy for me to submit a PR for this, but I thought it was worth flagging here first for feedback.
