This issue is intended to centralize the discussion and documentation of the RAD training process, including pipeline details, loss design, and common debugging tricks.
Recommendation: freeze the normalization layers during RL fine-tuning.
See the reference implementation in rad/utils.py.
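Below is a minimal sketch of one way to freeze normalization layers in PyTorch. It is not the actual rad/utils.py implementation; the helper name `freeze_norm_layers` and the set of layer types are assumptions for illustration.

```python
import torch.nn as nn

# Normalization layer types to freeze (illustrative selection).
_NORM_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d,
               nn.LayerNorm, nn.GroupNorm)

def freeze_norm_layers(model: nn.Module) -> None:
    """Put normalization layers into eval mode and stop their gradients."""
    for module in model.modules():
        if isinstance(module, _NORM_TYPES):
            module.eval()                      # stop updating running mean/var (BatchNorm)
            for param in module.parameters():  # freeze affine weight/bias
                param.requires_grad_(False)

# Usage: call after loading the pre-trained policy, and again after every
# model.train() call, since train() flips normalization layers back to training mode.
# freeze_norm_layers(policy)
```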
Learning rate is crucial for stable training.
A well-chosen learning rate plays a key role in ensuring smooth convergence during RL fine-tuning. If you are confident that the algorithm and implementation are correct, tuning the learning rate is often effective when the model fails to converge or trains unstably. Either an excessively large or an excessively small learning rate can significantly degrade convergence quality. A short sweep over nearby values, as sketched below, is usually enough to find a stable setting.
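The sketch below illustrates this tuning workflow. The function name `build_optimizer`, the optimizer choice, and the learning-rate values are assumptions for illustration, not settings from the RAD repository.

```python
import torch

def build_optimizer(policy: torch.nn.Module, lr: float = 1e-5) -> torch.optim.Optimizer:
    # Only optimize parameters that were not frozen (e.g., the normalization layers above).
    trainable = [p for p in policy.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=lr)

# A simple sweep: try a few learning rates spaced roughly 3x apart and keep the
# one with the most stable loss/return curve. Sudden spikes in the loss usually
# indicate the learning rate is too high; flat curves suggest it is too low.
# for lr in (3e-6, 1e-5, 3e-5):
#     optimizer = build_optimizer(policy, lr)
#     ...  # run a short fine-tuning trial and compare training curves
```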