RAD Training Details and Implementation Notes #2

@HGao-cv

Description

This issue is intended to centralize the discussion and documentation of the RAD training process, including pipeline details, loss design, and common debugging tricks.

  1. Freeze the normalization layers during RL fine-tuning.
    Keeping the running statistics and affine parameters fixed prevents small, correlated RL batches from drifting away from the statistics learned during pre-training. See the reference implementation in rad/utils.py, and the sketch after this list.

  2. The learning rate is crucial for stable training.
    A well-chosen learning rate plays a key role in ensuring smooth convergence during RL fine-tuning. If you are confident that the algorithm and implementation are correct, tuning the learning rate is often the most effective fix when the model fails to converge or trains unstably. Either an excessively large or an excessively small learning rate can significantly degrade convergence; a simple sweep, such as the one sketched below, helps bracket a workable value.
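For point 1, here is a minimal sketch of the freezing step, assuming a standard PyTorch model. The helper name `freeze_norm_layers` and the set of layer types are illustrative, not the actual API of rad/utils.py, which remains the authoritative reference.

```python
import torch.nn as nn

# Normalization layer types to freeze; extend this tuple for any
# custom norm layers used in the model.
NORM_TYPES = (
    nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d,
    nn.SyncBatchNorm, nn.LayerNorm, nn.GroupNorm,
)

def freeze_norm_layers(model: nn.Module) -> nn.Module:
    """Put norm layers in eval mode and freeze their affine parameters."""
    for module in model.modules():
        if isinstance(module, NORM_TYPES):
            module.eval()  # stop BatchNorm running-stat updates
            for p in module.parameters():
                p.requires_grad = False  # freeze weight/bias
    return model

# Toy usage. Note that model.train() flips submodules back into
# training mode, so re-apply the freeze after every train() call.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
model.train()
freeze_norm_layers(model)
```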
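For point 2, a coarse log-spaced sweep is a common way to bracket a stable learning rate. The candidate range and the AdamW optimizer below are assumptions for illustration, not settings taken from the RAD codebase.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for the actual policy network

# Log-spaced candidates; the 1e-6..1e-4 bracket is an assumed starting
# range, not a value from the RAD configs.
for lr in (1e-6, 3e-6, 1e-5, 3e-5, 1e-4):
    optimizer = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=lr
    )
    # Run a short fine-tuning job with this optimizer and compare the
    # reward/loss curves; keep the largest lr that still converges
    # smoothly, since too large diverges and too small stalls.
```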
