
sergiopaniego
Member

What does this PR do?

Adds the Efficient Online Training with GRPO and vLLM in TRL recipe, which showcases online training with TRL.
This recipe adapts the existing Post training an LLM for reasoning with GRPO in TRL recipe, and I aim to include it in the vLLM docs here.

Who can review?

Feel free to tag members/contributors who may be interested in your PR.


@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sergiopaniego marked this pull request as ready for review on October 2, 2025, 16:25
@sergiopaniego
Member Author

@qgallouedec, in case you want to take a look. I still need to run the full training to get the final results, but the key takeaways are already visible.
