Add `Efficient Online Training with GRPO and vLLM in TRL` recipe #334

sergiopaniego · 2025-10-01T16:10:04Z

What does this PR do?

Adds the `Efficient Online Training with GRPO and vLLM in TRL` recipe to showcase the online training possibilities in TRL.
This recipe is a modification of `Post training an LLM for reasoning with GRPO in TRL`, and I aim to include it in the vLLM docs.

Who can review?

Feel free to tag members/contributors who may be interested in your PR.

review-notebook-app · 2025-10-01T16:10:09Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

HuggingFaceDocBuilderDev · 2025-10-01T16:15:32Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sergiopaniego · 2025-10-02T16:28:30Z

@qgallouedec, in case you want to take a look. I still need to run the full training to get the final results, but the key takeaways are already visible.

Add Efficient Online Training with GRPO and vLLM in TRL recipe

5926c13

Updated notebook

463f240

sergiopaniego marked this pull request as ready for review October 2, 2025 16:25

sergiopaniego requested review from merveenoyan and stevhliu October 2, 2025 16:26

