Conversation

@kaixuanliu
Contributor

When I run the example from the README:

python examples/scripts/ppo/ppo_tldr.py \
    --dataset_name trl-internal-testing/tldr-preference-sft-trl-style \
    --dataset_test_split validation \
    --learning_rate 3e-6 \
    --output_dir pythia-1b-deduped-tldr-preference-sft-trl-style-ppo \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 64 \
    --total_episodes 30000 \
    --model_name_or_path EleutherAI/pythia-1b-deduped \
    --sft_model_path cleanrl/EleutherAI_pythia-1b-deduped__sft__tldr \
    --reward_model_path cleanrl/EleutherAI_pythia-1b-deduped__reward__tldr \
    --missing_eos_penalty 1.0 \
    --stop_token eos \
    --response_length 53 \
    --eval_strategy steps \
    --eval_steps 100

it returns the error: `ValueError: Cannot use chat template functions because tokenizer.chat_template is not set and no template argument was passed!` It seems the model EleutherAI/pythia-1b-deduped does not have a chat template. This PR fixes the bug.
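For context, conversational datasets are formatted via `tokenizer.apply_chat_template`, which raises exactly this `ValueError` when `tokenizer.chat_template` is unset. The minimal sketch below reproduces the failure mode with a hypothetical stand-in tokenizer (not the real `EleutherAI/pythia-1b-deduped` one, to avoid a model download); the guard in `maybe_format` is likewise an illustrative helper, not TRL code:

```python
# Sketch of the failure mode: conversational examples need a chat template,
# plain-text (non-conversational) examples do not.

class StubTokenizer:
    """Hypothetical stand-in for a transformers tokenizer."""

    def __init__(self, chat_template=None):
        self.chat_template = chat_template

    def apply_chat_template(self, messages, tokenize=False):
        if self.chat_template is None:
            # Mirrors the transformers error message from the traceback above.
            raise ValueError(
                "Cannot use chat template functions because tokenizer.chat_template "
                "is not set and no template argument was passed!"
            )
        return self.chat_template.join(m["content"] for m in messages)


def maybe_format(example, tokenizer):
    """Apply the chat template only to conversational examples."""
    if isinstance(example, list):  # conversational: list of {"role", "content"} dicts
        return tokenizer.apply_chat_template(example, tokenize=False)
    return example  # plain text passes through untouched


plain = "SUBREDDIT: r/test\nPOST: ...\nTL;DR:"
convo = [{"role": "user", "content": "Summarize this post."}]

tok = StubTokenizer()            # no chat template, like pythia-1b-deduped
print(maybe_format(plain, tok))  # plain-text data works
try:
    maybe_format(convo, tok)     # conversational data raises the ValueError
except ValueError as e:
    print("ValueError:", e)
```

This is why switching to a plain-text dataset sidesteps the error even though the tokenizer is unchanged.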

@qgallouedec
Member

Thanks for reporting! I think that instead, you could simply replace trl-internal-testing/tldr-preference-sft-trl-style with trl-lib/tldr

@kaixuanliu
Contributor Author

@qgallouedec Hi, when I replace the dataset with trl-lib/tldr, the same problem occurs. I'm afraid it is related to the model's tokenizer; just replacing the dataset does not help.

@kaixuanliu
Contributor Author

@kashif Hi, can you help review? Thanks!

@qgallouedec qgallouedec mentioned this pull request Nov 21, 2025
@qgallouedec
Member

Thanks again; what I meant here was to use a non-conversational dataset instead. I tried it myself in #4556 and it seems to work, so I'll close this PR.
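In other words, the command from the report should run unchanged once the conversational dataset is swapped for the plain-text trl-lib/tldr. A sketch of the corrected invocation, with only `--dataset_name` changed and every other flag kept as in the original command:

```shell
python examples/scripts/ppo/ppo_tldr.py \
    --dataset_name trl-lib/tldr \
    --dataset_test_split validation \
    --learning_rate 3e-6 \
    --output_dir pythia-1b-deduped-tldr-preference-sft-trl-style-ppo \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 64 \
    --total_episodes 30000 \
    --model_name_or_path EleutherAI/pythia-1b-deduped \
    --sft_model_path cleanrl/EleutherAI_pythia-1b-deduped__sft__tldr \
    --reward_model_path cleanrl/EleutherAI_pythia-1b-deduped__reward__tldr \
    --missing_eos_penalty 1.0 \
    --stop_token eos \
    --response_length 53 \
    --eval_strategy steps \
    --eval_steps 100
```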
