Conversation

@kaixuanliu
Contributor

When I run the example from the README:

python examples/scripts/ppo/ppo_tldr.py \
    --dataset_name trl-internal-testing/tldr-preference-sft-trl-style \
    --dataset_test_split validation \
    --learning_rate 3e-6 \
    --output_dir pythia-1b-deduped-tldr-preference-sft-trl-style-ppo \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 64 \
    --total_episodes 30000 \
    --model_name_or_path EleutherAI/pythia-1b-deduped \
    --sft_model_path cleanrl/EleutherAI_pythia-1b-deduped__sft__tldr \
    --reward_model_path cleanrl/EleutherAI_pythia-1b-deduped__reward__tldr \
    --missing_eos_penalty 1.0 \
    --stop_token eos \
    --response_length 53 \
    --eval_strategy steps \
    --eval_steps 100

it returns the error: `ValueError: Cannot use chat template functions because tokenizer.chat_template is not set and no template argument was passed!` It seems the model EleutherAI/pythia-1b-deduped does not have a chat template. This PR fixes the bug.
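For context, conversational datasets are formatted via `tokenizer.apply_chat_template`, which raises exactly this `ValueError` when `tokenizer.chat_template` is unset. The minimal sketch below reproduces the failure mode with a hypothetical stand-in tokenizer (not the real `EleutherAI/pythia-1b-deduped` one, to avoid a model download); the guard in `maybe_format` is likewise an illustrative helper, not TRL code:

```python
# Sketch of the failure mode: conversational examples need a chat template,
# plain-text (non-conversational) examples do not.

class StubTokenizer:
    """Hypothetical stand-in for a transformers tokenizer."""

    def __init__(self, chat_template=None):
        self.chat_template = chat_template

    def apply_chat_template(self, messages, tokenize=False):
        if self.chat_template is None:
            # Mirrors the transformers error message from the traceback above.
            raise ValueError(
                "Cannot use chat template functions because tokenizer.chat_template "
                "is not set and no template argument was passed!"
            )
        return self.chat_template.join(m["content"] for m in messages)


def maybe_format(example, tokenizer):
    """Apply the chat template only to conversational examples."""
    if isinstance(example, list):  # conversational: list of {"role", "content"} dicts
        return tokenizer.apply_chat_template(example, tokenize=False)
    return example  # plain text passes through untouched


plain = "SUBREDDIT: r/test\nPOST: ...\nTL;DR:"
convo = [{"role": "user", "content": "Summarize this post."}]

tok = StubTokenizer()            # no chat template, like pythia-1b-deduped
print(maybe_format(plain, tok))  # plain-text data works
try:
    maybe_format(convo, tok)     # conversational data raises the ValueError
except ValueError as e:
    print("ValueError:", e)
```

This is why switching to a plain-text dataset sidesteps the error even though the tokenizer is unchanged.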

@qgallouedec
Member

Thanks for reporting! I think that instead, you could simply replace trl-internal-testing/tldr-preference-sft-trl-style with trl-lib/tldr

@kaixuanliu
Contributor Author

@qgallouedec Hi, when I replace the dataset with trl-lib/tldr, the same problem occurs. I'm afraid it is related to the model's tokenizer; just replacing the dataset does not help.

@kaixuanliu
Contributor Author

@kashif Hi, can you help review? Thanks!

@qgallouedec qgallouedec mentioned this pull request Nov 21, 2025
@qgallouedec
Member

Thanks again; what I meant here was to use a non-conversational dataset instead. I tried it myself in #4556 and it seems to work, so I'll close this PR.
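In other words, the command from the report should run unchanged once the conversational dataset is swapped for the plain-text trl-lib/tldr. A sketch of the corrected invocation, with only `--dataset_name` changed and every other flag kept as in the original command:

```shell
python examples/scripts/ppo/ppo_tldr.py \
    --dataset_name trl-lib/tldr \
    --dataset_test_split validation \
    --learning_rate 3e-6 \
    --output_dir pythia-1b-deduped-tldr-preference-sft-trl-style-ppo \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 64 \
    --total_episodes 30000 \
    --model_name_or_path EleutherAI/pythia-1b-deduped \
    --sft_model_path cleanrl/EleutherAI_pythia-1b-deduped__sft__tldr \
    --reward_model_path cleanrl/EleutherAI_pythia-1b-deduped__reward__tldr \
    --missing_eos_penalty 1.0 \
    --stop_token eos \
    --response_length 53 \
    --eval_strategy steps \
    --eval_steps 100
```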
