
Conversation

@toslunar
Contributor

@toslunar toslunar commented Dec 23, 2024

This PR updates the chat template examples to align with the latest (>=0.4.3) /v1/chat/completions behavior regarding BOS.

Some chat templates don't contain bos_token, especially those written before v0.4.3. According to the excellent investigation in #9519, chat serving does not add BOS again, while some awkwardness remains in the offline chat interface (to be fixed).
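
For illustration, here is a minimal sketch of what "the template supplies BOS itself" means in practice; the template string and tokens below are made up for the example, not one of the actual files in examples/:

```python
# Minimal sketch (assumed template string, not an actual examples/*.jinja file).
# If serving tokenizes the rendered prompt with add_special_tokens=False,
# BOS has to come from the template itself.
import jinja2

template = jinja2.Template(
    "{{ bos_token }}{% for m in messages %}"
    "<|{{ m.role }}|>{{ m.content }}<|end|>{% endfor %}"
)
prompt = template.render(
    bos_token="<s>",
    messages=[{"role": "user", "content": "Hello"}],
)
print(prompt)  # starts with <s>, so serving does not need to add BOS again
```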

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add the ready label to the PR
  • Enable auto-merge.

🚀

@DarkLight1337
Member

Thanks for fixing this. Can you add some unit tests to ensure that the BOS token isn't added twice in this case?

@toslunar
Contributor Author

Added a test.

Is it OK to place it in a new directory (tests/examples_tests)? This seems to be the first time we test files under examples. I named it examples_tests so it won't be confused with data directories like tests/data and tests/prompts.
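
For reference, a minimal sketch of what such a test could check; the paths, model name, and the add_special_tokens=False assumption are illustrative, not the exact code added in this PR:

```python
# Hypothetical sketch, not the actual test file from this PR.
from pathlib import Path

import jinja2
import pytest
from transformers import AutoTokenizer

EXAMPLES_DIR = Path(__file__).parents[2] / "examples"
CHAT_TEMPLATES = sorted(EXAMPLES_DIR.glob("*.jinja"))


@pytest.mark.parametrize("template_path", CHAT_TEMPLATES, ids=lambda p: p.name)
def test_bos_not_duplicated(template_path):
    # Assumed tokenizer; any model whose tokenizer defines a BOS token works.
    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
    messages = [{"role": "user", "content": "Hello"}]

    # Render the example template roughly the way the OpenAI-compatible server does.
    template = jinja2.Template(template_path.read_text())
    prompt = template.render(
        messages=messages,
        bos_token=tokenizer.bos_token,
        eos_token=tokenizer.eos_token,
        add_generation_prompt=True,
    )

    # Serving encodes with add_special_tokens=False, so BOS must come from the
    # template -- exactly once, at the very beginning.
    ids = tokenizer(prompt, add_special_tokens=False).input_ids
    assert ids[:1] == [tokenizer.bos_token_id]
    assert ids[1:2] != [tokenizer.bos_token_id]
```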

@DarkLight1337
Member

I would just name the directory tests/examples. Can you add the tests to CI by editing this file?

Signed-off-by: Toshiki Kataoka <[email protected]>
Signed-off-by: Toshiki Kataoka <[email protected]>
@mergify mergify bot added the ci/build label Dec 24, 2024
Signed-off-by: Toshiki Kataoka <[email protected]>
@DarkLight1337
Member

Please update the failing tests to consider the BOS token.

@toslunar
Contributor Author

Oops, I didn't expect this. I'll fix how the Jinja files are enumerated.

https://buildkite.com/vllm/fastcheck/builds/10539#0193fc47-aa1b-4af4-8c53-a27c0af8deb1

[2024-12-25T06:56:26Z] ============================= test session starts ==============================
[2024-12-25T06:56:26Z] platform linux -- Python 3.12.8, pytest-8.3.3, pluggy-1.5.0 -- /usr/bin/python3
[2024-12-25T06:56:26Z] cachedir: .pytest_cache
[2024-12-25T06:56:26Z] rootdir: /vllm-workspace
[2024-12-25T06:56:26Z] configfile: pyproject.toml
[2024-12-25T06:56:26Z] plugins: anyio-4.6.2.post1, buildkite-test-collector-0.1.9, asyncio-0.24.0, forked-1.6.0, rerunfailures-14.0, shard-0.1.2
[2024-12-25T06:56:26Z] asyncio: mode=Mode.STRICT, default_loop_scope=None
collected 2 items                                                              
[2024-12-25T06:56:26Z] Running 2 items in this shard: tests/examples/test_jinja.py::test_bos[1-path0], tests/examples/test_jinja.py::test_bos[3-path0]
[2024-12-25T06:56:26Z]
[2024-12-25T06:56:26Z] examples/test_jinja.py::test_bos[1-path0] SKIPPED (got empty paramet...)
[2024-12-25T06:56:26Z] examples/test_jinja.py::test_bos[3-path0] SKIPPED (got empty paramet...)
[2024-12-25T06:56:26Z]
[2024-12-25T06:56:26Z] ============================== 2 skipped in 0.03s ==============================

Signed-off-by: Toshiki Kataoka <[email protected]>
@DarkLight1337
Member

I think you also need to update the entrypoints tests to consider the BOS token.

@DarkLight1337
Member

Actually now that I think more about it... wouldn't it be better to instead update LLM.chat to automatically add BOS tokens? That way our chat templates would remain consistent with HF.

Signed-off-by: Toshiki Kataoka <[email protected]>
@mergify

mergify bot commented Dec 25, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @toslunar.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Dec 25, 2024
@mergify mergify bot removed the needs-rebase label Dec 25, 2024
Signed-off-by: Toshiki Kataoka <[email protected]>
@toslunar
Contributor Author

I suppose this PR could be incremental, resolving only the inconsistency among the example templates.

Actually now that I think more about it... wouldn't it be better to instead update LLM.chat to automatically add BOS tokens? That way our chat templates would remain consistent with HF.

Sorry, I didn't get the concrete suggestion. Do you mean making LLM.chat add a BOS token if there isn't one (for backward compatibility)?

@DarkLight1337
Member

DarkLight1337 commented Dec 26, 2024

I suppose this PR could be incremental, resolving only the inconsistency among the example templates.

Actually now that I think more about it... wouldn't it be better to instead update LLM.chat to automatically add BOS tokens? That way our chat templates would remain consistent with HF.

Sorry, I didn't get the concrete suggestion. Do you mean making LLM.chat add a BOS token if there isn't one (for backward compatibility)?

I mean that we can remove the BOS token before passing the prompt text into LLM.generate inside LLM.chat, so that we don't get duplicate BOS tokens if the chat template already has one. In other words, we update LLM.chat to behave like online inference, rather than changing the chat templates themselves.
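
A rough sketch of that idea (an assumed helper, not the actual LLM.chat implementation):

```python
# Assumed helper, not vLLM's actual code: strip a leading BOS from the rendered
# prompt so that later tokenization with add_special_tokens=True does not
# produce a duplicate.
def render_chat_prompt(tokenizer, messages, chat_template=None):
    prompt = tokenizer.apply_chat_template(
        messages,
        chat_template=chat_template,
        tokenize=False,
        add_generation_prompt=True,
    )
    bos = tokenizer.bos_token
    if bos is not None and prompt.startswith(bos):
        prompt = prompt[len(bos):]
    return prompt
```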

BREAKING CHANGE: chat_template usually needs BOS now.

Signed-off-by: Toshiki Kataoka <[email protected]>
@toslunar
Contributor Author

Let me run tests with apply_hf_chat_template(..., tokenize=True).
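
Roughly, the comparison would look like this (a sketch of the idea with an assumed model, not the PR's exact test code):

```python
from transformers import AutoTokenizer

# Assumed model; the point is only to compare the two encodings.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
messages = [{"role": "user", "content": "Hello"}]

# HF's default path: apply_chat_template(..., tokenize=True) renders the
# template and encodes it without adding special tokens on top.
hf_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# The render-then-encode path should match, as long as the template emits BOS
# itself and the later encode uses add_special_tokens=False.
text = tokenizer.apply_chat_template(messages, tokenize=False,
                                     add_generation_prompt=True)
ids = tokenizer(text, add_special_tokens=False).input_ids
assert hf_ids == ids
```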

@mergify mergify bot added the frontend label Dec 26, 2024
@toslunar
Contributor Author

toslunar commented Jan 7, 2025

It seems the current unit tests have no opinion on BOS in LLM.chat, since CI passes. I'm sorry to say that I'm not invested enough in the specification of LLM.chat to be the decision-maker here.

My concerns are what algorithm should decide whether BOS is needed for backward compatibility, and how to communicate that choice to users. I'd like to ask for comments; my rough thoughts are below.

For users who trust the HF chat behavior, I'd like to see at least a warning message whenever vLLM encodes inputs differently from transformers.PreTrainedTokenizerBase.apply_chat_template, whose default is tokenize=True. Besides the current LLM.chat being incompatible, a possible update with the BOS fallback could conditionally emit "BOS is automatically added for backward compatibility because the chat template didn't produce BOS at the beginning." In my experience, I have compared settings that place a system prompt before or after BOS, and such a fallback would change the result.

For users who trust vLLM's existing behavior, the fallback BOS is necessary, and I'd like to be notified in the release notes. I suppose it's too noisy to emit one warning message when the fallback occurs and another when it doesn't. In another case, I had to rerun a bunch of API chat generations because I naively updated the vLLM version and the change from #4688 was almost silent.
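
For concreteness, the fallback being discussed could look roughly like this (a hypothetical sketch, not an implemented vLLM API):

```python
import logging

logger = logging.getLogger(__name__)


def ensure_bos(prompt_ids, bos_token_id):
    """Hypothetical fallback: prepend BOS only if the chat template didn't emit one."""
    if bos_token_id is None or (prompt_ids and prompt_ids[0] == bos_token_id):
        return list(prompt_ids)
    logger.warning(
        "BOS is automatically added for backward compatibility because the "
        "chat template didn't produce BOS at the beginning.")
    return [bos_token_id] + list(prompt_ids)
```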

@DarkLight1337
Member

DarkLight1337 commented Jan 7, 2025

Personally, I prefer aligning with HF's behavior since that's what most people are used to. If we find that BOS would be missing with the original code, we can emit a warning and note it in the release notes as you suggested.

@robertgshaw2-neuralmagic @njhill any thoughts on this? To unify the tokenization code, I think we should actually tokenize the prompt inside Preprocessor instead of OpenAIServing, taking into account the requirements of request.max_completion_tokens/request.max_tokens, along with a flag to truncate the tokens instead of raising an error if max_model_len is exceeded.
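
A rough sketch of that direction (function name and the truncation flag are assumptions, not vLLM's actual Preprocessor API):

```python
# Hypothetical sketch: tokenize inside a preprocessor, budgeting for the
# requested completion length and optionally truncating instead of raising.
def preprocess_prompt(tokenizer, prompt, max_model_len,
                      max_tokens=None, truncate_prompt=False):
    ids = tokenizer(prompt, add_special_tokens=False).input_ids
    budget = max_model_len - (max_tokens or 0)
    if len(ids) > budget:
        if not truncate_prompt:
            raise ValueError(
                f"Prompt ({len(ids)} tokens) plus max_tokens exceeds "
                f"max_model_len ({max_model_len}).")
        ids = ids[-budget:]  # keep the most recent context
    return ids
```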

@ywang96
Member

ywang96 commented Jan 9, 2025

It seems the current unit tests have no opinion on BOS in LLM.chat, since CI passes. I'm sorry to say that I'm not invested enough in the specification of LLM.chat to be the decision-maker here.

My concerns are what algorithm should decide whether BOS is needed for backward compatibility, and how to communicate that choice to users. I'd like to ask for comments; my rough thoughts are below.

For users who trust the HF chat behavior, I'd like to see at least a warning message whenever vLLM encodes inputs differently from transformers.PreTrainedTokenizerBase.apply_chat_template, whose default is tokenize=True. Besides the current LLM.chat being incompatible, a possible update with the BOS fallback could conditionally emit "BOS is automatically added for backward compatibility because the chat template didn't produce BOS at the beginning." In my experience, I have compared settings that place a system prompt before or after BOS, and such a fallback would change the result.

For users who trust vLLM's existing behavior, the fallback BOS is necessary, and I'd like to be notified in the release notes. I suppose it's too noisy to emit one warning message when the fallback occurs and another when it doesn't. In another case, I had to rerun a bunch of API chat generations because I naively updated the vLLM version and the change from #4688 was almost silent.

@toslunar To clarify, are you talking about LLM.chat or the OpenAI chat API?

@tranhd95

Hi, I would appreciate it if there were at least an add_special_tokens parameter for LLM.chat (or SamplingParams) so we don't have to manually modify the chat templates. Adding special tokens in the template seems to be the preferred way according to HF's chat template docs.

This inconsistency between LLM.chat and vLLM's OpenAI Chat Completions API caught me off guard, and I personally think there are a bunch of people getting suboptimal results from Llamas (and other LLMs with special tokens) because of duplicated BOS tokens.

Ideally I'd stick to Andrej's suggestion to strictly disable every "convenience" shortcut that HF tokenizers can do. In this case, add_special_tokens=True.
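
To make the duplication concrete, here is a small sketch (the model is an assumption; the point is that the chat template already starts with BOS, and encoding with add_special_tokens=True prepends a second one):

```python
from transformers import AutoTokenizer

# Assumed model whose chat template already emits BOS.
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
text = tok.apply_chat_template([{"role": "user", "content": "Hi"}],
                               tokenize=False)

# The rendered text already begins with the BOS string; encoding it with
# add_special_tokens=True prepends another BOS id on top of it.
ids = tok(text, add_special_tokens=True).input_ids
print(ids[:2])  # two BOS ids in a row
```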

@mergify mergify bot added the documentation Improvements or additions to documentation label Feb 28, 2025
@mergify

mergify bot commented Feb 28, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @toslunar.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@github-actions

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

@github-actions github-actions bot added the stale Over 90 days of inactivity label Jul 26, 2025
@github-actions

This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you!


Labels

ci/build · documentation · frontend · needs-rebase · stale · tool-calling
