-
-
Notifications
You must be signed in to change notification settings - Fork 11.2k
fix: add missing bos_token to example templates #11432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
🚀 |
Signed-off-by: Toshiki Kataoka <[email protected]>
43f2c32 to
1a7f9e3
Compare
|
Thanks for fixing, can you add some unit tests to ensure that the BOS token isn't added twice in this case? |
|
Added a test. Is it OK to place it to a new directory ( |
Signed-off-by: Toshiki Kataoka <[email protected]>
13f56c1 to
263ab89
Compare
|
I would just name the directory |
Signed-off-by: Toshiki Kataoka <[email protected]>
Signed-off-by: Toshiki Kataoka <[email protected]>
Signed-off-by: Toshiki Kataoka <[email protected]>
|
Please update the failing tests to consider the BOS token. |
|
Oops, this is unexpected to me. I'll fix the enumeration of Jinja files. https://buildkite.com/vllm/fastcheck/builds/10539#0193fc47-aa1b-4af4-8c53-a27c0af8deb1 |
Signed-off-by: Toshiki Kataoka <[email protected]>
Signed-off-by: Toshiki Kataoka <[email protected]>
|
I think you also need to update the entrypoints tests to consider the BOS token. |
|
Actually now that I think more about it... wouldn't it be better to instead update |
Signed-off-by: Toshiki Kataoka <[email protected]>
|
This pull request has merge conflicts that must be resolved before it can be |
Signed-off-by: Toshiki Kataoka <[email protected]>
Signed-off-by: Toshiki Kataoka <[email protected]>
|
I suppose this PR could be incremental and it only resolves the inconsistency among example templates.
Sorry. I didn't get the concrete suggestion. Do you mean making
|
I mean that we can remove the BOS token before passing the prompt text into |
BREAKING CHANGE: chat_template usually needs BOS now. Signed-off-by: Toshiki Kataoka <[email protected]>
|
Let me run tests with |
Signed-off-by: Toshiki Kataoka <[email protected]>
|
It seems the current unittest doesn't have opinion on BOS in My concerns are what algorithm to decide the need of BOS for backward compatibility and how to notify the choice to users. I'd ask for comments. My random thoughts are below. For users who trust the HF chat behavior, I'd like to see at least a warning message if vLLM encodes inputs differently than For users who trust vLLM's existing behavior, fallback BOS is necessary and I'd like to be notified on release notes. I suppose it's too noisy to emit a warning message if the fallback occurs and emit another warning message otherwise. In another experience of mine, I had to rerun a bunch of API chat generation because I naively updated the vLLM version and the change by #4688 was almost silent. |
|
Personally, I prefer aligning with HF's behavior since that's what most people are used to. If we find that BOS would be missing using the original code, we can emit a warning + release notes as you stated. @robertgshaw2-neuralmagic @njhill any thoughts on this? To unify the tokenization code, I think we should actually tokenize the prompt inside |
@toslunar To clarify, are you talking about |
|
Hi, I would appreciate if there would be at least This inconsistency between Ideally I'd stick to Andrej's suggestions to strictly disable every "convenience" shortcuts that HF tokenizers can do. In this case |
|
This pull request has merge conflicts that must be resolved before it can be |
|
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you! |
|
This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you! |
This PR updates chat template examples to align with the latest (>=0.4.3)
/v1/chat/completionsspec on BOS.Some chat templates don't contain
bos_token, especially when it's written before v0.4.3. According to the excellent investigation #9519, the chat serving does not add BOS again, while awkwardness remains in the offline chat interface (to be fixed).