[V1] Set structured output backend to auto by default #15724
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a reduced set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Pull Request Overview
This pull request updates the default backend for guided decoding by setting it conditionally based on the environment variable VLLM_USE_V1. It switches the default from a fixed "xgrammar" to "auto" for V1, while keeping "xgrammar" as the default for V0.
- Conditional default value for the guided decoding backend is introduced.
- Help message updated to reflect the conditional default behavior.
Comments suppressed due to low confidence (2)
vllm/engine/arg_utils.py:394
- The conditional default logic relies on envs.VLLM_USE_V1 being exactly '0' to select 'xgrammar'. Please verify that this condition covers all expected runtime scenarios and that the environment variable is consistently defined.
default="xgrammar" if envs.VLLM_USE_V1 == "0" else "auto",
vllm/engine/arg_utils.py:403
- [nitpick] Consider clarifying the help message by explicitly mentioning that the default backend depends on the value of envs.VLLM_USE_V1, which may help reduce confusion for users.
'The default is auto for V1 and xgrammar for V0.')
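For context, here is a minimal, self-contained sketch of the kind of conditional CLI default flagged above. It reads the raw VLLM_USE_V1 environment variable directly; the real code goes through `vllm.envs`, and the function name and default fallback value here are assumptions, not a copy of `vllm/engine/arg_utils.py`.

```python
# Sketch only: illustrates a conditional CLI default for the guided
# decoding backend, not the exact code in vllm/engine/arg_utils.py.
import argparse
import os


def add_guided_decoding_arg(parser: argparse.ArgumentParser) -> None:
    # Assumption: V1 is enabled unless VLLM_USE_V1 is explicitly "0".
    use_v1 = os.getenv("VLLM_USE_V1", "1") != "0"
    parser.add_argument(
        "--guided-decoding-backend",
        type=str,
        # V1 defaults to "auto"; V0 keeps "xgrammar".
        default="auto" if use_v1 else "xgrammar",
        help="Which engine will be used for guided decoding by default. "
        "The default is 'auto' for V1 and 'xgrammar' for V0.",
    )


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    add_guided_decoding_arg(parser)
    print(parser.parse_args([]).guided_decoding_backend)
```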
This pull request has merge conflicts that must be resolved before it can be merged.
This would only change the default behaviour if vLLM is started via the CLI. The default for `LLM` would still be `xgrammar`.

This is a bit of a general gotcha with the way we define defaults. They can be defined in any of the following places (sketched just below):

- `SomethingConfig` - as a default in the dataclass
- `EngineArgs` - as a default in this dataclass
- `EngineArgs.add_cli_args` - as a default value for an `argparse` argument

It's not something for this PR, but we should probably decide where all defaults should live and do a little refactor.
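To make the three layers concrete, here is a hedged, self-contained sketch; the field name `guided_decoding_backend`, the class `SomethingConfig`, and the specific default values are illustrative, not the real vLLM code.

```python
# Illustrative only: the three places a default can currently live.
import argparse
from dataclasses import dataclass


@dataclass
class SomethingConfig:
    # 1. Default in the config dataclass (what the LLM entrypoint ends up seeing).
    guided_decoding_backend: str = "xgrammar"


@dataclass
class EngineArgs:
    # 2. Default in the EngineArgs dataclass.
    guided_decoding_backend: str = "xgrammar"

    @staticmethod
    def add_cli_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
        # 3. Default on the argparse argument (what the CLI sees).
        parser.add_argument("--guided-decoding-backend", default="auto")
        return parser
```

The gotcha described above is that changing only layer 3 leaves layers 1 and 2 untouched, so `LLM(...)` would still see `xgrammar`.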
@hmellor Thank you! Great catch. 100% agree with the need to refactor this ... very confusing
vllm/config.py (outdated)
Oh, and instead of the comment you could type hint it with `Literal["auto", "outlines", "lm-format-enforcer", "xgrammar"]`. It makes the line longer, but it's very IDE friendly.
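A minimal sketch of what that type hint could look like on the config field; the class and field names and the default are assumptions based on this PR, not a copy of `vllm/config.py`.

```python
# Sketch of the suggested Literal type hint on the config field.
from dataclasses import dataclass
from typing import Literal

GuidedDecodingBackend = Literal["auto", "outlines", "lm-format-enforcer", "xgrammar"]


@dataclass
class DecodingConfig:
    # IDEs and type checkers can now flag invalid backend names.
    guided_decoding_backend: GuidedDecodingBackend = "auto"
```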
It's tricky because the valid set of values is different between V0 and V1.
I've made #16332, which begins to address our config code. It only covers ...
The previous default was `xgrammar` and users had to opt-in to fallback behavior. After more thought, `auto` seems like a better default as it lets us do our best to satisfy all requests. Users can still pin vllm to a single backend if desired. Make `auto` work for V0 in case it gets specified there, as well. Signed-off-by: Russell Bryant <[email protected]>
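As an illustration of pinning a backend versus automatic selection from the offline `LLM` entrypoint; the `guided_decoding_backend` keyword and the model name are assumptions based on this discussion, so check your installed vLLM version before relying on them.

```python
from vllm import LLM

# Explicitly request automatic backend selection (the new V1 default).
llm_auto = LLM(
    model="Qwen/Qwen2.5-1.5B-Instruct",  # placeholder model
    guided_decoding_backend="auto",
)

# Or pin vLLM to a single backend if deterministic behaviour is preferred.
llm_pinned = LLM(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    guided_decoding_backend="xgrammar",
)
```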
LGTM!
These tests now run against V1 in CI, but the code was originally written with V0 in mind. In particular, specifying the backend through the OpenAI API is not supported with V1, so we can remove it and speed up the tests quite a bit. Different backends are tested in another place (via the llm entrypoint tests). Signed-off-by: Russell Bryant <[email protected]>
Hey, this broke my offline inference scripts with structured outputs, which worked perfectly in the last release (0.8.3). I tried switching the backend to 'auto', but I still get the same error. How should one adapt their script for this change? Cheers.
That error indicates that the engine wants to use a different backend from the one your request specifies. Can you try not setting the backend in your request? If this doesn't work, please provide a minimal script so that I can try and reproduce the behaviour.
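For reference, a minimal offline-inference sketch that leaves the backend unset and only constrains the output per request; the model name and the choice constraint are placeholders, and `GuidedDecodingParams` is assumed to be importable as in recent vLLM releases.

```python
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

# No guided_decoding_backend is passed anywhere: the engine's default
# ("auto" on V1 after this PR) decides which backend to use.
llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")  # placeholder model

guided = GuidedDecodingParams(choice=["positive", "negative"])
params = SamplingParams(temperature=0.0, guided_decoding=guided)

outputs = llm.generate(["Classify the sentiment: 'I love this!'"], params)
print(outputs[0].outputs[0].text)
```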
I think the problem here is that ...
The error @alesalloum sees does not come from this PR; it was added in #14694. In that PR the decision was made that users cannot use arbitrary backends at runtime. At startup they either leave it as `auto` or pin a specific backend; requests cannot override that choice at runtime.
Oh I see. Thanks for the explanation. I think we should have a clearer error message to notify users that they are not supposed to set the backend, and add a deprecation warning for those who are currently setting the backend (even if it's the same as the one on startup).
Ok, I can make a PR for that. Should we lower the raise to a warn-and-ignore with more explanation?
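A hedged sketch of the warn-and-ignore idea; the function and parameter names are hypothetical and not taken from the actual vLLM validation code.

```python
# Illustrative sketch: warn about and ignore a per-request backend
# override instead of raising an error.
import warnings
from typing import Optional


def resolve_backend(engine_backend: str, request_backend: Optional[str]) -> str:
    """Return the backend to use, ignoring per-request overrides with a warning."""
    if request_backend is not None and request_backend != engine_backend:
        warnings.warn(
            f"Ignoring per-request guided decoding backend {request_backend!r}; "
            f"the engine was started with {engine_backend!r}. "
            "Select the backend at startup instead.",
            stacklevel=2,
        )
    return engine_backend
```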
Here's the PR: #16717