[Feature] Support user-specified "trigger" token before starting structured decoding #12995
Conversation
…xgrammar. Signed-off-by: jacobthebanana <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Instead, it would only run [...] Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Output from the unit test (new lines added for readability):

```shell
$ python \
    -m pytest \
    --maxfail=1 \
    --disable-warnings \
    -sv tests/entrypoints/llm/test_guided_generate.py::test_guided_json_for_reasoning
```
…ading r1-distill alongside qwen-instruct. Signed-off-by: jacobthebanana <[email protected]>
…type signature. Signed-off-by: jacobthebanana <[email protected]>
…type signature. Signed-off-by: jacobthebanana <[email protected]>
…from R1-distill-1.5B Signed-off-by: jacobthebanana <[email protected]>
Closing this PR in favor of #12955
This PR allows the user to specify a "trigger token" that needs to be produced before xgrammar is applied for structured decoding. For example, when generating with R1-like models, the end-of-thought token `</think>` can be the trigger token, as seen in the example in the added unit test.

Additional work might be required to:

- support strings (e.g., `JSON Output:` or `\boxed` in math prompts) as the trigger for structured decoding.

FIX #12619
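The gating idea described above can be sketched as a small wrapper that leaves logits untouched until the trigger token has appeared, then hands them off to the grammar-constrained step. This is only an illustrative sketch: `TriggeredGrammarProcessor` and `grammar_mask_fn` are hypothetical names, not the PR's actual classes or the xgrammar API.

```python
from typing import Callable, List

class TriggeredGrammarProcessor:
    """Illustrative sketch of trigger-gated structured decoding.

    Logits pass through unchanged until `trigger_token_id` shows up in the
    generated tokens; after that, `grammar_mask_fn` (standing in for the
    xgrammar-backed masking step) constrains every subsequent step.
    """

    def __init__(self, trigger_token_id: int,
                 grammar_mask_fn: Callable[[List[int], List[float]], List[float]]):
        self.trigger_token_id = trigger_token_id
        self.grammar_mask_fn = grammar_mask_fn
        self.triggered = False

    def __call__(self, generated_token_ids: List[int],
                 logits: List[float]) -> List[float]:
        if not self.triggered:
            if self.trigger_token_id in generated_token_ids:
                # Trigger seen (e.g. the </think> token id): start constraining.
                self.triggered = True
            else:
                # Still in free-form (reasoning) mode: no grammar applied.
                return logits
        return self.grammar_mask_fn(generated_token_ids, logits)
```

The state is sticky by design: once the trigger has been observed, all later steps of that sequence stay grammar-constrained.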
I was not aware of #12955 from Saturday morning before I started working on this PR on Sunday; I apologize to @gaocegege if this PR partially overlaps with their contribution. From what I understand, the main difference between the two PRs is the handling of `batch_size` in `xgrammar_decoding`, in case more than one stream of generation is sent through this logits processor at a time, though it is unclear whether that would ever be the case in the current setup.
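Handling `batch_size > 1` essentially means tracking the trigger independently for each sequence in the batch. A minimal sketch of that bookkeeping, with hypothetical names (this is not the actual `xgrammar_decoding` code), might look like:

```python
from typing import List

class BatchTriggerState:
    """Illustrative per-sequence trigger tracking for a batch of streams.

    Each sequence in the batch flips to "triggered" independently, so one
    stream can already be grammar-constrained while another is still
    generating free-form reasoning text.
    """

    def __init__(self, batch_size: int, trigger_token_id: int):
        self.trigger_token_id = trigger_token_id
        self.triggered = [False] * batch_size

    def update(self, last_token_ids: List[int]) -> List[bool]:
        # Mark any sequence whose newest token is the trigger token,
        # then return the per-sequence "apply grammar mask" flags.
        for i, tok in enumerate(last_token_ids):
            if tok == self.trigger_token_id:
                self.triggered[i] = True
        return list(self.triggered)
```

Whether vLLM ever routes more than one stream through a single instance of this logits processor is, as noted above, unclear, but the per-sequence flags make the behavior well-defined either way.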