feat: Add LLGuidance Support for PyTorch Backend #5214
Conversation
Signed-off-by: jellysnack <[email protected]>
/bot run
LGTM, thanks for the contribution! @jellysnack
@QiJune Could you please also take a look at this PR? I've locally verified the new guided decoding backend works well. Corresponding tests will be added in a new PR. Thanks!
PR_Github #8957 [ run ] triggered by Bot
PR_Github #8957 [ run ] completed with state
Hi @jellysnack ,
I tried to add tests for llguidance, but encountered an error. It seems llguidance cannot parse every JSON schema in the json_mode_eval dataset. Could you please run the command below and check the results?
cat > ./extra_llm_api_options.yaml <<EOF
guided_decoding_backend: llguidance
disable_overlap_scheduler: true
EOF
trtllm-eval --model <LLaMA-3.1-Instruct-Path> --extra_llm_api_options extra_llm_api_options.yaml json_mode_eval
The error log complains:
ValueError: LLGuidance matcher error: Unimplemented keys: ["else", "if", "then"]
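For context, the conditional keywords named in the error come from schemas shaped roughly like this (a generic illustration, not an actual entry from json_mode_eval):

```json
{
  "type": "object",
  "properties": {
    "country": { "type": "string" },
    "postal_code": { "type": "string" }
  },
  "if": { "properties": { "country": { "const": "US" } } },
  "then": { "properties": { "postal_code": { "pattern": "^[0-9]{5}$" } } },
  "else": { "properties": { "postal_code": { "type": "string" } } }
}
```

By default, llguidance reports such keys as unimplemented rather than silently ignoring them.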
Signed-off-by: Enwei Zhu <[email protected]>
I think I know what the issue is. By default, llguidance fails when it encounters unsupported keys in the schema. You can control this behavior using the top-level "x-guidance" option: setting "lenient" to true makes llguidance skip unsupported keys instead of raising an error. So the JSON schema should include the following:
{
  "x-guidance": {
    "lenient": true
  },
  "type": "object",
  "properties": {
    ...
  }
}
I haven’t tried running the test with this fix yet. I’ll add "x-guidance": { "lenient": true } by default to JSON schemas in case it resolves the problem. If it’s convenient for you, please feel free to test the fix on your end as well.

That said, it would be great to have these options, including whitespace controls, exposed as easily configurable parameters rather than hardcoded. Additionally, it would be extremely useful to allow configuration of XGrammar parameters such as the cache size. Currently, the XGrammar cache is unbounded by default, which can lead to memory overflow (I’ve encountered this myself and needed to patch TRT-LLM to avoid OOM).
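The default injection described above could be done with a small helper that patches the schema before it reaches llguidance (the helper name is mine for illustration, not part of the PR):

```python
def with_lenient_guidance(schema: dict) -> dict:
    """Return a copy of a JSON schema with llguidance's lenient mode enabled.

    Adding the top-level "x-guidance" option with {"lenient": True} tells
    llguidance to skip unsupported schema keys (e.g. "if"/"then"/"else")
    instead of raising an error. An existing "x-guidance" entry is kept,
    so user-provided settings are not overridden.
    """
    patched = dict(schema)  # shallow copy; we only touch a top-level key
    patched.setdefault("x-guidance", {"lenient": True})
    return patched

schema = {"type": "object", "properties": {"name": {"type": "string"}}}
print(with_lenient_guidance(schema)["x-guidance"])  # → {'lenient': True}
```

Using `setdefault` (rather than unconditional assignment) means a schema that already specifies its own "x-guidance" options passes through untouched.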
/bot run
PR_Github #9186 [ run ] triggered by Bot
/bot run
PR_Github #9257 [ run ] triggered by Bot
/bot kill
PR_Github #9267 [ kill ] triggered by Bot
PR_Github #9257 [ run ] completed with state
PR_Github #9267 [ kill ] completed with state
/bot run
PR_Github #9268 [ run ] triggered by Bot
PR_Github #9268 [ run ] completed with state
PR title
Add LLGuidance Support for PyTorch Backend
Description
This PR introduces LLGuidance as a guided decoding backend for the PyTorch flow.
It complements the functionality proposed in #5011 (which adds LLGuidance for the TensorRT flow), but avoids the Cargo dependency by relying solely on the LLGuidance Python package.
This makes the integration more lightweight and aligns better with the project's current focus on the PyTorch flow.
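For readers unfamiliar with how guided decoding backends such as LLGuidance and XGrammar plug into sampling: at each generation step the grammar matcher computes the set of token ids currently allowed, and the backend masks all other logits before sampling. A minimal, backend-agnostic sketch (the function name is illustrative, not TRT-LLM's API):

```python
import math

def apply_token_mask(logits, allowed):
    """Set the logits of tokens the grammar matcher disallows to -inf,
    so sampling can only pick tokens that keep the output grammar-valid.

    logits: list of floats, one per vocabulary token
    allowed: set of token ids the matcher currently permits
    """
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

# Example: only token id 1 is currently allowed by the matcher.
print(apply_token_mask([0.1, 2.0, -0.5], {1}))  # → [-inf, 2.0, -inf]
```

In practice the mask is computed as a packed bitmask and applied on-GPU for the whole batch, but the effect per step is the same as this list comprehension.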