Conversation

jellysnack
Contributor

PR title

Add LLGuidance Support for PyTorch Backend

Description

This PR introduces LLGuidance as a guided decoding backend for the PyTorch flow.
It complements the functionality proposed in #5011 (which adds LLGuidance for the TensorRT flow), but avoids the Cargo dependency by relying solely on the LLGuidance Python package.

This makes the integration more lightweight and aligns better with the current project direction focused on the PyTorch flow.

@jellysnack jellysnack requested a review from a team as a code owner June 14, 2025 22:27
@jellysnack jellysnack requested a review from Naveassaf June 14, 2025 22:27
@syuoni
Collaborator

syuoni commented Jun 16, 2025

/bot run

Collaborator

@syuoni syuoni left a comment

LGTM, thanks for the contribution! @jellysnack

@syuoni syuoni added Community want to contribute PRs initiated from Community Community Engagement help/insights needed from community labels Jun 16, 2025
@syuoni syuoni requested a review from QiJune June 16, 2025 02:29
@syuoni
Collaborator

syuoni commented Jun 16, 2025

@QiJune Could you please also take a look at this PR?

I've locally verified the new guided decoding backend works well. Corresponding tests will be added in a new PR. Thanks!

@tensorrt-cicd
Collaborator

PR_Github #8957 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #8957 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6536 completed with status: 'SUCCESS'
Pipeline passed with automatically retried tests. Check the rerun report for details.

@syuoni syuoni changed the title Add LLGuidance Support for PyTorch Backend feat: Add LLGuidance Support for PyTorch Backend Jun 16, 2025
Collaborator

@syuoni syuoni left a comment

Hi @jellysnack ,

I tried to add tests for llguidance, but encountered an error. It seems llguidance cannot parse all JSON schemas in the json_mode_eval dataset. Could you please run the command below and check the results?

cat > ./extra_llm_api_options.yaml <<EOF
guided_decoding_backend: llguidance
disable_overlap_scheduler: true
EOF

trtllm-eval --model <LLaMA-3.1-Instruct-Path> --extra_llm_api_options extra_llm_api_options.yaml json_mode_eval

The error log complains:

ValueError: LLGuidance matcher error: Unimplemented keys: ["else", "if", "then"]

Signed-off-by: Enwei Zhu <[email protected]>
@jellysnack
Contributor Author

jellysnack commented Jun 17, 2025

> Hi @jellysnack ,
>
> I tried to add tests for llguidance, but encountered an error. Seems llguidance cannot parse all JSON schema in the json_mode_eval dataset. Could you please run the below command and check the results?
>
> cat > ./extra_llm_api_options.yaml <<EOF
> guided_decoding_backend: llguidance
> disable_overlap_scheduler: true
> EOF
>
> trtllm-eval --model <LLaMA-3.1-Instruct-Path> --extra_llm_api_options extra_llm_api_options.yaml json_mode_eval
>
> The error log complains:
>
> ValueError: LLGuidance matcher error: Unimplemented keys: ["else", "if", "then"]

I think I know what the issue is. By default, llguidance fails when it encounters unsupported keys in the schema. You can control this behavior via the top-level "x-guidance" key in the JSON schema, which also covers options such as whitespace handling (see the x-guidance documentation).

Setting "lenient": true instructs llguidance to ignore unsupported keywords and formats instead of failing.

So JSON schema should include the "x-guidance" key as follows:

{
   "x-guidance": {
      "lenient": true
   },
   "type": "object",
   "properties": {
      ...
   }
}

I haven't tried running the test with this fix yet. I'll add "x-guidance": { "lenient": true } by default to JSON schemas to see whether it resolves the problem. If it's convenient for you, please feel free to test the fix on your end as well.
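A minimal sketch of that default injection, assuming the schema arrives as a Python dict before being handed to llguidance (the helper name is hypothetical, not part of TRT-LLM):

```python
import json

def with_lenient_guidance(schema: dict) -> dict:
    """Return a shallow copy of a JSON schema with a top-level
    "x-guidance": {"lenient": true} entry, so llguidance skips
    unsupported keywords (e.g. "if"/"then"/"else") instead of failing."""
    patched = dict(schema)
    # Respect an explicit x-guidance block if the caller already set one.
    patched.setdefault("x-guidance", {"lenient": True})
    return patched

# Schema using keywords llguidance reports as unimplemented.
schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "if": {"properties": {"answer": {"const": "yes"}}},
    "then": {"required": ["answer"]},
}
patched = with_lenient_guidance(schema)
print(json.dumps(patched["x-guidance"]))  # -> {"lenient": true}
```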

But it would be great to have these options, including the whitespace controls, exposed as easily configurable parameters rather than hardcoded. It would also be extremely useful to allow configuration of XGrammar parameters such as cache size. Currently, the XGrammar cache is unbounded by default, which can lead to memory overflow (I've encountered this myself and had to patch TRT-LLM to avoid OOM).
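The kind of cap being asked for can be sketched as an LRU-bounded wrapper around grammar compilation. This is illustrative only; the class and method names are assumptions, not TRT-LLM or XGrammar API:

```python
from collections import OrderedDict

class BoundedGrammarCache:
    """LRU-bounded cache for compiled grammars: once max_entries is
    exceeded, the least recently used entry is evicted, keeping memory
    use bounded instead of growing without limit."""

    def __init__(self, max_entries: int = 128):
        self.max_entries = max_entries
        self._cache = OrderedDict()

    def get_or_compile(self, schema_key: str, compile_fn):
        if schema_key in self._cache:
            self._cache.move_to_end(schema_key)  # mark as recently used
            return self._cache[schema_key]
        grammar = compile_fn(schema_key)
        self._cache[schema_key] = grammar
        if len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)  # evict least recently used
        return grammar
```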

@syuoni
Collaborator

syuoni commented Jun 17, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #9186 [ run ] triggered by Bot

@syuoni syuoni enabled auto-merge (squash) June 17, 2025 15:06
@syuoni
Collaborator

syuoni commented Jun 18, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #9257 [ run ] triggered by Bot

@syuoni
Collaborator

syuoni commented Jun 18, 2025

/bot kill

@tensorrt-cicd
Collaborator

PR_Github #9267 [ kill ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #9257 [ run ] completed with state ABORTED

@tensorrt-cicd
Collaborator

PR_Github #9267 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit b4f8c38

@syuoni
Collaborator

syuoni commented Jun 18, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #9268 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #9268 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6799 completed with status: 'SUCCESS'

@syuoni syuoni merged commit 0623ffe into NVIDIA:main Jun 18, 2025
3 checks passed
4 participants