Conversation

jellysnack
Contributor

PR title

Add LLGuidance Support for PyTorch Backend

Description

This PR introduces LLGuidance as a guided decoding backend for the PyTorch flow.
It complements the functionality proposed in #5011 (which adds LLGuidance for the TensorRT flow), but avoids the Cargo dependency by relying solely on the LLGuidance Python package.

This makes the integration more lightweight and aligns better with the current project direction focused on the PyTorch flow.

@jellysnack jellysnack requested a review from a team as a code owner June 14, 2025 22:27
@jellysnack jellysnack requested a review from Naveassaf June 14, 2025 22:27
@syuoni
Collaborator

syuoni commented Jun 16, 2025

/bot run

Collaborator

@syuoni syuoni left a comment

LGTM, thanks for the contribution! @jellysnack

@syuoni syuoni added Community want to contribute PRs initiated from Community Community Engagement help/insights needed from community labels Jun 16, 2025
@syuoni syuoni requested a review from QiJune June 16, 2025 02:29
@syuoni
Collaborator

syuoni commented Jun 16, 2025

@QiJune Could you please also take a look at this PR?

I've locally verified the new guided decoding backend works well. Corresponding tests will be added in a new PR. Thanks!

@tensorrt-cicd
Collaborator

PR_Github #8957 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #8957 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6536 completed with status: 'SUCCESS'
Pipeline passed with automatically retried tests. Check the rerun report for details.

@syuoni syuoni changed the title Add LLGuidance Support for PyTorch Backend feat: Add LLGuidance Support for PyTorch Backend Jun 16, 2025
Collaborator

@syuoni syuoni left a comment

Hi @jellysnack ,

I tried to add tests for llguidance, but encountered an error. It seems llguidance cannot parse all JSON schemas in the json_mode_eval dataset. Could you please run the command below and check the results?

cat > ./extra_llm_api_options.yaml <<EOF
guided_decoding_backend: llguidance
disable_overlap_scheduler: true
EOF

trtllm-eval --model <LLaMA-3.1-Instruct-Path> --extra_llm_api_options extra_llm_api_options.yaml json_mode_eval

The error log complains:

ValueError: LLGuidance matcher error: Unimplemented keys: ["else", "if", "then"]

Signed-off-by: Enwei Zhu <[email protected]>
@jellysnack
Contributor Author

jellysnack commented Jun 17, 2025

> Hi @jellysnack ,
>
> I tried to add tests for llguidance, but encountered an error. Seems llguidance cannot parse all JSON schema in the json_mode_eval dataset. Could you please run the below command and check the results?
>
> cat > ./extra_llm_api_options.yaml <<EOF
> guided_decoding_backend: llguidance
> disable_overlap_scheduler: true
> EOF
>
> trtllm-eval --model <LLaMA-3.1-Instruct-Path> --extra_llm_api_options extra_llm_api_options.yaml json_mode_eval
>
> The error log complains:
>
> ValueError: LLGuidance matcher error: Unimplemented keys: ["else", "if", "then"]

I think I know what the issue is. By default, llguidance fails when it encounters unsupported keys in the schema. You can control this behavior via the top-level "x-guidance" key in the JSON schema, which also covers options such as whitespace handling (see the x-guidance documentation).

Setting "lenient": true instructs llguidance to ignore unsupported keywords and formats instead of failing.

So JSON schema should include the "x-guidance" key as follows:

{
   "x-guidance": {
      "lenient": true
   },
   "type": "object",
   "properties": {
      ...
   }
}

I haven't tried running the test with this fix yet. I'll add "x-guidance": { "lenient": true } by default to JSON schemas to see whether it resolves the problem. If it's convenient for you, please feel free to test the fix on your end as well.
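A minimal sketch of that default injection, assuming the schema arrives as a Python dict before being handed to llguidance (the helper name is hypothetical, not part of TRT-LLM):

```python
import json

def with_lenient_guidance(schema: dict) -> dict:
    """Return a shallow copy of a JSON schema with a top-level
    "x-guidance": {"lenient": true} entry, so llguidance skips
    unsupported keywords (e.g. "if"/"then"/"else") instead of failing."""
    patched = dict(schema)
    # Respect an explicit x-guidance block if the caller already set one.
    patched.setdefault("x-guidance", {"lenient": True})
    return patched

# Schema using keywords llguidance reports as unimplemented.
schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "if": {"properties": {"answer": {"const": "yes"}}},
    "then": {"required": ["answer"]},
}
patched = with_lenient_guidance(schema)
print(json.dumps(patched["x-guidance"]))  # -> {"lenient": true}
```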

But it would be great to have these options, including the whitespace controls, exposed as easily configurable parameters rather than hardcoded. It would also be extremely useful to allow configuration of XGrammar parameters such as cache size. Currently, the XGrammar cache is unbounded by default, which can lead to memory overflow (I've encountered this myself and had to patch TRT-LLM to avoid OOM).
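The kind of cap being asked for can be sketched as an LRU-bounded wrapper around grammar compilation. This is illustrative only; the class and method names are assumptions, not TRT-LLM or XGrammar API:

```python
from collections import OrderedDict

class BoundedGrammarCache:
    """LRU-bounded cache for compiled grammars: once max_entries is
    exceeded, the least recently used entry is evicted, keeping memory
    use bounded instead of growing without limit."""

    def __init__(self, max_entries: int = 128):
        self.max_entries = max_entries
        self._cache = OrderedDict()

    def get_or_compile(self, schema_key: str, compile_fn):
        if schema_key in self._cache:
            self._cache.move_to_end(schema_key)  # mark as recently used
            return self._cache[schema_key]
        grammar = compile_fn(schema_key)
        self._cache[schema_key] = grammar
        if len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)  # evict least recently used
        return grammar
```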

@syuoni
Collaborator

syuoni commented Jun 17, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #9186 [ run ] triggered by Bot

@syuoni syuoni enabled auto-merge (squash) June 17, 2025 15:06
@syuoni
Collaborator

syuoni commented Jun 18, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #9257 [ run ] triggered by Bot

@syuoni
Collaborator

syuoni commented Jun 18, 2025

/bot kill

@tensorrt-cicd
Collaborator

PR_Github #9267 [ kill ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #9257 [ run ] completed with state ABORTED

@tensorrt-cicd
Collaborator

PR_Github #9267 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit b4f8c38

@syuoni
Collaborator

syuoni commented Jun 18, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #9268 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #9268 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #6799 completed with status: 'SUCCESS'

@syuoni syuoni merged commit 0623ffe into NVIDIA:main Jun 18, 2025
3 checks passed
4 participants