
Conversation

@jellysnack (Contributor)

PR title

Add Initial Support for LLGuidance Library (Working Draft)

Description

This PR introduces preliminary support for the LLGuidance library. I've already been using it in my fork and found it useful, so I'd like to contribute this functionality.

The current implementation is working, but the interface is still a draft and may require further refinement. I'm also aware that the current interface lacks flexibility for configuring the XGrammar or LLGuidance backends, which can be important in practice. For example, in my case it was crucial to control the formatting of whitespace and separators, as well as to set a maximum cache size for XGrammar (which is unlimited by default).
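To illustrate, here is roughly the kind of tuning I mean, done against the two libraries' own Python APIs. This is only a sketch: the model name is a placeholder, and the option and argument names reflect the library versions I used and may differ across releases.

```python
# Sketch only: configuring the underlying grammar libraries directly.
# Model name is a placeholder; option/argument names may vary by version.
import llguidance
import xgrammar as xgr
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")  # placeholder

# LLGuidance: control JSON whitespace/separator formatting when compiling a schema.
grammar = llguidance.LLMatcher.grammar_from_json_schema(
    '{"type": "object", "properties": {"a": {"type": "integer"}}}',
    defaults={"whitespace_flexible": False},  # force compact separators, no free-form whitespace
)

# XGrammar: the compiled-grammar cache is unlimited by default; cap it explicitly.
tokenizer_info = xgr.TokenizerInfo.from_huggingface(tok)
compiler = xgr.GrammarCompiler(tokenizer_info, cache_limit_bytes=64 << 20)
compiled = compiler.compile_json_schema('{"type": "object"}')
```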

I'm opening this PR as a work in progress and would really appreciate any feedback or guidance. I'm happy to iterate on the design and implementation.

@juney-nvidia (Collaborator)

Thanks for the contribution, @jellysnack.

Let me add @syuoni, the main developer integrating XGrammar into TensorRT-LLM, to help review.
Also adding @Naveassaf and @dcampora for visibility, as they are working on TensorRT-LLM sampling/decoding logic related to this topic.

Thanks
June

@juney-nvidia added the labels "Community want to contribute" (PRs initiated from Community) and "Community Engagement" (help/insights needed from community) on Jun 9, 2025.
@juney-nvidia changed the title from "Add support for LLGuidance" to "feat: Add support for LLGuidance" on Jun 9, 2025.
@syuoni (Collaborator) commented Jun 10, 2025

Hi @jellysnack,

Thanks for the contribution! Could you please give an example of this feature?

@jellysnack (Contributor, Author) commented Jun 10, 2025

> Hi @jellysnack,
>
> Thanks for the contribution! Could you please give an example of this feature?

Hi, thanks for your response!

The main reason I switched from XGrammar to LLGuidance was performance. For certain types of JSON Schemas, XGrammar can take several seconds to compile, while LLGuidance is significantly faster. Additionally, LLGuidance has better coverage of the JSON Schema specification, supporting more complex constructs.

Another key difference is that LLGuidance provides clear and informative error messages when a grammar fails to compile, whereas XGrammar may silently ignore parts of the schema it doesn't support, so no error is raised and the LLM simply fails to follow the schema.

You can find performance comparisons and benchmarks demonstrating these differences in the LLGuidance repository. Let me know if you'd like me to include a specific example.
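For a rough local sense of the compile-time gap, I've been timing schema compilation along these lines. Again just a sketch: the schema and model are placeholders, and the API names are per the versions I tested.

```python
# Rough compile-time comparison sketch; schema and model are placeholders,
# and API names reflect the library versions I tested.
import json
import time

import llguidance
import llguidance.hf
import xgrammar as xgr
from transformers import AutoTokenizer

schema = json.dumps({
    "type": "object",
    "properties": {"items": {"type": "array", "items": {"type": "string"}}},
    "required": ["items"],
})
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")  # placeholder

t0 = time.perf_counter()
xgr_compiler = xgr.GrammarCompiler(xgr.TokenizerInfo.from_huggingface(tok))
xgr_compiled = xgr_compiler.compile_json_schema(schema)
print(f"xgrammar:   {time.perf_counter() - t0:.3f}s")

t0 = time.perf_counter()
ll_tok = llguidance.hf.from_tokenizer(tok, None)
ll_matcher = llguidance.LLMatcher(ll_tok, llguidance.LLMatcher.grammar_from_json_schema(schema))
print(f"llguidance: {time.perf_counter() - t0:.3f}s")
```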

@jellysnack (Contributor, Author)

Sorry, I might have misunderstood your question earlier. If you're asking for an example of how to use LLGuidance, it's functionally similar to using XGrammar.

To switch to LLGuidance, you set the GuidedDecodingBackend to kLLGUIDANCE (instead of kXGRAMMAR) and provide a GuidedDecodingConfig in the same way as with XGrammar. The GuidedDecodingParams are also unchanged, except for the new GuideType: kLARK_GRAMMAR.

Since LLGuidance uses a variation of the Lark grammar format (rather than standard EBNF), if you're specifying a grammar manually, you’ll need to set GuideType to kLARK_GRAMMAR accordingly.

That said, I’ve only integrated the backend on the C++ side and haven’t yet extended the Python bindings, so I’m unable to provide a ready-to-run Python example at the moment. Preparing a working C++ usage example may take a little more time, but I'm happy to provide one if needed.

@syuoni (Collaborator) commented Jun 10, 2025

> Sorry, I might have misunderstood your question earlier. If you're asking for an example of how to use LLGuidance, it's functionally similar to using XGrammar.
>
> To switch to LLGuidance, you set the GuidedDecodingBackend to kLLGUIDANCE (instead of kXGRAMMAR) and provide a GuidedDecodingConfig in the same way as with XGrammar. The GuidedDecodingParams are also unchanged, except for the new GuideType: kLARK_GRAMMAR.
>
> Since LLGuidance uses a variation of the Lark grammar format (rather than standard EBNF), if you're specifying a grammar manually, you’ll need to set GuideType to kLARK_GRAMMAR accordingly.
>
> That said, I’ve only integrated the backend on the C++ side and haven’t yet extended the Python bindings, so I’m unable to provide a ready-to-run Python example at the moment. Preparing a working C++ usage example may take a little more time, but I'm happy to provide one if needed.

Yes, we would like to see a Python example with the LLM API, like this one. So it would be great if you can also finish the pybindings part.

An LLM API example also makes it easy to add an integration test so that we can protect this feature. Thanks!

@jellysnack (Contributor, Author)

Hi @syuoni

I've updated the PR with the following improvements:

  • Finished the Python bindings for LLGuidance.
  • Modified the llm_guided_decoding.py example to support the llguidance backend (see the sketch below).
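Usage in the updated example looks roughly like this. This is a sketch mirroring llm_guided_decoding.py; the model is a placeholder and the exact LLM API argument names may differ from the final bindings.

```python
# Sketch of what the updated example does; exact argument names may differ.
from tensorrt_llm.llmapi import LLM, GuidedDecodingParams, SamplingParams

llm = LLM(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model
    guided_decoding_backend="llguidance",        # instead of "xgrammar"
)

# JSON-schema-guided generation works the same as with xgrammar.
schema = '{"type": "object", "properties": {"answer": {"type": "integer"}}, "required": ["answer"]}'
output = llm.generate(
    "What is 2 + 2? Answer in JSON.",
    SamplingParams(guided_decoding=GuidedDecodingParams(json=schema)),
)
print(output.outputs[0].text)

# Grammars are written in LLGuidance's Lark-like format (the new kLARK_GRAMMAR guide type).
lark_grammar = 'start: "yes" | "no"'
output = llm.generate(
    "Is water wet? Answer yes or no.",
    SamplingParams(guided_decoding=GuidedDecodingParams(grammar=lark_grammar)),
)
print(output.outputs[0].text)
```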

Additionally, I'm open to extending LLGuidance support to the torch backend as well. Let me know if this is something you'd be interested in.

@syuoni (Collaborator) commented Jun 11, 2025

> Hi @syuoni
>
> I've updated the PR with the following improvements:
>
>   • Finished the Python bindings for LLGuidance.
>   • Modified the llm_guided_decoding.py example to support the llguidance backend.

This is great! I'll take a closer look at this PR tomorrow. Do you mind if I add a test to this PR?

> Additionally, I'm open to extending LLGuidance support to the torch backend as well. Let me know if this is something you'd be interested in.

Absolutely! We are currently promoting the torch backend. We'd really appreciate it if you could also support LLGuidance in the torch backend.

@syuoni (Collaborator) commented Jun 11, 2025

To be clear, this PR can focus on the TensorRT backend for now; support in the torch backend can be added in a new PR. Thanks!

@jellysnack (Contributor, Author)

> Do you mind if I add a test to this PR?

Not at all; feel free to add a test or any other changes to the PR.

@syuoni (Collaborator) commented Jun 13, 2025

Hi @jellysnack,

I've locally verified that the newly added LLGuidance backend works perfectly. Personally, I like this PR. Again, thanks for your contribution!

Currently, we are having some internal discussions on whether to accept this PR, since some developers have raised concerns:

  • This PR introduces a new git submodule, LLGuidance.
  • LLGuidance requires cargo to compile, which is not available in the current TRT-LLM dev environment.
  • This PR adds a new feature to the TensorRT flow, while TRT-LLM has shifted priority to the PyTorch flow.

The discussion is still ongoing; I will let you know once we have a decision.


To confirm: if LLGuidance is integrated into the PyTorch flow, the introduced dependency is the LLGuidance Python package only, right? If so, that will be much easier.

@jellysnack (Contributor, Author)

> To confirm: if LLGuidance is integrated into the PyTorch flow, the introduced dependency is the LLGuidance Python package only, right? If so, that will be much easier.

Yes, just the Python package.
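For reference, the pure-Python path looks roughly like this. A sketch only: the model is a placeholder and the API names are per the llguidance version I used.

```python
# Sketch: the whole dependency is `pip install llguidance`; no submodule, no cargo.
# API names reflect the llguidance version I used and may differ.
import llguidance
import llguidance.hf
import llguidance.torch
from transformers import AutoTokenizer

hf_tok = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder
ll_tok = llguidance.hf.from_tokenizer(hf_tok, None)

grammar = llguidance.LLMatcher.grammar_from_json_schema('{"type": "object"}')
matcher = llguidance.LLMatcher(ll_tok, grammar)

# Per decoding step: fill a token bitmask for the logits, then advance the matcher.
mask = llguidance.torch.allocate_token_bitmask(1, ll_tok.vocab_size)
llguidance.torch.fill_next_token_bitmask(matcher, mask, 0)
# ... apply the mask to the logits, sample a token, then advance:
# matcher.consume_token(sampled_token_id)
```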

@jellysnack (Contributor, Author)

Hi @syuoni,

While the discussion on this PR is ongoing, I’ve created a second PR that adds LLGuidance support for the PyTorch backend: #5214
