
Conversation

@jellysnack (Contributor)

PR title

Add Initial Support for LLGuidance Library (Working Draft)

Description

This PR introduces preliminary support for the LLGuidance library. I've already been using it in my fork and found it useful, so I'd like to contribute this functionality.

The current implementation is working, but the interface is still a draft and may require further refinement. I'm also aware that the current interface lacks flexibility for configuring the XGrammar or LLGuidance backends, which can be important in practice. For example, in my case it was crucial to control the formatting of whitespace and separators, as well as to set a maximum cache size for XGrammar (which is unlimited by default).
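To illustrate, here is roughly the kind of tuning I mean, done against the two libraries' own Python APIs. This is only a sketch: the model name is a placeholder, and the option and argument names reflect the library versions I used and may differ across releases.

```python
# Sketch only: configuring the underlying grammar libraries directly.
# Model name is a placeholder; option/argument names may vary by version.
import llguidance
import xgrammar as xgr
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")  # placeholder

# LLGuidance: control JSON whitespace/separator formatting when compiling a schema.
grammar = llguidance.LLMatcher.grammar_from_json_schema(
    '{"type": "object", "properties": {"a": {"type": "integer"}}}',
    defaults={"whitespace_flexible": False},  # force compact separators, no free-form whitespace
)

# XGrammar: the compiled-grammar cache is unlimited by default; cap it explicitly.
tokenizer_info = xgr.TokenizerInfo.from_huggingface(tok)
compiler = xgr.GrammarCompiler(tokenizer_info, cache_limit_bytes=64 << 20)
compiled = compiler.compile_json_schema('{"type": "object"}')
```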

I'm opening this PR as a work in progress and would really appreciate any feedback or guidance. I'm happy to iterate on the design and implementation.

@juney-nvidia (Collaborator)

Thanks for the contribution, @jellysnack.

Let me add @syuoni, the main developer integrating XGrammar into TensorRT-LLM, to help review.
Also adding @Naveassaf and @dcampora for visibility, as they are working on TensorRT-LLM sampling/decoding logic related to this topic.

Thanks
June

@juney-nvidia added the labels "Community want to contribute" (PRs initiated from Community) and "Community Engagement" (help/insights needed from community) on Jun 9, 2025.
@juney-nvidia changed the title from "Add support for LLGuidance" to "feat: Add support for LLGuidance" on Jun 9, 2025.
@syuoni (Collaborator) commented Jun 10, 2025

Hi @jellysnack,

Thanks for the contribution! Could you please give an example of this feature?

@jellysnack (Contributor, Author) commented Jun 10, 2025

> Hi @jellysnack,
>
> Thanks for the contribution! Could you please give an example of this feature?

Hi, thanks for your response!

The main reason I switched from XGrammar to LLGuidance was performance. For certain types of JSON Schemas, XGrammar can take several seconds to compile, while LLGuidance is significantly faster. Additionally, LLGuidance has better coverage of the JSON Schema specification, supporting more complex constructs.

Another key difference is that LLGuidance provides clear and informative error messages when a grammar fails to compile, whereas XGrammar may silently ignore parts of the schema it doesn't support, so no error is raised and the LLM simply fails to follow the schema.

You can find performance comparisons and benchmarks demonstrating these differences in the LLGuidance repository. Let me know if you'd like me to include a specific example.
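For a rough local sense of the compile-time gap, I've been timing schema compilation along these lines. Again just a sketch: the schema and model are placeholders, and the API names are per the versions I tested.

```python
# Rough compile-time comparison sketch; schema and model are placeholders,
# and API names reflect the library versions I tested.
import json
import time

import llguidance
import llguidance.hf
import xgrammar as xgr
from transformers import AutoTokenizer

schema = json.dumps({
    "type": "object",
    "properties": {"items": {"type": "array", "items": {"type": "string"}}},
    "required": ["items"],
})
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")  # placeholder

t0 = time.perf_counter()
xgr_compiler = xgr.GrammarCompiler(xgr.TokenizerInfo.from_huggingface(tok))
xgr_compiled = xgr_compiler.compile_json_schema(schema)
print(f"xgrammar:   {time.perf_counter() - t0:.3f}s")

t0 = time.perf_counter()
ll_tok = llguidance.hf.from_tokenizer(tok, None)
ll_matcher = llguidance.LLMatcher(ll_tok, llguidance.LLMatcher.grammar_from_json_schema(schema))
print(f"llguidance: {time.perf_counter() - t0:.3f}s")
```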

@jellysnack (Contributor, Author)

Sorry, I might have misunderstood your question earlier. If you're asking for an example of how to use LLGuidance, it's functionally similar to using XGrammar.

To switch to LLGuidance, you set the GuidedDecodingBackend to kLLGUIDANCE (instead of kXGRAMMAR) and provide a GuidedDecodingConfig in the same way as with XGrammar. The GuidedDecodingParams are also unchanged, except for the new GuideType: kLARK_GRAMMAR.

Since LLGuidance uses a variation of the Lark grammar format (rather than standard EBNF), if you're specifying a grammar manually, you’ll need to set GuideType to kLARK_GRAMMAR accordingly.

That said, I’ve only integrated the backend on the C++ side and haven’t yet extended the Python bindings, so I’m unable to provide a ready-to-run Python example at the moment. Preparing a working C++ usage example may take a little more time, but I'm happy to provide one if needed.

@syuoni (Collaborator) commented Jun 10, 2025

> Sorry, I might have misunderstood your question earlier. If you're asking for an example of how to use LLGuidance, it's functionally similar to using XGrammar.
>
> To switch to LLGuidance, you set the GuidedDecodingBackend to kLLGUIDANCE (instead of kXGRAMMAR) and provide a GuidedDecodingConfig in the same way as with XGrammar. The GuidedDecodingParams are also unchanged, except for the new GuideType: kLARK_GRAMMAR.
>
> Since LLGuidance uses a variation of the Lark grammar format (rather than standard EBNF), if you're specifying a grammar manually, you’ll need to set GuideType to kLARK_GRAMMAR accordingly.
>
> That said, I’ve only integrated the backend on the C++ side and haven’t yet extended the Python bindings, so I’m unable to provide a ready-to-run Python example at the moment. Preparing a working C++ usage example may take a little more time, but I'm happy to provide one if needed.

Yes, we would like to see a Python example with the LLM API, like this one. So it would be great if you can also finish the pybindings part.

An LLM API example also makes it easy to add an integration test so that we can protect this feature. Thanks!

@jellysnack (Contributor, Author)

Hi @syuoni

I've updated the PR with the following improvements:

  • Finished the Python bindings for LLGuidance.
  • Modified the llm_guided_decoding.py example to support the llguidance backend (see the sketch below).
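Usage in the updated example looks roughly like this. This is a sketch mirroring llm_guided_decoding.py; the model is a placeholder and the exact LLM API argument names may differ from the final bindings.

```python
# Sketch of what the updated example does; exact argument names may differ.
from tensorrt_llm.llmapi import LLM, GuidedDecodingParams, SamplingParams

llm = LLM(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model
    guided_decoding_backend="llguidance",        # instead of "xgrammar"
)

# JSON-schema-guided generation works the same as with xgrammar.
schema = '{"type": "object", "properties": {"answer": {"type": "integer"}}, "required": ["answer"]}'
output = llm.generate(
    "What is 2 + 2? Answer in JSON.",
    SamplingParams(guided_decoding=GuidedDecodingParams(json=schema)),
)
print(output.outputs[0].text)

# Grammars are written in LLGuidance's Lark-like format (the new kLARK_GRAMMAR guide type).
lark_grammar = 'start: "yes" | "no"'
output = llm.generate(
    "Is water wet? Answer yes or no.",
    SamplingParams(guided_decoding=GuidedDecodingParams(grammar=lark_grammar)),
)
print(output.outputs[0].text)
```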

Additionally, I'm open to extending LLGuidance support to the torch backend as well. Let me know if this is something you'd be interested in.

@syuoni (Collaborator) commented Jun 11, 2025

> Hi @syuoni
>
> I've updated the PR with the following improvements:
>
>   • Finished the Python bindings for LLGuidance.
>   • Modified the llm_guided_decoding.py example to support the llguidance backend.

This is great! I'll take a closer look at this PR tomorrow. Do you mind if I add a test to this PR?

> Additionally, I'm open to extending LLGuidance support to the torch backend as well. Let me know if this is something you'd be interested in.

Absolutely! We are currently promoting the torch backend. We'd really appreciate it if you could also support LLGuidance in the torch backend.

@syuoni (Collaborator) commented Jun 11, 2025

To be clear, this PR can focus on the TensorRT backend for now; support in the torch backend can be added in a new PR. Thanks!

@jellysnack (Contributor, Author)

> Do you mind if I add a test to this PR?

Not at all; feel free to add a test or any other changes to the PR.

@syuoni (Collaborator) commented Jun 13, 2025

Hi @jellysnack,

I've locally verified that the newly added LLGuidance backend works perfectly. Personally, I like this PR. Again, thanks for your contribution!

Currently, we are having some internal discussions on whether to accept this PR, since some developers have raised concerns:

  • This PR introduces a new git submodule, LLGuidance.
  • LLGuidance requires cargo to compile, which is not available in the current TRT-LLM dev environment.
  • This PR adds a new feature to the TensorRT flow, while TRT-LLM has shifted priority to the PyTorch flow.

The discussion is still ongoing; I will let you know once we have a decision.


To confirm: if LLGuidance is integrated into the PyTorch flow, the introduced dependency is the LLGuidance Python package only, right? If so, that will be much easier.

@jellysnack (Contributor, Author)

> To confirm: if LLGuidance is integrated into the PyTorch flow, the introduced dependency is the LLGuidance Python package only, right? If so, that will be much easier.

Yes, just the Python package.
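For reference, the pure-Python path looks roughly like this. A sketch only: the model is a placeholder and the API names are per the llguidance version I used.

```python
# Sketch: the whole dependency is `pip install llguidance`; no submodule, no cargo.
# API names reflect the llguidance version I used and may differ.
import llguidance
import llguidance.hf
import llguidance.torch
from transformers import AutoTokenizer

hf_tok = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder
ll_tok = llguidance.hf.from_tokenizer(hf_tok, None)

grammar = llguidance.LLMatcher.grammar_from_json_schema('{"type": "object"}')
matcher = llguidance.LLMatcher(ll_tok, grammar)

# Per decoding step: fill a token bitmask for the logits, then advance the matcher.
mask = llguidance.torch.allocate_token_bitmask(1, ll_tok.vocab_size)
llguidance.torch.fill_next_token_bitmask(matcher, mask, 0)
# ... apply the mask to the logits, sample a token, then advance:
# matcher.consume_token(sampled_token_id)
```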

@jellysnack (Contributor, Author)

Hi @syuoni,

While the discussion on this PR is ongoing, I’ve created a second PR that adds LLGuidance support for the PyTorch backend: #5214
