feat: Add support for LLGuidance #5011
base: main
Conversation
Signed-off-by: jellysnack <[email protected]>
Thanks for the contributions, @jellysnack. Let me add @syuoni, the main developer integrating XGrammar into TensorRT-LLM, to help review. Thanks
Hi @jellysnack, thanks for the contribution! Could you please give an example of this feature?
Hi, thanks for your response! The main reason I switched from XGrammar to LLGuidance was performance. For certain types of JSON Schemas, XGrammar can take several seconds to compile, while LLGuidance is significantly faster. Additionally, LLGuidance has better coverage of the JSON Schema specification, supporting more complex constructs. Another key difference is that LLGuidance provides clear and informative error messages when a grammar fails to compile, whereas XGrammar may silently ignore parts of the schema if they're unsupported, so no error is raised and the LLM fails to follow the schema. You can find performance comparisons and benchmarks demonstrating these differences in the LLGuidance repository. Let me know if you'd like me to include a specific example.
Sorry, I might have misunderstood your question earlier. If you're asking for an example of how to use LLGuidance, it's functionally similar to using XGrammar. To switch to LLGuidance, you set the GuidedDecodingBackend to kLLGUIDANCE (instead of kXGRAMMAR) and provide a GuidedDecodingConfig in the same way as with XGrammar. The GuidedDecodingParams are also unchanged, except for the new GuideType: kLARK_GRAMMAR. Since LLGuidance uses a variation of the Lark grammar format (rather than standard EBNF), if you're specifying a grammar manually, you'll need to set GuideType to kLARK_GRAMMAR accordingly. That said, I've only integrated the backend on the C++ side and haven't yet extended the Python bindings, so I'm unable to provide a ready-to-run Python example at the moment. Preparing a working C++ usage example may take a little more time, but I'm happy to provide one if needed.
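For illustration, the switch described above might look roughly like the following C++ sketch. This is not from the PR itself: the surrounding executor setup and the encodedVocab, tokenizerStr, stopTokenIds, and executorConfig variables are assumed to exist, and only the enum spellings quoted in this thread (kLLGUIDANCE, kLARK_GRAMMAR) are taken from the discussion.

```cpp
#include "tensorrt_llm/executor/executor.h"

namespace tle = tensorrt_llm::executor;

// Hypothetical setup: choose the LLGuidance backend instead of XGrammar.
// encodedVocab, tokenizerStr, and stopTokenIds are assumed to be prepared
// from the model's tokenizer elsewhere in the application.
auto guidedDecodingConfig = tle::GuidedDecodingConfig(
    tle::GuidedDecodingConfig::GuidedDecodingBackend::kLLGUIDANCE,
    encodedVocab, tokenizerStr, stopTokenIds);
executorConfig.setGuidedDecodingConfig(guidedDecodingConfig);

// Per request: LLGuidance takes a Lark-style grammar via the new guide type.
// The grammar below is a toy example constraining output to "yes" or "no".
auto guidedParams = tle::GuidedDecodingParams(
    tle::GuidedDecodingParams::GuideType::kLARK_GRAMMAR,
    R"(start: "yes" | "no")");
```

JSON-schema-guided requests would be unchanged relative to XGrammar; only grammars written by hand need the kLARK_GRAMMAR guide type, since LLGuidance's grammar dialect is Lark-like rather than EBNF.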
Yes, we would like to see a Python example with the LLM API like this one. So, it would be great if you can also finish the pybindings part. An LLM API example also makes it easy to add an integration test so that we can protect this feature. Thanks!
Signed-off-by: jellysnack <[email protected]>
…RT-LLM into feature/add-llguidance Signed-off-by: jellysnack <[email protected]>
Hi @syuoni I've updated the PR with the following improvements:
Additionally, I'm open to extending LLGuidance support to the torch backend as well. Let me know if this is something you'd be interested in.
This is great! I'll take a closer look at this PR tomorrow. Do you mind if I add a test to this PR?
Absolutely! We are currently promoting the torch backend. We'd really appreciate it if you can also support LLGuidance in the torch backend.
To be clear, this PR can focus on the TensorRT backend for now. Support in the torch backend can be added in a new PR. Thanks!
Not at all, feel free to add a test or any other changes to the PR.
Hi @jellysnack , I've locally verified that the newly added LLGuidance backend works perfectly. Personally, I like this PR. Again, thanks for your contribution! Currently, we are having some internal discussion on whether to accept this PR, since some developers raise concerns because:
The discussion is still ongoing; I will let you know once we have a decision. To confirm: if LLGuidance is integrated into the PyTorch flow, the introduced dependency is the LLGuidance Python package only, right? If so, that will be much easier.
Yes, just the Python package.
PR title
Add Initial Support for LLGuidance Library (Working Draft)
Description
This PR introduces preliminary support for the LLGuidance library. I've already been using it in my fork and found it useful, so I'd like to contribute this functionality.
The current implementation is working, but the interface is still a draft and may require further refinement. I'm also aware that the current interface lacks flexibility for configuring the XGrammar or LLGuidance backends, which can be important in practice. For example, in my case it was crucial to control the formatting of whitespace and separators, as well as to set a maximum cache size for XGrammar (whose cache is unlimited by default).
I'm opening this PR as a work in progress and would really appreciate any feedback or guidance. I'm happy to iterate on the design and implementation.