[nvbugs/5361178] feat: json_schema support in trtllm-serve using xgrammar #6197
Conversation
Signed-off-by: noiji <[email protected]>
adding the changes to support json_schema as one of the supported types. Signed-off-by: mayani-nv <[email protected]>
adding flags related to the lora_request, else it will return a 400 response code. Signed-off-by: mayani-nv <[email protected]>
""" WalkthroughSupport for a new Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Server
    participant Model
    Client->>Server: Send chat completion request with response_format={"type": "json", "schema": {...}}
    Server->>Server: Validate presence of schema for "json" type
    Server->>Model: Generate response with guided decoding (using schema)
    Model-->>Server: Return JSON response
    Server-->>Client: Return structured JSON response
```
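The schema-presence check in the diagram can be pictured as follows (illustrative only; the function name and error type are hypothetical, not the server's actual code):

```python
from typing import Any, Dict


def validate_response_format(response_format: Dict[str, Any]) -> None:
    # A "json" response format must carry a schema before the request is
    # handed to guided decoding; otherwise the server rejects it.
    if response_format.get("type") == "json" and response_format.get("schema") is None:
        raise ValueError("response_format of type 'json' requires a 'schema'")
```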
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~15–20 minutes
📜 Recent review details
Configuration used: .coderabbit.yaml
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Actionable comments posted: 1
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- tensorrt_llm/serve/openai_protocol.py (3 hunks)
- tensorrt_llm/serve/openai_server.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
tensorrt_llm/serve/openai_server.py
294-294: Local variable `lora_request` is assigned to but never used. Remove assignment to unused variable `lora_request` (F841)
🔇 Additional comments (3)
tensorrt_llm/serve/openai_protocol.py (3)
55-59: LGTM! Clean extension of ResponseFormat for JSON schema support. The implementation correctly extends the existing response format types to include `"json_schema"` and adds the appropriate optional field to hold the schema data. The type annotations and field definitions follow the established patterns.
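For illustration, such an extension typically looks like the sketch below (the existing type literals and the default are assumptions; the actual definition lives in tensorrt_llm/serve/openai_protocol.py):

```python
from typing import Any, Dict, Literal, Optional

from pydantic import BaseModel, Field


class ResponseFormat(BaseModel):
    # "json_schema" joins the previously supported types; the other
    # literals here are placeholders for whatever the file already allows.
    type: Literal["text", "json", "json_schema"]
    # Optional schema payload consumed when type == "json_schema".
    json_schema: Optional[Dict[str, Any]] = Field(default=None)
```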
148-149: LGTM! Correct implementation of JSON schema guided decoding. The conversion logic properly handles the new `"json_schema"` type by passing the schema data to `GuidedDecodingParams(json=response_format.json_schema)`. This follows the expected pattern and integrates well with the existing guided decoding framework.
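Only the `GuidedDecodingParams(json=...)` call is confirmed by the review; a sketch of the surrounding branch under that assumption:

```python
from tensorrt_llm.sampling_params import GuidedDecodingParams  # assumed import path


def to_guided_decoding_params(response_format):
    # Hedged sketch: the enclosing function name and the handling of other
    # response format types are assumptions.
    if response_format.type == "json_schema":
        return GuidedDecodingParams(json=response_format.json_schema)
    return None  # other types are converted elsewhere
```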
211-211: LGTM! Documentation accurately reflects the new capability. The description correctly includes `{'type': 'json_schema'}` alongside the existing supported format types, maintaining consistency with the documentation style.
fixing typo with `lora_request` Signed-off-by: mayani-nv <[email protected]>
@mayani-nv #6000 added overlap scheduler support for guided decoding. Would that be sufficient to run
Fixing the `lora_request` typo Signed-off-by: mayani-nv <[email protected]>
PR description has both json_schema and structural tag example. Since this PR only adds json_schema, can we remove the structural_tag example to keep the description concise? @mayani-nv
removing the `lora_request` as the latest main branch already defines it. Signed-off-by: mayani-nv <[email protected]>
Adding the unit test for the json_schema support in xgrammar Signed-off-by: mayani-nv <[email protected]>
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/unittest/llmapi/apps/_test_openai_json_schema.py (2)
15-17: Fix inconsistent fixture ID. The fixture ID "TinyLlama-1.1B-Chat" doesn't match the actual model name "llama-3.1-model/Llama-3.1-8B-Instruct". This inconsistency could be confusing.

```diff
-@pytest.fixture(scope="module", ids=["TinyLlama-1.1B-Chat"])
+@pytest.fixture(scope="module", ids=["Llama-3.1-8B-Instruct"])
```
57-67: Consider a more inclusive regex pattern for city names. The regex pattern `^\w+$` for the capital name field might be too restrictive, as it only allows word characters and would reject valid city names with spaces, hyphens, or apostrophes (e.g., "New York", "Saint-Denis"). Consider a more inclusive pattern:

```diff
-    name: str = Field(...,
-                      pattern=r"^\w+$",
-                      description="The name of the capital city")
+    name: str = Field(...,
+                      pattern=r"^[\w\s\-'\.]+$",
+                      description="The name of the capital city")
```

However, since this test specifically asks for Paris (which matches the current pattern), the current implementation works for this test case.
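A quick check (illustrative only) shows why the original pattern is restrictive:

```python
import re

# "^\w+$" admits only letters, digits, and underscores, so multi-word or
# hyphenated city names fail; the broader character class accepts them.
narrow, broad = r"^\w+$", r"^[\w\s\-'\.]+$"
for city in ("Paris", "New York", "Saint-Denis"):
    print(city, bool(re.match(narrow, city)), bool(re.match(broad, city)))
# Paris True True
# New York False True
# Saint-Denis False True
```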
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- tests/unittest/llmapi/apps/_test_openai_json_schema.py (1 hunks)
🔇 Additional comments (5)
tests/unittest/llmapi/apps/_test_openai_json_schema.py (5)
1-12: LGTM! Well-organized imports and appropriate test configuration. The imports are logically grouped, and the thread-leak-detection disable is appropriate for OpenAI server integration tests.

20-34: LGTM! Proper resource management and configuration. The fixture correctly creates a temporary configuration file with appropriate cleanup in the finally block. The xgrammar backend configuration aligns with the JSON schema support requirements.

36-44: LGTM! Clean server setup with proper resource management. The server fixture correctly configures the RemoteOpenAIServer with the necessary backend and options.

47-54: LGTM! Clean client fixture setup. Both synchronous and asynchronous client fixtures are properly configured with appropriate scope.

70-103: LGTM! Comprehensive test with good validation. The test function thoroughly validates both the OpenAI API integration and JSON schema compliance. The assertions cover response structure, content parsing, and expected values. Consider setting `temperature=0.0` for more deterministic results in unit tests, though the current configuration should work fine for the specific prompt used.
The main commit
Actionable comments posted: 3
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- tensorrt_llm/serve/openai_protocol.py (3 hunks)
- tests/integration/defs/test_e2e.py (1 hunks)
- tests/integration/test_lists/test-db/l0_a10.yml (1 hunks)
- tests/unittest/llmapi/apps/_test_openai_chat_json.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- tests/integration/test_lists/test-db/l0_a10.yml
🚧 Files skipped from review as they are similar to previous changes (1)
- tensorrt_llm/serve/openai_protocol.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.{cpp,h,hpp,cc,cxx,cu,py}:
- Use only spaces for indentation. Do not use tabs. Indent 4 spaces at a time.
- All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tests/integration/defs/test_e2e.py
tests/unittest/llmapi/apps/_test_openai_chat_json.py
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py:
- The code developed for TensorRT-LLM should conform to Python 3.8+.
- Indent Python code with 4 spaces. Do not use tabs.
- Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
- Python filenames should use snake_case (e.g., some_file.py).
- Python classes should use PascalCase (e.g., class SomeClass).
- Python functions and methods should use snake_case (e.g., def my_awesome_function():).
- Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile = ...).
- Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL = ...).
- Python constants should use upper snake_case (e.g., MY_CONSTANT = ...).
- Avoid shadowing variables declared in an outer scope in Python.
- Initialize all externally visible members of a Python class in the constructor.
- For interfaces that may be used outside a file, prefer docstrings over comments in Python.
- Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
- Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
- Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
- Avoid using reflection in Python when functionality can be easily achieved without it.
- When using try-except blocks in Python, limit the except to the smallest set of errors possible.
- When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tests/integration/defs/test_e2e.py
tests/unittest/llmapi/apps/_test_openai_chat_json.py
🧬 Code Graph Analysis (1)
tests/unittest/llmapi/apps/_test_openai_chat_json.py (1)
tests/integration/defs/test_e2e.py (1)
temp_extra_llm_api_options_file (670-705)
🪛 Ruff (0.12.2)
tests/unittest/llmapi/apps/_test_openai_chat_json.py
88-88: Undefined name `Any` (F821)
103-103: Undefined name `json` (F821)
104-104: Undefined name `json` (F821)
106-106: Undefined name `output_text` (F821)
109-109: Undefined name `jsonschema` (F821)
131-131: Undefined name `first_message` (F821)
🔇 Additional comments (2)
tests/integration/defs/test_e2e.py (1)
1446-1452: LGTM! The integration test follows established patterns. The new test function correctly follows the same pattern as other OpenAI test functions in this file and properly integrates the JSON schema test.
tests/unittest/llmapi/apps/_test_openai_chat_json.py (1)
16-79: Well-structured test fixtures with proper setup and cleanup. The fixtures are correctly implemented with:
- Appropriate scoping (module scope for shared resources)
- Proper cleanup in the temporary file fixture
- Correct server configuration for JSON schema testing (xgrammar backend, disabled overlap scheduler)
- Good separation of concerns between different fixture types

A sketch of this fixture pattern follows below.
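As an illustration of the pattern described above, a hedged sketch (the name mirrors the test file's `temp_extra_llm_api_options_file` fixture, but the exact code is a reconstruction):

```python
import os
import tempfile

import pytest
import yaml


@pytest.fixture(scope="module")
def temp_extra_llm_api_options_file():
    # Write the guided-decoding options to a temp YAML file and clean it
    # up in the finally block, as the review notes.
    temp_file_path = os.path.join(tempfile.gettempdir(),
                                  "extra_llm_api_options.yaml")
    try:
        options = {
            "guided_decoding_backend": "xgrammar",
            "disable_overlap_scheduler": True,
        }
        with open(temp_file_path, "w") as f:
            yaml.dump(options, f)
        yield temp_file_path
    finally:
        os.remove(temp_file_path)
```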
```python
import os
import tempfile

import openai
import pytest
import yaml
```
Fix missing imports that cause undefined name errors.
The static analysis correctly identifies several missing imports that are used later in the code.
Add the missing imports:
```diff
 import os
 import tempfile
+import json
+from typing import Any

 import openai
 import pytest
 import yaml
+import jsonschema
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
import os
import tempfile
import json
from typing import Any

import openai
import pytest
import yaml
import jsonschema
```
🤖 Prompt for AI Agents
In tests/unittest/llmapi/apps/_test_openai_chat_json.py around lines 3 to 9,
there are missing imports causing undefined name errors. Review the code to
identify all used but not imported modules or functions, then add the necessary
import statements at the top of the file to resolve these errors.
```python
def _create_and_validate_response(
        messages: list[dict[str, Any]]) -> dict[str, any]:
    chat_completion = client.chat.completions.create(
        model=model_name,
        messages=messages,
        max_tokens=1000,
        temperature=0.0,
        response_format={
            "type": "json",
            "schema": user_profile_schema
        },
    )
    message = chat_completion.choices[0].message
    assert message.content is not None

    try:
        message_json = json.loads(message.content)
    except json.JSONDecodeError:
        pytest.fail(
            f"The output was not a valid JSON string. Output: {output_text}"
        )

    jsonschema.validate(instance=message_json, schema=user_profile_schema)
    return message_json
```
Fix undefined variables in the helper function.
The helper function has several undefined variable references that will cause runtime errors.
Apply these fixes:
```diff
 def _create_and_validate_response(
-        messages: list[dict[str, Any]]) -> dict[str, any]:
+        messages: list[dict[str, Any]]) -> dict[str, Any]:
     chat_completion = client.chat.completions.create(
         model=model_name,
         messages=messages,
         max_tokens=1000,
         temperature=0.0,
         response_format={
             "type": "json",
             "schema": user_profile_schema
         },
     )
     message = chat_completion.choices[0].message
     assert message.content is not None
     try:
         message_json = json.loads(message.content)
     except json.JSONDecodeError:
         pytest.fail(
-            f"The output was not a valid JSON string. Output: {output_text}"
+            f"The output was not a valid JSON string. Output: {message.content}"
         )
     jsonschema.validate(instance=message_json, schema=user_profile_schema)
     return message_json
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def _create_and_validate_response(
        messages: list[dict[str, Any]]) -> dict[str, Any]:
    chat_completion = client.chat.completions.create(
        model=model_name,
        messages=messages,
        max_tokens=1000,
        temperature=0.0,
        response_format={
            "type": "json",
            "schema": user_profile_schema
        },
    )
    message = chat_completion.choices[0].message
    assert message.content is not None
    try:
        message_json = json.loads(message.content)
    except json.JSONDecodeError:
        pytest.fail(
            f"The output was not a valid JSON string. Output: {message.content}"
        )
    jsonschema.validate(instance=message_json, schema=user_profile_schema)
    return message_json
```
🧰 Tools
🪛 Ruff (0.12.2)
88-88: Undefined name `Any` (F821)
103-103: Undefined name `json` (F821)
104-104: Undefined name `json` (F821)
106-106: Undefined name `output_text` (F821)
109-109: Undefined name `jsonschema` (F821)
🤖 Prompt for AI Agents
In tests/unittest/llmapi/apps/_test_openai_chat_json.py between lines 87 and
111, the helper function _create_and_validate_response references undefined
variables such as client, model_name, user_profile_schema, json, pytest,
jsonschema, and output_text. To fix this, ensure all these variables and modules
are properly imported or defined in the file. Replace output_text with
message.content in the pytest.fail call to correctly reference the invalid JSON
string. Verify that client and model_name are initialized before this function
is called, and import json, pytest, and jsonschema modules at the top of the
file.
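For context, a hypothetical call site for the fixed helper (the message content and asserted field are invented; `user_profile_schema` comes from the test module):

```python
# Ask for a structured user profile and let the helper parse and
# schema-validate the model's JSON reply.
messages = [{
    "role": "user",
    "content": "Return a JSON user profile for a user named Alice, age 30."
}]
profile = _create_and_validate_response(messages)
assert profile["name"] == "Alice"
```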
Moved to #6321
Description
The PR adds `json_schema` support in `trtllm-serve`. Currently, in order to run this successfully, the `extra-llm-api-config.yaml` needs to contain the `guided_decoding_backend`, and `disable_overlap_scheduler` needs to be `True` in order for this to work.
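A minimal sketch of writing that config, mirroring how this PR's test fixture does it (the two option names come from the tests; anything beyond them is an assumption):

```python
import yaml

extra_llm_api_options = {
    "guided_decoding_backend": "xgrammar",  # backend exercised by this PR's tests
    "disable_overlap_scheduler": True,      # required here for guided decoding
}
with open("extra-llm-api-config.yaml", "w") as f:
    yaml.dump(extra_llm_api_options, f)
```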
Then, running the following request
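(the original request snippet was not preserved on this page; the following is a hedged reconstruction using the openai Python client, with base URL, model name, prompt, and schema modeled on the PR's unit test)

```python
import openai

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

capital_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "population": {"type": "integer"},
    },
    "required": ["name", "population"],
}

completion = client.chat.completions.create(
    model="llama-3.1-model/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=100,
    response_format={
        "type": "json_schema",        # new type added by this PR
        "json_schema": capital_schema,
    },
)
print(completion.choices[0].message.content)
```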
gives the output `name='Paris' population=2148271`. On the server side, the corresponding logs can be seen as well.
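The printed value resembles a pydantic repr; a hedged sketch of client-side parsing that would produce it (the CapitalInfo model is an assumption mirroring the PR's test):

```python
from pydantic import BaseModel


class CapitalInfo(BaseModel):
    name: str
    population: int


# Parse the model's JSON reply into the typed object; printing it yields
# the repr shown above.
info = CapitalInfo.model_validate_json(completion.choices[0].message.content)
print(info)  # name='Paris' population=2148271
```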