[Issue/5952][feat] Support JSON Schema in OpenAI-Compatible API #5957
Conversation
Locally verified the feature works. Thanks for the contribution! @noiji
It would be better if you could also add the test suggested by @nv-guomingz.
@nv-guomingz @syuoni Thanks for your comments! I've made the requested changes :)
LGTM
e6f4cdf to caa7f2a (Compare)
/bot run
Thanks, let's wait for the CI pipeline results.
PR_Github #11872 [ run ] triggered by Bot
PR_Github #11872 [ run ] completed with state
Hi @noiji, the CI failed due to format checking: https://prod.blsm.nvidia.com/sw-tensorrt-top-1/blue/organizations/jenkins/LLM%2Fmain%2FL0_MergeRequest_PR/detail/L0_MergeRequest_PR/8799/pipeline/141. Please run
0b7b143 to d304509 (Compare)
/bot run

/bot run
PR_Github #11945 [ run ] triggered by Bot
PR_Github #11945 [ run ] completed with state
I'm sorry, but it seems I don't have access to the CI/CD result (/LLM/main/L0_MergeRequest_PR pipeline #8863). Is there any way I could check it out?
The CI failed on this test, and the key error was `AssertionError: Please set max_seq_len to at least 8192 for kv cache manager`; it seems we need to specify `max_seq_len` in the server args.
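A minimal sketch of one way to address that is below, assuming the `RemoteOpenAIServer` test fixture forwards extra CLI arguments to the serve command and that a `--max_seq_len` flag is accepted; both are assumptions, not taken from this diff.

```python
# Hypothetical tweak to the test's server fixture (not the author's code): pass
# max_seq_len explicitly so the kv cache manager assertion (>= 8192) is satisfied.
# The flag names below are assumptions.
import pytest

from ..test_llm import get_model_path          # same helpers the test module already uses
from .openai_server import RemoteOpenAIServer


@pytest.fixture(scope="module")
def server(model_name: str, temp_extra_llm_api_options_file: str):
    model_path = get_model_path(model_name)
    args = [
        "--backend", "pytorch",
        "--max_seq_len", "8192",
        "--extra_llm_api_options", temp_extra_llm_api_options_file,
    ]
    with RemoteOpenAIServer(model_path, args) as remote_server:
        yield remote_server
```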
```python
@pytest.fixture(scope="module", ids=["TinyLlama-1.1B-Chat"])
def model_name():
    return "llama-3.1-model/Llama-3.1-8B-Instruct"
```
Can you please change the ids or the model? They don't match. If a tiny model is enough to demonstrate this feature, I would prefer the tiny model, because the memory of the A10 is rather limited.
@nv-guomingz @syuoni Do you know how community contributors can access our CI info? The traceback info is attached for your reference, @noiji.
Users can click into the blossom-ci link below, and that page will show the failed test case. Hopefully, external developers can run the failed test case locally and see the error.
Signed-off-by: noiji <[email protected]>
d304509 to 517a457 (Compare)
📝 Walkthrough

The changes introduce support for a new `json` response format type in the OpenAI-compatible chat completions API: requests may supply a JSON schema, which is enforced during generation via guided decoding. Accompanying unit and integration tests are added.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Server
    participant Model
    Client->>Server: POST /chat/completions (response_format: "json", schema)
    Server->>Model: Generate with guided decoding (schema)
    Model-->>Server: JSON output conforming to schema
    Server-->>Client: JSON response
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~15 minutes
Actionable comments posted: 3
♻️ Duplicate comments (1)
tests/unittest/llmapi/apps/_test_openai_chat_json.py (1)
16-18: Fix the mismatch between fixture ID and actual model. The fixture ID indicates "TinyLlama-1.1B-Chat" but the fixture returns a Llama-3.1-8B model path. This inconsistency is confusing and contradicts the earlier review feedback about preferring a tiny model for A10 memory constraints.
Consider using an actual tiny model for better resource utilization:
```diff
 @pytest.fixture(scope="module", ids=["TinyLlama-1.1B-Chat"])
 def model_name():
-    return "llama-3.1-model/Llama-3.1-8B-Instruct"
+    return "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- tensorrt_llm/serve/openai_protocol.py (2 hunks)
- tests/integration/defs/test_e2e.py (1 hunks)
- tests/integration/test_lists/test-db/l0_a10.yml (1 hunks)
- tests/unittest/llmapi/apps/_test_openai_chat_json.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
tensorrt_llm/serve/openai_protocol.py (3)
Learnt from: yiqingy0
PR: #5198
File: jenkins/mergeWaiveList.py:0-0
Timestamp: 2025-07-22T08:33:49.109Z
Learning: In the TensorRT-LLM waive list merging system, removed lines are always located at the end of the merge waive lists, which is why the mergeWaiveList.py script uses reverse traversal - it's an optimization for this specific domain constraint.
Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
Learnt from: yechank-nvidia
PR: #6254
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204
Timestamp: 2025-07-22T09:22:14.726Z
Learning: In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using from_shared_tensor() is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call strip_for_generation() to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.
🧬 Code Graph Analysis (2)
tests/integration/defs/test_e2e.py (1)
tests/integration/defs/conftest.py (3)
`llm_venv` (707-723), `test_root` (2185-2186), `unittest_path` (90-91)
tensorrt_llm/serve/openai_protocol.py (1)
tensorrt_llm/sampling_params.py (1)
`GuidedDecodingParams` (14-36)
🪛 Ruff (0.12.2)
tests/unittest/llmapi/apps/_test_openai_chat_json.py
88-88: Undefined name Any
(F821)
103-103: Undefined name json
(F821)
104-104: Undefined name json
(F821)
106-106: Undefined name output_text
(F821)
109-109: Undefined name jsonschema
(F821)
131-131: Undefined name first_message
(F821)
🔇 Additional comments (6)
tensorrt_llm/serve/openai_protocol.py (2)
55-57: LGTM! Proper extension of the ResponseFormat model. The changes correctly add support for the new `"json"` response format type and the optional `schema` field. The type annotations are consistent with the `GuidedDecodingParams.json` parameter, which accepts `Union[str, BaseModel, dict]`.
146-151: LGTM! Correct implementation of JSON schema validation. The new `"json"` response format handling properly validates that a schema is provided and correctly passes it to `GuidedDecodingParams`. The implementation follows the established pattern of the other response format handlers.

tests/integration/test_lists/test-db/l0_a10.yml (1)
25-25: LGTM! Proper addition of test coverage for the JSON schema feature. The new test case `test_openai_chat_json_example` is correctly added to the pre-merge test suite, ensuring the JSON schema functionality is validated before merging. The placement alongside other OpenAI test cases is appropriate.

tests/integration/defs/test_e2e.py (1)
1446-1452: LGTM! Well-structured integration test function. The implementation correctly follows the established pattern in this file for OpenAI test integrations, properly setting up the test root and running pytest on the corresponding unit test file.
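For context, the established pattern referred to above looks roughly like the sketch below; it is reconstructed from the conftest helpers cited in the Code Graph Analysis, not copied from the diff.

```python
# Hedged sketch of the integration wrapper: locate the unit test under the unittest
# tree and run it through the test virtualenv. Exact arguments are assumptions.
def test_openai_chat_json_example(llm_root, llm_venv):
    # llm_venv is a fixture and unittest_path() a helper from tests/integration/defs/conftest.py
    test_root = unittest_path() / "llmapi" / "apps"
    llm_venv.run_cmd([
        "-m", "pytest",
        str(test_root / "_test_openai_chat_json.py"),
    ])
```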
tests/unittest/llmapi/apps/_test_openai_chat_json.py (2)
22-79: Well-structured fixture definitions. The fixture setup is comprehensive and appropriate:
- Temporary YAML configuration correctly enables xgrammar guided decoding
- Server fixture properly configures the RemoteOpenAIServer with PyTorch backend
- Client fixtures provide both sync and async OpenAI clients
- JSON schema fixture defines a clear, testable structure
The guided decoding configuration with overlap scheduler disabled is necessary for JSON schema enforcement.
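As a concrete illustration of that fixture, the temporary options file might be produced roughly as follows; the option keys are assumptions inferred from the description above, not copied from the test.

```python
# Hedged sketch: write a temporary extra-LLM-API-options YAML that enables the
# xgrammar guided-decoding backend and disables the overlap scheduler.
import tempfile

import yaml

extra_llm_api_options = {
    "guided_decoding_backend": "xgrammar",  # assumed key name
    "disable_overlap_scheduler": True,      # assumed key name
}

with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as f:
    yaml.safe_dump(extra_llm_api_options, f)
    options_path = f.name  # handed to the server, e.g. via --extra_llm_api_options
```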
81-146: Well-designed test for JSON schema validation. The test effectively validates the new JSON response format feature:
- Tests multi-turn conversation with JSON schema enforcement
- Validates both JSON parsing and schema compliance
- Verifies that different responses generate varied content
- Includes proper error handling for invalid JSON
The overall test structure and validation approach is solid for ensuring the JSON schema feature works correctly.
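To make the protocol-side changes discussed in the openai_protocol.py comments above more concrete, here is a hedged sketch of the described pattern: a `"json"` response format carrying an optional schema that is translated into `GuidedDecodingParams`. Field and helper names are illustrative assumptions, not the exact source.

```python
# Hedged sketch only; names are assumed. The existing handlers ("text",
# "json_object", ...) are omitted for brevity.
from typing import Literal, Optional, Union

from pydantic import BaseModel, Field

from tensorrt_llm.sampling_params import GuidedDecodingParams


class ResponseFormat(BaseModel):
    # "text" and "json_object" are assumed pre-existing values; "json" is the new one.
    type: Literal["text", "json_object", "json"]
    schema_: Optional[Union[str, dict]] = Field(default=None, alias="schema")


def to_guided_decoding_params(
        rf: Optional[ResponseFormat]) -> Optional[GuidedDecodingParams]:
    # Only the new "json" variant is sketched here.
    if rf is None or rf.type != "json":
        return None
    if rf.schema_ is None:
        raise ValueError('response_format type "json" requires a "schema"')
    # GuidedDecodingParams.json accepts Union[str, BaseModel, dict] per the review above.
    return GuidedDecodingParams(json=rf.schema_)
```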
```python
# Adapted from
# https://github.com/vllm-project/vllm/blob/aae6927be06dedbda39c6b0c30f6aa3242b84388/tests/entrypoints/openai/test_chat.py
import os
import tempfile

import openai
import pytest
import yaml

from ..test_llm import get_model_path
from .openai_server import RemoteOpenAIServer

pytestmark = pytest.mark.threadleak(enabled=False)
```
Add missing imports to fix runtime errors.
The code uses several modules and types that are not imported, which will cause runtime failures.
Add these missing imports:
```diff
 import os
 import tempfile
+import json
+from typing import Any
 import openai
 import pytest
 import yaml
+import jsonschema
 from ..test_llm import get_model_path
 from .openai_server import RemoteOpenAIServer
```
🤖 Prompt for AI Agents
In tests/unittest/llmapi/apps/_test_openai_chat_json.py at the beginning (lines
1 to 14), the code uses modules and types that are not imported, causing runtime
errors. Review the code to identify all used but missing imports and add them
explicitly at the top of the file to ensure all dependencies are available
during execution.
```python
def _create_and_validate_response(
        messages: list[dict[str, Any]]) -> dict[str, any]:
    chat_completion = client.chat.completions.create(
        model=model_name,
        messages=messages,
        max_tokens=1000,
        temperature=0.0,
        response_format={
            "type": "json",
            "schema": user_profile_schema
        },
    )
    message = chat_completion.choices[0].message
    assert message.content is not None

    try:
        message_json = json.loads(message.content)
    except json.JSONDecodeError:
        pytest.fail(
            f"The output was not a valid JSON string. Output: {output_text}"
        )

    jsonschema.validate(instance=message_json, schema=user_profile_schema)
    return message_json
```
Fix undefined variable reference.
The error message references an undefined variable output_text instead of the actual content.
```diff
 try:
     message_json = json.loads(message.content)
 except json.JSONDecodeError:
     pytest.fail(
-        f"The output was not a valid JSON string. Output: {output_text}"
+        f"The output was not a valid JSON string. Output: {message.content}"
     )
```
🤖 Prompt for AI Agents
In tests/unittest/llmapi/apps/_test_openai_chat_json.py between lines 87 and
110, the exception handler references an undefined variable `output_text` in the
JSON decode error message. Replace `output_text` with the correct variable
`message.content` to accurately display the invalid JSON string content in the
error message.
```python
first_json = _create_and_validate_response(messages)

messages.extend([
    {
        "role": "assistant",
        "content": first_message.content,
    },
    {
        "role": "user",
        "content": "Give me another one with a different name and age.",
    },
])
second_json = _create_and_validate_response(messages)
```
Fix undefined variable and improve test flow.
The code references first_message which is undefined. It should reference the first response properly.
```diff
 first_json = _create_and_validate_response(messages)

 messages.extend([
     {
         "role": "assistant",
-        "content": first_message.content,
+        "content": json.dumps(first_json),
     },
     {
         "role": "user",
         "content": "Give me another one with a different name and age.",
     },
 ])
```
🤖 Prompt for AI Agents
In tests/unittest/llmapi/apps/_test_openai_chat_json.py around lines 126 to 138,
the variable first_message is used but not defined, causing an error. Replace
first_message with the correct variable that holds the first response content,
likely obtained from first_json, to properly reference the first assistant
message. Adjust the code to extract the content from the first_json response and
use that in the messages.extend call to fix the undefined variable issue and
improve test flow.
Description
Solves #5952.
Test Coverage
Tested the feature with the curl request in the collapsed sections below; an illustrative equivalent sketch follows them.
Example Curl Request
Example Response
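The author's actual request and response live in the collapsed sections above. As an illustrative equivalent only (endpoint URL, model name, and schema below are assumptions), a request exercising the new response format could look like this:

```python
# Illustrative sketch, not the author's exact example: call the OpenAI-compatible
# endpoint with the new "json" response format and a caller-supplied JSON schema.
import json

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")  # assumed endpoint

user_profile_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

completion = client.chat.completions.create(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # assumed model
    messages=[{"role": "user", "content": "Give me a short user profile."}],
    max_tokens=256,
    temperature=0.0,
    response_format={"type": "json", "schema": user_profile_schema},
)

# With guided decoding enabled on the server, the content should parse and validate
# against the schema, e.g. something shaped like {"name": "Alice", "age": 30}.
print(json.loads(completion.choices[0].message.content))
```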
Summary by CodeRabbit

New Features

- Added a "json" response format to the OpenAI-compatible chat completions API, allowing outputs to be constrained to a user-supplied JSON schema via guided decoding.

Tests

- Added a unit test for JSON-schema-constrained chat completions and wired it into the integration test suite (l0_a10).