[Issue/5952][feat] Support JSON Schema in OpenAI-Compatible API #5957
Conversation
Locally verified the feature works. Thanks for the contribution! @noiji
It would be better if you could also add the test suggested by @nv-guomingz.
@nv-guomingz @syuoni Thanks for your comments! I've made the requested changes :)
LGTM
e6f4cdf to caa7f2a (Compare)
/bot run
Thanks, let's wait for the CI pipeline results.
PR_Github #11872 [ run ] triggered by Bot
PR_Github #11872 [ run ] completed with state
Hi @noiji, the CI failed due to format checking: https://prod.blsm.nvidia.com/sw-tensorrt-top-1/blue/organizations/jenkins/LLM%2Fmain%2FL0_MergeRequest_PR/detail/L0_MergeRequest_PR/8799/pipeline/141. Please run
0b7b143 to d304509 (Compare)
/bot run

/bot run
PR_Github #11945 [ run ] triggered by Bot
PR_Github #11945 [ run ] completed with state
I'm sorry, but it seems I don't have access to the CI/CD result (/LLM/main/L0_MergeRequest_PR pipeline #8863). Is there any way I could check it out?
The CI failed on this test, and the key error was `AssertionError: Please set max_seq_len to at least 8192 for kv cache manager`; it seems we need to specify `max_seq_len` in the server args.
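A minimal sketch of one way to address that is below, assuming the `RemoteOpenAIServer` test fixture forwards extra CLI arguments to the serve command and that a `--max_seq_len` flag is accepted; both are assumptions, not taken from this diff.

```python
# Hypothetical tweak to the test's server fixture (not the author's code): pass
# max_seq_len explicitly so the kv cache manager assertion (>= 8192) is satisfied.
# The flag names below are assumptions.
import pytest

from ..test_llm import get_model_path          # same helpers the test module already uses
from .openai_server import RemoteOpenAIServer


@pytest.fixture(scope="module")
def server(model_name: str, temp_extra_llm_api_options_file: str):
    model_path = get_model_path(model_name)
    args = [
        "--backend", "pytorch",
        "--max_seq_len", "8192",
        "--extra_llm_api_options", temp_extra_llm_api_options_file,
    ]
    with RemoteOpenAIServer(model_path, args) as remote_server:
        yield remote_server
```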
```python
@pytest.fixture(scope="module", ids=["TinyLlama-1.1B-Chat"])
def model_name():
    return "llama-3.1-model/Llama-3.1-8B-Instruct"
```
Can you please change the ids or the model? They don't match. If a tiny model is enough to demonstrate this feature, I would prefer the tiny model, because the memory of the A10 is rather limited.
@nv-guomingz @syuoni Do you know how community contributors can access our CI info? The traceback info is attached for your reference, @noiji.
Users can click into the blossom-ci link below, and that page will show the failed test case. Hopefully, external developers can run the failed test case locally and see the error.
Signed-off-by: noiji <[email protected]>
d304509 to 517a457 (Compare)
📝 Walkthrough

The changes introduce support for a new `json` response format type in the OpenAI-compatible chat completions API: requests may supply a JSON schema, which is enforced during generation via guided decoding. Accompanying unit and integration tests are added.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Server
    participant Model
    Client->>Server: POST /chat/completions (response_format: "json", schema)
    Server->>Model: Generate with guided decoding (schema)
    Model-->>Server: JSON output conforming to schema
    Server-->>Client: JSON response
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~15 minutes
Actionable comments posted: 3
♻️ Duplicate comments (1)
tests/unittest/llmapi/apps/_test_openai_chat_json.py (1)
16-18: Fix the mismatch between fixture ID and actual model. The fixture ID indicates "TinyLlama-1.1B-Chat" but the fixture returns a Llama-3.1-8B model path. This inconsistency is confusing and contradicts the earlier review feedback about preferring a tiny model for A10 memory constraints.
Consider using an actual tiny model for better resource utilization:
```diff
 @pytest.fixture(scope="module", ids=["TinyLlama-1.1B-Chat"])
 def model_name():
-    return "llama-3.1-model/Llama-3.1-8B-Instruct"
+    return "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- tensorrt_llm/serve/openai_protocol.py (2 hunks)
- tests/integration/defs/test_e2e.py (1 hunks)
- tests/integration/test_lists/test-db/l0_a10.yml (1 hunks)
- tests/unittest/llmapi/apps/_test_openai_chat_json.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
tensorrt_llm/serve/openai_protocol.py (3)
Learnt from: yiqingy0
PR: #5198
File: jenkins/mergeWaiveList.py:0-0
Timestamp: 2025-07-22T08:33:49.109Z
Learning: In the TensorRT-LLM waive list merging system, removed lines are always located at the end of the merge waive lists, which is why the mergeWaiveList.py script uses reverse traversal - it's an optimization for this specific domain constraint.
Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
Learnt from: yechank-nvidia
PR: #6254
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204
Timestamp: 2025-07-22T09:22:14.726Z
Learning: In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using from_shared_tensor() is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call strip_for_generation() to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.
🧬 Code Graph Analysis (2)
tests/integration/defs/test_e2e.py (1)
tests/integration/defs/conftest.py (3)
`llm_venv` (707-723), `test_root` (2185-2186), `unittest_path` (90-91)
tensorrt_llm/serve/openai_protocol.py (1)
tensorrt_llm/sampling_params.py (1)
`GuidedDecodingParams` (14-36)
🪛 Ruff (0.12.2)
tests/unittest/llmapi/apps/_test_openai_chat_json.py
88-88: Undefined name Any
(F821)
103-103: Undefined name json
(F821)
104-104: Undefined name json
(F821)
106-106: Undefined name output_text
(F821)
109-109: Undefined name jsonschema
(F821)
131-131: Undefined name first_message
(F821)
🔇 Additional comments (6)
tensorrt_llm/serve/openai_protocol.py (2)
55-57: LGTM! Proper extension of the ResponseFormat model. The changes correctly add support for the new `"json"` response format type and the optional `schema` field. The type annotations are consistent with the `GuidedDecodingParams.json` parameter, which accepts `Union[str, BaseModel, dict]`.
146-151: LGTM! Correct implementation of JSON schema validation. The new `"json"` response format handling properly validates that a schema is provided and correctly passes it to `GuidedDecodingParams`. The implementation follows the established pattern of the other response format handlers.

tests/integration/test_lists/test-db/l0_a10.yml (1)
25-25: LGTM! Proper addition of test coverage for the JSON schema feature. The new test case `test_openai_chat_json_example` is correctly added to the pre-merge test suite, ensuring the JSON schema functionality is validated before merging. The placement alongside other OpenAI test cases is appropriate.

tests/integration/defs/test_e2e.py (1)
1446-1452: LGTM! Well-structured integration test function. The implementation correctly follows the established pattern in this file for OpenAI test integrations, properly setting up the test root and running pytest on the corresponding unit test file.
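For context, the established pattern referred to above looks roughly like the sketch below; it is reconstructed from the conftest helpers cited in the Code Graph Analysis, not copied from the diff.

```python
# Hedged sketch of the integration wrapper: locate the unit test under the unittest
# tree and run it through the test virtualenv. Exact arguments are assumptions.
def test_openai_chat_json_example(llm_root, llm_venv):
    # llm_venv is a fixture and unittest_path() a helper from tests/integration/defs/conftest.py
    test_root = unittest_path() / "llmapi" / "apps"
    llm_venv.run_cmd([
        "-m", "pytest",
        str(test_root / "_test_openai_chat_json.py"),
    ])
```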
tests/unittest/llmapi/apps/_test_openai_chat_json.py (2)
22-79: Well-structured fixture definitions. The fixture setup is comprehensive and appropriate:
- Temporary YAML configuration correctly enables xgrammar guided decoding
- Server fixture properly configures the RemoteOpenAIServer with PyTorch backend
- Client fixtures provide both sync and async OpenAI clients
- JSON schema fixture defines a clear, testable structure
The guided decoding configuration with overlap scheduler disabled is necessary for JSON schema enforcement.
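As a concrete illustration of that fixture, the temporary options file might be produced roughly as follows; the option keys are assumptions inferred from the description above, not copied from the test.

```python
# Hedged sketch: write a temporary extra-LLM-API-options YAML that enables the
# xgrammar guided-decoding backend and disables the overlap scheduler.
import tempfile

import yaml

extra_llm_api_options = {
    "guided_decoding_backend": "xgrammar",  # assumed key name
    "disable_overlap_scheduler": True,      # assumed key name
}

with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as f:
    yaml.safe_dump(extra_llm_api_options, f)
    options_path = f.name  # handed to the server, e.g. via --extra_llm_api_options
```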
81-146: Well-designed test for JSON schema validation. The test effectively validates the new JSON response format feature:
- Tests multi-turn conversation with JSON schema enforcement
- Validates both JSON parsing and schema compliance
- Verifies that different responses generate varied content
- Includes proper error handling for invalid JSON
The overall test structure and validation approach is solid for ensuring the JSON schema feature works correctly.
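To make the protocol-side changes discussed in the openai_protocol.py comments above more concrete, here is a hedged sketch of the described pattern: a `"json"` response format carrying an optional schema that is translated into `GuidedDecodingParams`. Field and helper names are illustrative assumptions, not the exact source.

```python
# Hedged sketch only; names are assumed. The existing handlers ("text",
# "json_object", ...) are omitted for brevity.
from typing import Literal, Optional, Union

from pydantic import BaseModel, Field

from tensorrt_llm.sampling_params import GuidedDecodingParams


class ResponseFormat(BaseModel):
    # "text" and "json_object" are assumed pre-existing values; "json" is the new one.
    type: Literal["text", "json_object", "json"]
    schema_: Optional[Union[str, dict]] = Field(default=None, alias="schema")


def to_guided_decoding_params(
        rf: Optional[ResponseFormat]) -> Optional[GuidedDecodingParams]:
    # Only the new "json" variant is sketched here.
    if rf is None or rf.type != "json":
        return None
    if rf.schema_ is None:
        raise ValueError('response_format type "json" requires a "schema"')
    # GuidedDecodingParams.json accepts Union[str, BaseModel, dict] per the review above.
    return GuidedDecodingParams(json=rf.schema_)
```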
```python
# Adapted from
# https://github.com/vllm-project/vllm/blob/aae6927be06dedbda39c6b0c30f6aa3242b84388/tests/entrypoints/openai/test_chat.py
import os
import tempfile

import openai
import pytest
import yaml

from ..test_llm import get_model_path
from .openai_server import RemoteOpenAIServer

pytestmark = pytest.mark.threadleak(enabled=False)
```
Add missing imports to fix runtime errors.
The code uses several modules and types that are not imported, which will cause runtime failures.
Add these missing imports:
```diff
 import os
 import tempfile
+import json
+from typing import Any
 import openai
 import pytest
 import yaml
+import jsonschema
 from ..test_llm import get_model_path
 from .openai_server import RemoteOpenAIServer
```
🤖 Prompt for AI Agents
In tests/unittest/llmapi/apps/_test_openai_chat_json.py at the beginning (lines
1 to 14), the code uses modules and types that are not imported, causing runtime
errors. Review the code to identify all used but missing imports and add them
explicitly at the top of the file to ensure all dependencies are available
during execution.
```python
def _create_and_validate_response(
        messages: list[dict[str, Any]]) -> dict[str, any]:
    chat_completion = client.chat.completions.create(
        model=model_name,
        messages=messages,
        max_tokens=1000,
        temperature=0.0,
        response_format={
            "type": "json",
            "schema": user_profile_schema
        },
    )
    message = chat_completion.choices[0].message
    assert message.content is not None

    try:
        message_json = json.loads(message.content)
    except json.JSONDecodeError:
        pytest.fail(
            f"The output was not a valid JSON string. Output: {output_text}"
        )

    jsonschema.validate(instance=message_json, schema=user_profile_schema)
    return message_json
```
Fix undefined variable reference.
The error message references an undefined variable output_text instead of the actual content.
```diff
 try:
     message_json = json.loads(message.content)
 except json.JSONDecodeError:
     pytest.fail(
-        f"The output was not a valid JSON string. Output: {output_text}"
+        f"The output was not a valid JSON string. Output: {message.content}"
     )
```
🤖 Prompt for AI Agents
In tests/unittest/llmapi/apps/_test_openai_chat_json.py between lines 87 and
110, the exception handler references an undefined variable `output_text` in the
JSON decode error message. Replace `output_text` with the correct variable
`message.content` to accurately display the invalid JSON string content in the
error message.
```python
first_json = _create_and_validate_response(messages)

messages.extend([
    {
        "role": "assistant",
        "content": first_message.content,
    },
    {
        "role": "user",
        "content": "Give me another one with a different name and age.",
    },
])
second_json = _create_and_validate_response(messages)
```
Fix undefined variable and improve test flow.
The code references first_message which is undefined. It should reference the first response properly.
```diff
 first_json = _create_and_validate_response(messages)

 messages.extend([
     {
         "role": "assistant",
-        "content": first_message.content,
+        "content": json.dumps(first_json),
     },
     {
         "role": "user",
         "content": "Give me another one with a different name and age.",
     },
 ])
```
🤖 Prompt for AI Agents
In tests/unittest/llmapi/apps/_test_openai_chat_json.py around lines 126 to 138,
the variable first_message is used but not defined, causing an error. Replace
first_message with the correct variable that holds the first response content,
likely obtained from first_json, to properly reference the first assistant
message. Adjust the code to extract the content from the first_json response and
use that in the messages.extend call to fix the undefined variable issue and
improve test flow.
Description
Solves #5952.
Test Coverage
Tested the feature with the curl request in the collapsed sections below; an illustrative equivalent sketch follows them.
Example Curl Request
Example Response
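The author's actual request and response live in the collapsed sections above. As an illustrative equivalent only (endpoint URL, model name, and schema below are assumptions), a request exercising the new response format could look like this:

```python
# Illustrative sketch, not the author's exact example: call the OpenAI-compatible
# endpoint with the new "json" response format and a caller-supplied JSON schema.
import json

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")  # assumed endpoint

user_profile_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

completion = client.chat.completions.create(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # assumed model
    messages=[{"role": "user", "content": "Give me a short user profile."}],
    max_tokens=256,
    temperature=0.0,
    response_format={"type": "json", "schema": user_profile_schema},
)

# With guided decoding enabled on the server, the content should parse and validate
# against the schema, e.g. something shaped like {"name": "Alice", "age": 30}.
print(json.loads(completion.choices[0].message.content))
```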
Summary by CodeRabbit

New Features

- Added a "json" response format to the OpenAI-compatible chat completions API, allowing outputs to be constrained to a user-supplied JSON schema via guided decoding.

Tests

- Added a unit test for JSON-schema-constrained chat completions and wired it into the integration test suite (l0_a10).