
Conversation


@noiji noiji commented Jul 11, 2025

Description

Solves #5952.

Test Coverage

Request with the following curl command:

Example Curl Request

curl https://.../v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
    "model": "",
    "messages": [
        {
            "role": "user",
            "content": "Please provide the profile of Joe Biden in JSON format."
        }
    ],
    "response_format": {
        "type": "json",
        "schema": {
            "type": "object",
            "properties": {
                "name": {
                    "type": "string",
                    "description": "The name of the president."
                },
                "party": {
                    "type": "string",
                    "description": "The political party of the president."
                }
            },
            "required": ["name", "party"]
        }
    },
    "chat_template_kwargs": {
        "enable_thinking": false
    }
}'

Example Response

{"id":"chatcmpl-ad6c82ecbc41445090d8fbf1b77fb4e4","object":"chat.completion","created":1752224984,"model":"","choices":[{"index":0,"message":{"role":"assistant","content":"{\n  \"name\": \"Joseph Robinette Biden Jr.\",\n "party\": \"Democratic Party\"\n}\n","reasoning_content":null,"tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null,"disaggregated_params":null}],"usage":{"prompt_tokens":23,"total_tokens":362,"completion_tokens":339},"prompt_token_ids":null}%

Summary by CodeRabbit

  • New Features

    • Added support for the "json" response format with schema enforcement in chat completions.
    • Introduced comprehensive tests to validate JSON response format and schema compliance in multi-turn chat scenarios.
  • Tests

    • Added new integration and unit tests to ensure correct handling of the "json" response format and schema validation.
    • Updated test configurations to include the new JSON response format test in relevant test suites.

@noiji noiji changed the title [https://github.com/NVIDIA/TensorRT-LLM/issues/5952][feat] Support JSON Schema in OpenAI-Compatible API [Issue/5952][feat] Support JSON Schema in OpenAI-Compatible API Jul 11, 2025
@svc-trtllm-gh-bot svc-trtllm-gh-bot added the 'Community want to contribute' label (PRs initiated from Community) Jul 11, 2025
@noiji noiji marked this pull request as ready for review July 11, 2025 09:13
@noiji noiji changed the title [Issue/5952][feat] Support JSON Schema in OpenAI-Compatible API [Issue/#5952][feat] Support JSON Schema in OpenAI-Compatible API Jul 11, 2025
@noiji noiji changed the title [Issue/#5952][feat] Support JSON Schema in OpenAI-Compatible API [Issue/5952][feat] Support JSON Schema in OpenAI-Compatible API Jul 11, 2025
@nv-guomingz (Collaborator)

Hi @noiji, thanks for your contribution to TensorRT-LLM.
Would you please add a test case for this feature? Please refer to this example.

@syuoni (Collaborator) left a comment


Locally verified the feature works. Thanks for the contribution! @noiji

It would be better if you could add the test suggested by @nv-guomingz.


noiji commented Jul 14, 2025

@nv-guomingz @syuoni Thanks for your comments! I've made the requested changes :)

@nv-guomingz (Collaborator) left a comment


LGTM

@nv-guomingz nv-guomingz force-pushed the feat/openai_json_response branch from e6f4cdf to caa7f2a on July 15, 2025 02:57
@nv-guomingz (Collaborator)

/bot run

@nv-guomingz (Collaborator)

> @nv-guomingz @syuoni Thanks for your comments! I've made the requested changes :)

Thanks, let's wait for the CI pipeline results.

@tensorrt-cicd (Collaborator)

PR_Github #11872 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #11872 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #8799 completed with status: 'FAILURE'

@nv-guomingz (Collaborator)

Hi @noiji, the CI failed due to format checking: https://prod.blsm.nvidia.com/sw-tensorrt-top-1/blue/organizations/jenkins/LLM%2Fmain%2FL0_MergeRequest_PR/detail/L0_MergeRequest_PR/8799/pipeline/141.

Please run pre-commit run -a locally, then re-submit your code changes.

@noiji noiji force-pushed the feat/openai_json_response branch from 0b7b143 to d304509 on July 15, 2025 09:57

noiji commented Jul 15, 2025

/bot run


LinPoly commented Jul 15, 2025

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #11945 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator)

PR_Github #11945 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #8863 completed with status: 'FAILURE'


noiji commented Jul 16, 2025

I'm sorry, but it seems I don't have access to the CI/CD result (/LLM/main/L0_MergeRequest_PR pipeline #8863).

Would there be any way I could check it out?

@LinPoly (Collaborator) left a comment


The CI failed on this test, and the key error info was AssertionError: Please set max_seq_len to at least 8192 for kv cache manager. It seems we need to specify max_seq_len in the server args; a sketch of one possible fix follows.

Comment on lines +16 to +18
@pytest.fixture(scope="module", ids=["TinyLlama-1.1B-Chat"])
def model_name():
return "llama-3.1-model/Llama-3.1-8B-Instruct"

Can you please change the ids or the model? They don't match. If the tiny model is enough to demonstrate this feature, I would prefer it, since A10 memory is rather limited.


LinPoly commented Jul 16, 2025

> would there be any way i could check it out

@nv-guomingz @syuoni Do you know how community contributors can access our CI info?

The traceback info, for your reference @noiji:

[2025-07-15T14:12:19.129Z] Traceback (most recent call last):
[2025-07-15T14:12:19.129Z]   File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1689, in _forward_step
[2025-07-15T14:12:19.129Z]     outputs = forward(scheduled_requests, self.resource_manager,
[2025-07-15T14:12:19.129Z]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T14:12:19.129Z]   File "/usr/local/lib/python3.12/dist-packages/nvtx/nvtx.py", line 122, in inner
[2025-07-15T14:12:19.129Z]     result = func(*args, **kwargs)
[2025-07-15T14:12:19.130Z]              ^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T14:12:19.130Z]   File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/py_executor.py", line 1677, in forward
[2025-07-15T14:12:19.130Z]     return self.model_engine.forward(
[2025-07-15T14:12:19.130Z]            ^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T14:12:19.130Z]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[2025-07-15T14:12:19.130Z]     return func(*args, **kwargs)
[2025-07-15T14:12:19.130Z]            ^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T14:12:19.130Z]   File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/utils.py", line 68, in wrapper
[2025-07-15T14:12:19.130Z]     return func(self, *args, **kwargs)
[2025-07-15T14:12:19.130Z]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T14:12:19.130Z]   File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/model_engine.py", line 2130, in forward
[2025-07-15T14:12:19.130Z]     inputs, gather_ids = self._prepare_inputs(
[2025-07-15T14:12:19.130Z]                          ^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T14:12:19.130Z]   File "/usr/local/lib/python3.12/dist-packages/nvtx/nvtx.py", line 122, in inner
[2025-07-15T14:12:19.130Z]     result = func(*args, **kwargs)
[2025-07-15T14:12:19.130Z]              ^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T14:12:19.130Z]   File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/model_engine.py", line 2065, in _prepare_inputs
[2025-07-15T14:12:19.130Z]     return self._prepare_tp_inputs(scheduled_requests, kv_cache_manager,
[2025-07-15T14:12:19.130Z]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T14:12:19.130Z]   File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/pyexecutor/model_engine.py", line 1513, in _prepare_tp_inputs
[2025-07-15T14:12:19.130Z]     attn_metadata.prepare()
[2025-07-15T14:12:19.130Z]   File "/usr/local/lib/python3.12/dist-packages/tensorrt_llm/_torch/attention_backend/trtllm.py", line 757, in prepare
[2025-07-15T14:12:19.130Z]     assert self.kv_lens[:self.num_seqs].max(
[2025-07-15T14:12:19.130Z]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2025-07-15T14:12:19.130Z] AssertionError: Please set max_seq_len to at least 8192 for kv cache manager.


syuoni commented Jul 16, 2025

Users can click into the blossom-ci link below, and that page will show the failed test case. Hopefully, external developers can run the failed test case locally and see the error.

noiji and others added 6 commits July 24, 2025 13:21 (all signed off by noiji)
@nv-guomingz nv-guomingz force-pushed the feat/openai_json_response branch from d304509 to 517a457 on July 24, 2025 05:21

coderabbitai bot commented Jul 24, 2025

📝 Walkthrough

The changes introduce support for a new "json" response format in the OpenAI protocol implementation, requiring a schema for guided decoding. A new end-to-end integration test and a corresponding unit test validate this functionality, and the test suite configuration is updated to include the new test in the automated pipeline.

Changes

  • tensorrt_llm/serve/openai_protocol.py — Added "json" as a valid ResponseFormat.type, introduced an optional schema field, and updated the decoding logic to require and use the schema for the "json" type.
  • tests/unittest/llmapi/apps/_test_openai_chat_json.py — Added a new unit test module for OpenAI chat completion with the "json" response format and schema enforcement, including fixtures for server/client setup and schema definition.
  • tests/integration/defs/test_e2e.py — Added an integration test function to run the new chat JSON example test using the provided virtual environment.
  • tests/integration/test_lists/test-db/l0_a10.yml — Updated the test suite configuration to include the new integration test in the pre-merge stage.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Server
    participant Model

    Client->>Server: POST /chat/completions (response_format: "json", schema)
    Server->>Model: Generate with guided decoding (schema)
    Model-->>Server: JSON output conforming to schema
    Server-->>Client: JSON response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15 minutes



@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
tests/unittest/llmapi/apps/_test_openai_chat_json.py (1)

16-18: Fix the mismatch between fixture ID and actual model.

The fixture ID indicates "TinyLlama-1.1B-Chat" but returns a Llama-3.1-8B model path. This inconsistency can be confusing and contradicts the past review feedback about preferring a tiny model for A10 memory constraints.

Consider using an actual tiny model for better resource utilization:

 @pytest.fixture(scope="module", ids=["TinyLlama-1.1B-Chat"])
 def model_name():
-    return "llama-3.1-model/Llama-3.1-8B-Instruct"
+    return "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 428e340 and 517a457.

📒 Files selected for processing (4)
  • tensorrt_llm/serve/openai_protocol.py (2 hunks)
  • tests/integration/defs/test_e2e.py (1 hunks)
  • tests/integration/test_lists/test-db/l0_a10.yml (1 hunks)
  • tests/unittest/llmapi/apps/_test_openai_chat_json.py (1 hunks)

🧬 Code Graph Analysis (2)
tests/integration/defs/test_e2e.py (1)
tests/integration/defs/conftest.py (3)
  • llm_venv (707-723)
  • test_root (2185-2186)
  • unittest_path (90-91)
tensorrt_llm/serve/openai_protocol.py (1)
tensorrt_llm/sampling_params.py (1)
  • GuidedDecodingParams (14-36)
🪛 Ruff (0.12.2)
tests/unittest/llmapi/apps/_test_openai_chat_json.py

88-88: Undefined name Any (F821)
103-103: Undefined name json (F821)
104-104: Undefined name json (F821)
106-106: Undefined name output_text (F821)
109-109: Undefined name jsonschema (F821)
131-131: Undefined name first_message (F821)

🔇 Additional comments (6)
tensorrt_llm/serve/openai_protocol.py (2)

55-57: LGTM! Proper extension of ResponseFormat model.

The changes correctly add support for the new "json" response format type and the optional schema field. The type annotations are consistent with the GuidedDecodingParams.json parameter which accepts Union[str, BaseModel, dict].


146-151: LGTM! Correct implementation of JSON schema validation.

The new "json" response format handling properly validates that a schema is provided and correctly passes it to GuidedDecodingParams. The implementation follows the established pattern of other response format handlers.

tests/integration/test_lists/test-db/l0_a10.yml (1)

25-25: LGTM! Proper addition of test coverage for JSON schema feature.

The new test case test_openai_chat_json_example is correctly added to the pre-merge test suite, ensuring the JSON schema functionality is validated before merging. The placement alongside other OpenAI test cases is appropriate.

tests/integration/defs/test_e2e.py (1)

1446-1452: LGTM! Well-structured integration test function.

The implementation correctly follows the established pattern in this file for OpenAI test integrations, properly setting up the test root and running pytest on the corresponding unit test file.

tests/unittest/llmapi/apps/_test_openai_chat_json.py (2)

22-79: Well-structured fixture definitions.

The fixture setup is comprehensive and appropriate:

  • Temporary YAML configuration correctly enables xgrammar guided decoding
  • Server fixture properly configures the RemoteOpenAIServer with PyTorch backend
  • Client fixtures provide both sync and async OpenAI clients
  • JSON schema fixture defines a clear, testable structure

The guided decoding configuration with overlap scheduler disabled is necessary for JSON schema enforcement.


81-146: Well-designed test for JSON schema validation.

The test effectively validates the new JSON response format feature:

  • Tests multi-turn conversation with JSON schema enforcement
  • Validates both JSON parsing and schema compliance
  • Verifies that different responses generate varied content
  • Includes proper error handling for invalid JSON

The overall test structure and validation approach is solid for ensuring the JSON schema feature works correctly.

Comment on lines +1 to +14
# Adapted from
# https://github.com/vllm-project/vllm/blob/aae6927be06dedbda39c6b0c30f6aa3242b84388/tests/entrypoints/openai/test_chat.py
import os
import tempfile

import openai
import pytest
import yaml

from ..test_llm import get_model_path
from .openai_server import RemoteOpenAIServer

pytestmark = pytest.mark.threadleak(enabled=False)


⚠️ Potential issue

Add missing imports to fix runtime errors.

The code uses several modules and types that are not imported, which will cause runtime failures.

Add these missing imports:

 import os
 import tempfile
+import json
+from typing import Any

 import openai
 import pytest
 import yaml
+import jsonschema

 from ..test_llm import get_model_path
 from .openai_server import RemoteOpenAIServer

Comment on lines +87 to +110
def _create_and_validate_response(
messages: list[dict[str, Any]]) -> dict[str, any]:
chat_completion = client.chat.completions.create(
model=model_name,
messages=messages,
max_tokens=1000,
temperature=0.0,
response_format={
"type": "json",
"schema": user_profile_schema
},
)
message = chat_completion.choices[0].message
assert message.content is not None

try:
message_json = json.loads(message.content)
except json.JSONDecodeError:
pytest.fail(
f"The output was not a valid JSON string. Output: {output_text}"
)

jsonschema.validate(instance=message_json, schema=user_profile_schema)
return message_json

⚠️ Potential issue

Fix undefined variable reference.

The error message references an undefined variable output_text instead of the actual content.

         try:
             message_json = json.loads(message.content)
         except json.JSONDecodeError:
             pytest.fail(
-                f"The output was not a valid JSON string. Output: {output_text}"
+                f"The output was not a valid JSON string. Output: {message.content}"
             )

Comment on lines +126 to +138
first_json = _create_and_validate_response(messages)

messages.extend([
{
"role": "assistant",
"content": first_message.content,
},
{
"role": "user",
"content": "Give me another one with a different name and age.",
},
])
second_json = _create_and_validate_response(messages)

⚠️ Potential issue

Fix undefined variable and improve test flow.

The code references first_message which is undefined. It should reference the first response properly.

     first_json = _create_and_validate_response(messages)

     messages.extend([
         {
             "role": "assistant",
-            "content": first_message.content,
+            "content": json.dumps(first_json),
         },
         {
             "role": "user",
             "content": "Give me another one with a different name and age.",
         },
     ])


LinPoly commented Jul 24, 2025

@noiji We cherry-picked your PR to accelerate merging; you can comment on this PR if you have any questions or concerns.


nv-guomingz commented Jul 26, 2025

Closing since #6321 has merged, and we've cited your contribution in that PR.
Thanks for your contribution to TRT-LLM. @noiji
