Conversation

@Superjomn
Collaborator

@Superjomn Superjomn commented Aug 8, 2025

Summary by CodeRabbit

  • New Features

    • Added integration tests to verify default backend selection and logging for the LLM API, including checks for both PyTorch and TensorRT backends.
  • Bug Fixes

    • Improved logging during backend detection to provide clearer information about which backend is being used.
  • Tests

    • Introduced new test cases and a test class to ensure correct backend behavior and logging output.
    • Expanded test coverage for backend selection scenarios in the LLM API.

Description

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only [pytorch, cpp, tensorrt, triton] are supported. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@Superjomn Superjomn requested a review from a team as a code owner August 8, 2025 03:16
@Superjomn Superjomn requested a review from syuoni August 8, 2025 03:16
@coderabbitai
Contributor

coderabbitai bot commented Aug 8, 2025

📝 Walkthrough

The changes implement explicit backend selection and logging for the LLM API, defaulting to PyTorch. The logger level is temporarily set to "info" during backend detection. Integration tests and a test list are added to verify backend selection, argument types, and log outputs. Command-line backend selection is introduced for integration scripts.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Backend Detection & Logging in BaseLLM (tensorrt_llm/llmapi/llm.py) | Modified BaseLLM.__init__ to temporarily save and restore the logger level, setting it to "info" during backend detection and argument parsing. Added info-level log messages indicating the backend in use ("PyTorch", "AutoDeploy", or "TensorRT"). No changes to public API signatures. |
| Integration Test Script Backend Option (tests/integration/defs/llmapi/_run_llmapi_llm.py) | Added an optional --backend argument (default None, internally defaulted to "tensorrt") to the integration test script. Validates backend choice ("pytorch" or "tensorrt"), dynamically selects the LLM implementation class accordingly, and conditionally passes build_config for the TensorRT backend. Updated function signature to include the backend parameter. |
| New Integration Tests for Backend Selection and Logging (tests/integration/defs/llmapi/test_llm_api_qa.py) | Added a new test suite TestLlmDefaultBackend with tests verifying default backend is PyTorch, correct argument types for PyTorch and TensorRT backends, successful generation calls, and subprocess-based log output verification for backend-specific info messages. |
| Test List Update for New Backend Tests (tests/integration/test_lists/qa/llm_function_full.txt) | Appended four new test entries referencing backend selection and logging tests in the test list file under a comment about PyTorch being the default backend. No existing entries were modified. |
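
For illustration, the --backend handling described for the integration script could look roughly like the sketch below. It assumes argparse (the actual script may use a different CLI library), and the names resolve_backend, parse_args, and SUPPORTED_BACKENDS are hypothetical, not taken from the PR.

```python
import argparse
from typing import List, Optional

# Hypothetical constant: the two backends the summary says the script accepts.
SUPPORTED_BACKENDS = ("pytorch", "tensorrt")


def resolve_backend(backend: Optional[str]) -> str:
    """Map the optional CLI value to a concrete backend name."""
    if backend is None:
        # The summary states that an unset value is treated as "tensorrt".
        backend = "tensorrt"
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"Unsupported backend: {backend!r}")
    return backend


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the command line; --backend is optional and defaults to None."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--backend", type=str, default=None)
    return parser.parse_args(argv)
```

Calling resolve_backend(parse_args(["--backend", "pytorch"]).backend) would select the PyTorch path, while omitting the flag falls back to TensorRT. Keeping the default at None in the parser and resolving it in one function makes the fallback easy to change later.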

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant TestScript (_run_llmapi_llm.py)
    participant LLM API (llmapi/llm.py)
    participant Logger

    User->>TestScript: Run with --backend (default: tensorrt)
    TestScript->>LLM API: Instantiate LLM (backend param)
    LLM API->>Logger: Save current log level
    LLM API->>Logger: Set log level to info
    alt Detect backend
        LLM API->>Logger: Log backend info message
    end
    LLM API->>Logger: Restore original log level
    LLM API-->>TestScript: LLM instance
    TestScript->>LLM API: Generate output
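
The save, set, and restore steps in the diagram can be sketched as a small context manager. This is only an approximation under stated assumptions: it uses the standard library logging module as a stand-in for the project's own logger, and temporary_level and select_backend are hypothetical names.

```python
import contextlib
import logging
from typing import Iterator

# Stand-in for the library's own logger; the name is illustrative.
logger = logging.getLogger("llmapi_sketch")


@contextlib.contextmanager
def temporary_level(log: logging.Logger, level: int) -> Iterator[None]:
    """Set `log` to `level` for the duration of the block, then restore it."""
    original_level = log.level
    log.setLevel(level)
    try:
        yield
    finally:
        # Runs even if the body raises, so the caller's level is never lost.
        log.setLevel(original_level)


def select_backend(backend: str) -> str:
    """Log which backend is in use and return its display name."""
    names = {"pytorch": "PyTorch", "_autodeploy": "AutoDeploy"}
    with temporary_level(logger, logging.INFO):
        # Any value not listed above is treated as the TensorRT backend.
        name = names.get(backend, "TensorRT")
        logger.info("Using LLM with %s backend", name)
    return name
```

Because the level lives on a shared logger object, two threads entering the block at the same time can still observe each other's changes; the sketch does not address that.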

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~18 minutes

Suggested reviewers

  • niukuo
  • chzblych
  • litaotju

@Superjomn Superjomn requested review from LarryXFly and removed request for LarryXFly August 8, 2025 03:17
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
tensorrt_llm/llmapi/llm.py (1)

130-141: Backend selection logic is correct with clear logging.

The explicit backend selection and logging implementation looks good. Each backend case is properly handled with appropriate argument class selection and informative log messages.

For future maintainability, consider using a dictionary-based approach:

+            # Map each explicit backend to its args class and log message.
+            # Assumes AutoDeployLlmArgs has already been imported at this point.
+            backend_config = {
+                "pytorch": (TorchLlmArgs, "Using LLM with PyTorch backend"),
+                "_autodeploy": (AutoDeployLlmArgs, "Using LLM with AutoDeploy backend"),
+            }
+
+            # Any other value falls back to the TensorRT backend.
+            llm_args_cls, message = backend_config.get(
+                backend, (TrtLlmArgs, "Using LLM with TensorRT backend"))
+            logger.info(message)
tests/integration/defs/llmapi/test_llm_qa.py (2)

11-15: Fix docstring formatting.

The docstring should either be a single line or properly formatted as a multi-line docstring.

Apply this diff to fix the formatting:

-    """
-    Check that the default backend is PyTorch for v1.0 breaking change
-    """
+    """Check that the default backend is PyTorch for v1.0 breaking change."""

46-70: Excellent logging verification test with minor formatting issue.

The test effectively validates backend logging by:

  • Running the external script with different backend options
  • Capturing and verifying specific log messages
  • Testing both PyTorch and TensorRT backend logging

Fix the line length issue on line 69:

-        assert "Using LLM with TensorRT backend" in tensorrt_output, f"Expected 'tensorrt' in logs, got: {tensorrt_output}"
+        assert "Using LLM with TensorRT backend" in tensorrt_output, \
+            f"Expected 'tensorrt' in logs, got: {tensorrt_output}"
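
The subprocess-based check described above can be sketched in a self-contained way. This is a minimal illustration, not the PR's test: run_and_capture is a hypothetical helper, and the child process here is a one-line Python command standing in for _run_llmapi_llm.py.

```python
import subprocess
import sys
from typing import List


def run_and_capture(cmd: List[str]) -> str:
    """Run `cmd` and return stdout and stderr combined as text."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout + result.stderr


# Stand-in child process: a real test would invoke _run_llmapi_llm.py instead.
child = [sys.executable, "-c", "print('Using LLM with TensorRT backend')"]
tensorrt_output = run_and_capture(child)
assert "Using LLM with TensorRT backend" in tensorrt_output, (
    f"Expected TensorRT backend log, got: {tensorrt_output}")
```

Combining stdout and stderr avoids depending on which stream the logger writes to, and wrapping the message in parentheses keeps the assertion under the line-length limit without a backslash continuation.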
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d289d85 and 6848853.

📒 Files selected for processing (4)
  • tensorrt_llm/llmapi/llm.py (2 hunks)
  • tests/integration/defs/llmapi/_run_llmapi_llm.py (1 hunks)
  • tests/integration/defs/llmapi/test_llm_qa.py (1 hunks)
  • tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • tensorrt_llm/llmapi/llm.py
  • tests/integration/defs/llmapi/test_llm_qa.py
  • tests/integration/defs/llmapi/_run_llmapi_llm.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tensorrt_llm/llmapi/llm.py
  • tests/integration/defs/llmapi/test_llm_qa.py
  • tests/integration/defs/llmapi/_run_llmapi_llm.py
🧠 Learnings (5)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tensorrt_llm/llmapi/llm.py
  • tests/integration/defs/llmapi/test_llm_qa.py
  • tests/integration/test_lists/qa/llm_function_full.txt
  • tests/integration/defs/llmapi/_run_llmapi_llm.py
📚 Learning: 2025-08-06T13:58:07.506Z
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

Applied to files:

  • tensorrt_llm/llmapi/llm.py
  • tests/integration/defs/llmapi/test_llm_qa.py
  • tests/integration/test_lists/qa/llm_function_full.txt
  • tests/integration/defs/llmapi/_run_llmapi_llm.py
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • tests/integration/defs/llmapi/test_llm_qa.py
  • tests/integration/test_lists/qa/llm_function_full.txt
  • tests/integration/defs/llmapi/_run_llmapi_llm.py
📚 Learning: 2025-07-22T08:33:49.109Z
Learnt from: yiqingy0
PR: NVIDIA/TensorRT-LLM#5198
File: jenkins/mergeWaiveList.py:0-0
Timestamp: 2025-07-22T08:33:49.109Z
Learning: In the TensorRT-LLM waive list merging system, removed lines are always located at the end of the merge waive lists, which is why the mergeWaiveList.py script uses reverse traversal - it's an optimization for this specific domain constraint.

Applied to files:

  • tests/integration/test_lists/qa/llm_function_full.txt
🪛 Ruff (0.12.2)
tests/integration/defs/llmapi/test_llm_qa.py

12-13: One-line docstring should fit on one line

Reformat to one line

(D200)


12-13: First line should end with a period, question mark, or exclamation point

Add closing punctuation

(D415)


69-69: Line too long (123 > 120)

(E501)

🔇 Additional comments (8)
tensorrt_llm/llmapi/llm.py (1)

170-172: Proper cleanup in finally block.

The logger level restoration in the finally block is implemented correctly, ensuring the original level is restored even if exceptions occur during backend detection.

Note: This cleanup is still subject to the thread safety concerns mentioned earlier regarding global logger state modification.

tests/integration/test_lists/qa/llm_function_full.txt (1)

674-679: Potential duplication in test list.

The test test_llm_args_logging appears twice (lines 676 and 679). Please verify if this duplication is intentional or if one of these entries should reference a different test method.
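
A quick way to confirm or rule out such duplication is to count entries in the list file. The helper below is a hypothetical sketch; the sample entries are made up for illustration (only the test name test_llm_args_logging comes from the comment above) and the real file's entry format may differ.

```python
from collections import Counter
from typing import Iterable, List


def find_duplicate_entries(lines: Iterable[str]) -> List[str]:
    """Return entries that occur more than once, ignoring blanks and comments."""
    entries = [line.strip() for line in lines]
    entries = [e for e in entries if e and not e.startswith("#")]
    counts = Counter(entries)
    return [entry for entry, n in counts.items() if n > 1]


# Illustrative input only; these are not the real contents of the test list.
sample = [
    "# llm api default backend",
    "llmapi/test_llm_api_qa.py::TestLlmDefaultBackend::test_llm_args_logging",
    "llmapi/test_llm_api_qa.py::TestLlmDefaultBackend::test_other_case",
    "llmapi/test_llm_api_qa.py::TestLlmDefaultBackend::test_llm_args_logging",
]
duplicates = find_duplicate_entries(sample)
```

For the real file, the same function could be applied to open(path).read().splitlines().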

tests/integration/defs/llmapi/_run_llmapi_llm.py (4)

3-3: LGTM!

Proper import of Optional for type annotation.


7-8: LGTM!

Clean import of LLM classes from both backend modules to enable dynamic backend selection.


15-16: LGTM!

Proper addition of backend selection CLI option and function signature update with correct typing.


22-29: Excellent backend selection implementation.

The logic cleanly handles:

  • Default fallback to "pytorch" for backward compatibility
  • Proper validation of supported backends
  • Dynamic class and argument selection based on backend type
  • Correct handling of backend-specific requirements (BuildConfig for TensorRT)
tests/integration/defs/llmapi/test_llm_qa.py (2)

1-9: LGTM!

Clean test file setup with appropriate imports and model path configuration.


16-44: Excellent test coverage for backend selection.

Both test methods provide comprehensive validation:

  • Verify correct default backend selection (PyTorch)
  • Validate backend-specific argument types (TorchLlmArgs vs TrtLlmArgs)
  • Include functional testing with generation calls
  • Use appropriately scoped imports within test methods

Collaborator

@syuoni syuoni left a comment

A bit concerned with the logging level intervention in llm.py, please take a look, thanks!

@Superjomn
Collaborator Author

/bot run

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/integration/defs/llmapi/test_llm_api_qa.py (2)

12-14: Fix docstring formatting.

The docstring has formatting issues that should be addressed.

-    """
-    Check that the default backend is PyTorch for v1.0 breaking change
-    """
+    """Check that the default backend is PyTorch for v1.0 breaking change."""

46-70: Fix line length violation and improve formatting.

The test logic is sound, but there's a line length violation that needs to be addressed.

-        # Check that tensorrt backend keyword appears in logs
-        assert "Using LLM with TensorRT backend" in tensorrt_output, f"Expected 'tensorrt' in logs, got: {tensorrt_output}"
+        # Check that tensorrt backend keyword appears in logs
+        assert "Using LLM with TensorRT backend" in tensorrt_output, (
+            f"Expected 'tensorrt' in logs, got: {tensorrt_output}")
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6848853 and 3430e6e.

📒 Files selected for processing (4)
  • tensorrt_llm/llmapi/llm.py (2 hunks)
  • tests/integration/defs/llmapi/_run_llmapi_llm.py (1 hunks)
  • tests/integration/defs/llmapi/test_llm_api_qa.py (1 hunks)
  • tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • tensorrt_llm/llmapi/llm.py
  • tests/integration/test_lists/qa/llm_function_full.txt
  • tests/integration/defs/llmapi/_run_llmapi_llm.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Files:

  • tests/integration/defs/llmapi/test_llm_api_qa.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tests/integration/defs/llmapi/test_llm_api_qa.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/integration/defs/llmapi/test_llm_api_qa.py
📚 Learning: 2025-08-06T13:58:07.506Z
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

Applied to files:

  • tests/integration/defs/llmapi/test_llm_api_qa.py
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • tests/integration/defs/llmapi/test_llm_api_qa.py
🪛 Ruff (0.12.2)
tests/integration/defs/llmapi/test_llm_api_qa.py

12-13: One-line docstring should fit on one line

Reformat to one line

(D200)


12-13: First line should end with a period, question mark, or exclamation point

Add closing punctuation

(D415)


69-69: Line too long (123 > 120)

(E501)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (3)
tests/integration/defs/llmapi/test_llm_api_qa.py (3)

1-9: LGTM on imports and setup!

The imports are appropriate for an integration test, and the model path construction follows the expected pattern for accessing test models.


16-28: LGTM on default backend test!

The test correctly verifies that the default LLM instantiation uses PyTorch backend with appropriate argument types. The generation test ensures functional verification beyond just type checking.


30-44: LGTM on TensorRT backend test!

The test appropriately verifies TensorRT backend behavior with correct imports and type assertions. The flexible backend assertion (allowing both "tensorrt" and None) suggests proper handling of different TensorRT engine states.

@tensorrt-cicd
Collaborator

PR_Github #14564 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #14564 [ run ] completed with state SUCCESS
/LLM/release-1.0/L0_MergeRequest_PR pipeline #23 completed with status: 'FAILURE'

@Superjomn
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #14594 [ run ] triggered by Bot

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (5)
tests/integration/defs/llmapi/test_llm_api_qa.py (5)

11-14: Docstring style nits (D200 / D415)

First line should be a single-line sentence ending with a period:

-"""
-Check that the default backend is PyTorch for v1.0 breaking change
-"""
+"""Verify that the default LLM backend is PyTorch (v1.0 breaking change)."""

24-26: Missing assertion on .generate() output

The loop only prints tokens; if generation silently fails the test still passes.
Capture a token or the full string and assert it is non-empty.

outputs = list(llm.generate(["Hello, world!"]))
assert outputs and outputs[0], "Generation returned empty output"

27-29: Remove print statements in tests

Printing tokens clutters CI logs; rely on assertions instead.

Also applies to: 43-44


48-49: Prefer pathlib.Path for script path construction

Readability & platform safety.

from pathlib import Path
script_path = Path(__file__).with_name("_run_llmapi_llm.py")

60-60: Line exceeds 120 chars

Break the long assertion line for compliance with style guides.

-assert "Using LLM with PyTorch backend" in pytorch_output, f"Expected 'pytorch' in logs, got: {pytorch_output}"
+msg = f"Expected 'pytorch' backend log, got: {pytorch_output}"
+assert "Using LLM with PyTorch backend" in pytorch_output, msg
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3430e6e and 333b4a2.

📒 Files selected for processing (4)
  • tensorrt_llm/llmapi/llm.py (2 hunks)
  • tests/integration/defs/llmapi/_run_llmapi_llm.py (1 hunks)
  • tests/integration/defs/llmapi/test_llm_api_qa.py (1 hunks)
  • tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/integration/defs/llmapi/_run_llmapi_llm.py
  • tensorrt_llm/llmapi/llm.py
  • tests/integration/test_lists/qa/llm_function_full.txt
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • tests/integration/defs/llmapi/test_llm_api_qa.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tests/integration/defs/llmapi/test_llm_api_qa.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/integration/defs/llmapi/test_llm_api_qa.py
📚 Learning: 2025-08-06T13:58:07.506Z
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

Applied to files:

  • tests/integration/defs/llmapi/test_llm_api_qa.py
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • tests/integration/defs/llmapi/test_llm_api_qa.py
🪛 Ruff (0.12.2)
tests/integration/defs/llmapi/test_llm_api_qa.py

12-13: One-line docstring should fit on one line

Reformat to one line

(D200)


12-13: First line should end with a period, question mark, or exclamation point

Add closing punctuation

(D415)


69-69: Line too long (123 > 120)

(E501)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (1)
tests/integration/defs/llmapi/test_llm_api_qa.py (1)

40-41: Avoid accepting None for backend

Allowing None masks mis-configuration; the API should always expose the resolved backend string.

-assert llm.args.backend in ("tensorrt", None)
+assert llm.args.backend == "tensorrt"

@tensorrt-cicd
Collaborator

PR_Github #14594 [ run ] completed with state SUCCESS
/LLM/release-1.0/L0_MergeRequest_PR pipeline #29 completed with status: 'FAILURE'

@Superjomn
Collaborator Author

/bot run

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (1)
tests/integration/defs/llmapi/test_llm_api_qa.py (1)

6-9: Fix: don’t import conftest; derive model_path from a fixture (likely cause of CI failures).

Importing from conftest and calling a fixture-like helper at module import time breaks pytest collection and is fragile. Build the model_path inside tests via a real fixture and pathlib for cross‑platform safety.

Apply this removal here (see method-level diffs below for additions):

-from ..conftest import llm_models_root
-
-model_path = llm_models_root() + "/llama-models-v3/llama-v3-8b-instruct-hf"
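
As a rough sketch of building the path lazily instead of at import time: the function name `build_model_path` and the `/models` root are hypothetical, and the sketch assumes the models root reaches the test as a plain string (for example through a fixture argument).

```python
from pathlib import Path


def build_model_path(models_root: str) -> str:
    """Join the models root with the Llama 3 checkpoint directory."""
    # Path's "/" operator inserts the right separator for the platform.
    return str(Path(models_root) / "llama-models-v3" / "llama-v3-8b-instruct-hf")


def example_test(models_root: str) -> str:
    # In a real suite, models_root would be supplied by pytest as a fixture
    # argument, so nothing is computed while the module is being collected.
    return build_model_path(models_root)


print(example_test("/models"))  # "/models" is a placeholder root
```

Because the path is computed inside the test body, collecting the module no longer depends on the environment being set up.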
🧹 Nitpick comments (5)
tests/integration/defs/llmapi/test_llm_api_qa.py (5)

12-14: Docstring formatting: one line with punctuation.

Comply with D200/D415: keep it on one line and end with a period.

-class TestLlmDefaultBackend:
-    """
-    Check that the default backend is PyTorch for v1.0 breaking change
-    """
+class TestLlmDefaultBackend:
+    """Check that the default backend is PyTorch for the v1.0 breaking change."""

27-29: Avoid noisy test logs.

Consuming the generator is enough; printing in tests makes CI logs noisy.

-    for output in llm.generate(["Hello, world!"]):
-        print(output)
+    for _ in llm.generate(["Hello, world!"]):
+        pass  # consume generator; avoid noisy test logs

43-45: Avoid noisy test logs.

Same as above; don’t print from tests.

-    for output in llm.generate(["Hello, world!"]):
-        print(output)
+    for _ in llm.generate(["Hello, world!"]):
+        pass  # consume generator; avoid noisy test logs

60-61: Wrap the long assert message to satisfy E501 (<=120 chars).

Keeps the message readable and within the line-length limit.

-        assert "Using LLM with PyTorch backend" in pytorch_output, f"Expected 'pytorch' in logs, got: {pytorch_output}"
+        assert "Using LLM with PyTorch backend" in pytorch_output, (
+            f"Expected 'pytorch' in logs, got: {pytorch_output}"
+        )

70-71: Wrap the long assert message to satisfy E501 (<=120 chars).

Same fix for the TensorRT case.

-        assert "Using LLM with TensorRT backend" in tensorrt_output, f"Expected 'tensorrt' in logs, got: {tensorrt_output}"
+        assert "Using LLM with TensorRT backend" in tensorrt_output, (
+            f"Expected 'tensorrt' in logs, got: {tensorrt_output}"
+        )
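
The wrapping pattern above can be sketched in isolation. The helper name `assert_backend_logged` and the sample log line are hypothetical; the point is that only the message is parenthesized.

```python
def assert_backend_logged(output: str, expected: str) -> None:
    """Fail with a readable message when expected is missing from output."""
    # Only the message is wrapped in parentheses. Wrapping the condition and
    # the message together, as in `assert (cond, msg)`, would build a
    # non-empty tuple, which is always truthy, so the assert could never fail.
    assert expected in output, (
        f"Expected {expected!r} in logs, got: {output}"
    )


# Passes silently because the substring is present.
assert_backend_logged("[info] Using LLM with PyTorch backend", "PyTorch backend")
```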
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 333b4a2 and c9ed461.

📒 Files selected for processing (4)
  • tensorrt_llm/llmapi/llm.py (2 hunks)
  • tests/integration/defs/llmapi/_run_llmapi_llm.py (1 hunks)
  • tests/integration/defs/llmapi/test_llm_api_qa.py (1 hunks)
  • tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • tensorrt_llm/llmapi/llm.py
  • tests/integration/defs/llmapi/_run_llmapi_llm.py
  • tests/integration/test_lists/qa/llm_function_full.txt
🔇 Additional comments (1)
tests/integration/defs/llmapi/test_llm_api_qa.py (1)

16-16: Confirm whether llm_root is required.

Each test takes llm_root as an argument but doesn’t use it directly. If it’s only there for side effects (environment setup), keep it; otherwise remove it to reduce noise.

Also applies to: 30-30, 46-46

@tensorrt-cicd
Collaborator

PR_Github #14669 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #14669 [ run ] completed with state SUCCESS
/LLM/release-1.0/L0_MergeRequest_PR pipeline #39 completed with status: 'FAILURE'

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🔭 Outside diff range comments (1)
tests/integration/defs/llmapi/test_llm_api_qa.py (1)

46-71: Logging test: remove debug print, use fixture + pathlib, and wrap long asserts

  • Build paths with pathlib.
  • Drop noisy print.
  • Fix long-line E501 and make TRT assertion robust (accepts TensorRT or AutoDeploy message).
-    def test_llm_args_logging(self, llm_root, llm_venv):
+    def test_llm_args_logging(self, llm_root, llm_venv, llm_models_root):
@@
-        script_path = os.path.join(os.path.dirname(__file__),
-                                   "_run_llmapi_llm.py")
-        print(f"script_path: {script_path}")
+        import pathlib
+        script_path = str(pathlib.Path(__file__).parent / "_run_llmapi_llm.py")
+        model_path = str(
+            pathlib.Path(llm_models_root) / "llama-models-v3" / "llama-v3-8b-instruct-hf"
+        )
@@
-        pytorch_output = venv_check_output(llm_venv, pytorch_cmd)
+        pytorch_output = venv_check_output(llm_venv, pytorch_cmd)
@@
-        assert "Using LLM with PyTorch backend" in pytorch_output, f"Expected 'pytorch' in logs, got: {pytorch_output}"
+        expected_pt = "Using LLM with PyTorch backend"
+        assert expected_pt in pytorch_output, f"Missing '{expected_pt}' in logs"
@@
-        tensorrt_output = venv_check_output(llm_venv, tensorrt_cmd)
+        tensorrt_output = venv_check_output(llm_venv, tensorrt_cmd)
@@
-        assert "Using LLM with TensorRT backend" in tensorrt_output, f"Expected 'tensorrt' in logs, got: {tensorrt_output}"
+        expected_trt = ("Using LLM with TensorRT backend",
+                        "Using LLM with AutoDeploy backend")
+        assert any(s in tensorrt_output for s in expected_trt), \
+            f"Missing any of {expected_trt} in logs"
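
The "any of several messages" check in the diff above can be sketched on its own. The helper name `contains_any` is hypothetical, and the two log strings are taken from the suggestion; the wording the library actually logs may differ.

```python
from typing import Iterable


def contains_any(output: str, candidates: Iterable[str]) -> bool:
    """Return True if at least one candidate substring occurs in output."""
    return any(candidate in output for candidate in candidates)


# Messages copied from the review suggestion, not verified against the library.
EXPECTED_TRT = (
    "Using LLM with TensorRT backend",
    "Using LLM with AutoDeploy backend",
)

sample_log = "[TRT-LLM] [I] Using LLM with TensorRT backend"  # made-up log line
assert contains_any(sample_log, EXPECTED_TRT), (
    f"Missing any of {EXPECTED_TRT} in logs"
)
```

Accepting several messages makes the test tolerant of backend naming, at the cost of no longer pinning down which backend actually ran.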
♻️ Duplicate comments (3)
tests/integration/defs/llmapi/test_llm_api_qa.py (3)

6-8: Critical: avoid importing conftest and building globals from it

Importing conftest as a module is brittle and can break test collection. Also, computing model_path at module import time ties tests to environment state. Use the llm_models_root fixture and build the path inside each test with pathlib.

-from ..conftest import llm_models_root
-
-model_path = llm_models_root() + "/llama-models-v3/llama-v3-8b-instruct-hf"
+# model_path is derived inside each test from the llm_models_root fixture.

16-26: Default backend test: use fixture + pathlib and module-namespace imports

Aligns with repo guidelines and fixes cross-platform pathing.

-    def test_llm_args_type_default(self, llm_root, llm_venv):
-        # Keep the complete example code here
-        from tensorrt_llm.llmapi import LLM, KvCacheConfig, TorchLlmArgs
+    def test_llm_args_type_default(self, llm_root, llm_venv, llm_models_root):
+        # Keep the complete example code here
+        import pathlib
+        import tensorrt_llm.llmapi as llmapi
@@
-        kv_cache_config = KvCacheConfig(free_gpu_memory_fraction=0.4)
-        llm = LLM(model=model_path, kv_cache_config=kv_cache_config)
+        model_path = str(
+            pathlib.Path(llm_models_root) / "llama-models-v3" / "llama-v3-8b-instruct-hf"
+        )
+        kv_cache_config = llmapi.KvCacheConfig(free_gpu_memory_fraction=0.4)
+        llm = llmapi.LLM(model=model_path, kv_cache_config=kv_cache_config)
@@
-        assert isinstance(llm.args, TorchLlmArgs)
+        assert isinstance(llm.args, llmapi.TorchLlmArgs)

30-41: TensorRT test: mirror fixture + pathlib and module-namespace patterns

Keep import namespaces and build model_path from fixture.

-    def test_llm_args_type_tensorrt(self, llm_root, llm_venv):
-        # Keep the complete example code here
-        from tensorrt_llm._tensorrt_engine import LLM
-        from tensorrt_llm.llmapi import KvCacheConfig, TrtLlmArgs
+    def test_llm_args_type_tensorrt(self, llm_root, llm_venv, llm_models_root):
+        # Keep the complete example code here
+        import pathlib
+        import tensorrt_llm._tensorrt_engine as trt_engine
+        import tensorrt_llm.llmapi as llmapi
@@
-        kv_cache_config = KvCacheConfig(free_gpu_memory_fraction=0.4)
+        model_path = str(
+            pathlib.Path(llm_models_root) / "llama-models-v3" / "llama-v3-8b-instruct-hf"
+        )
+        kv_cache_config = llmapi.KvCacheConfig(free_gpu_memory_fraction=0.4)
@@
-        llm = LLM(model=model_path, kv_cache_config=kv_cache_config)
+        llm = trt_engine.LLM(model=model_path, kv_cache_config=kv_cache_config)
@@
-        assert isinstance(llm.args, TrtLlmArgs)
+        assert isinstance(llm.args, llmapi.TrtLlmArgs)
🧹 Nitpick comments (3)
tests/integration/defs/llmapi/test_llm_api_qa.py (3)

4-4: Follow guideline: keep module namespace when importing

Prefer importing the module namespace and referencing attributes to comply with project Python import style.

-from defs.common import venv_check_output
+import defs.common as common

Then update usages:

-        pytorch_output = venv_check_output(llm_venv, pytorch_cmd)
+        pytorch_output = common.venv_check_output(llm_venv, pytorch_cmd)
@@
-        tensorrt_output = venv_check_output(llm_venv, tensorrt_cmd)
+        tensorrt_output = common.venv_check_output(llm_venv, tensorrt_cmd)

Please verify the import path resolution in your test harness (sys.path) still supports import defs.common as common.


12-14: Docstring style: single-line with ending punctuation (D200, D415)

-    """
-    Check that the default backend is PyTorch for v1.0 breaking change
-    """
+    """Verify default backend is PyTorch for v1.0 breaking change."""

27-29: Drop prints in tests

Printing generation outputs creates noisy CI logs without assertions. Iterate to exercise the path without emitting output.

-        for output in llm.generate(["Hello, world!"]):
-            print(output)
+        for _ in llm.generate(["Hello, world!"]):
+            pass

Also applies to: 43-45

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9ed461 and 914df60.

📒 Files selected for processing (4)
  • tensorrt_llm/llmapi/llm.py (2 hunks)
  • tests/integration/defs/llmapi/_run_llmapi_llm.py (1 hunks)
  • tests/integration/defs/llmapi/test_llm_api_qa.py (1 hunks)
  • tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • tensorrt_llm/llmapi/llm.py
  • tests/integration/defs/llmapi/_run_llmapi_llm.py
  • tests/integration/test_lists/qa/llm_function_full.txt
🧰 Additional context used
🧠 Learnings (8)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/integration/defs/llmapi/test_llm_api_qa.py
📚 Learning: 2025-08-06T13:58:07.506Z
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

Applied to files:

  • tests/integration/defs/llmapi/test_llm_api_qa.py
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • tests/integration/defs/llmapi/test_llm_api_qa.py
📚 Learning: 2025-07-22T09:22:14.726Z
Learnt from: yechank-nvidia
PR: NVIDIA/TensorRT-LLM#6254
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204
Timestamp: 2025-07-22T09:22:14.726Z
Learning: In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()` is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call `strip_for_generation()` to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.

Applied to files:

  • tests/integration/defs/llmapi/test_llm_api_qa.py
📚 Learning: 2025-08-08T04:10:18.987Z
Learnt from: djns99
PR: NVIDIA/TensorRT-LLM#6728
File: cpp/tensorrt_llm/plugins/mixtureOfExperts/mixtureOfExpertsPlugin.cpp:966-966
Timestamp: 2025-08-08T04:10:18.987Z
Learning: TensorRT plugins currently don't support padding functionality, and TensorRT is not getting new features (in maintenance mode). This means that duplicating parameters like mExpertHiddenSize in function calls, even with TODO comments, can be acceptable as pragmatic solutions within these constraints.

Applied to files:

  • tests/integration/defs/llmapi/test_llm_api_qa.py
📚 Learning: 2025-08-06T03:47:16.802Z
Learnt from: venkywonka
PR: NVIDIA/TensorRT-LLM#6650
File: tests/integration/test_lists/qa/llm_perf_cluster.yml:33-37
Timestamp: 2025-08-06T03:47:16.802Z
Learning: Ministral is a valid model name from Mistral AI, distinct from the regular Mistral models. In TensorRT-LLM test configurations, "ministral_8b" and "ministral_8b_fp8" are correct model identifiers and should not be changed to "mistral_8b".

Applied to files:

  • tests/integration/defs/llmapi/test_llm_api_qa.py
📚 Learning: 2025-08-08T05:06:31.537Z
Learnt from: sklevtsov-nvidia
PR: NVIDIA/TensorRT-LLM#3294
File: cpp/tensorrt_llm/cutlass_extensions/include/cutlass_extensions/epilogue/fusion/sm90_visitor_scatter.hpp:36-36
Timestamp: 2025-08-08T05:06:31.537Z
Learning: CUTLASS extension files (under cpp/tensorrt_llm/cutlass_extensions/) follow CUTLASS coding style conventions, including using #pragma once instead of TRTLLM_ prefixed header guards, even though they are .hpp files.

Applied to files:

  • tests/integration/defs/llmapi/test_llm_api_qa.py

@Superjomn
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #14701 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #14701 [ run ] completed with state SUCCESS
/LLM/release-1.0/L0_MergeRequest_PR pipeline #44 completed with status: 'FAILURE'

Signed-off-by: Superjomn <[email protected]>
@Superjomn
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #14707 [ run ] triggered by Bot

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

♻️ Duplicate comments (3)
tests/integration/defs/llmapi/test_llm_api_qa.py (3)

6-8: Do not import from conftest; compute model_path via fixture inside tests.

Importing conftest as a module can break pytest collection, and computing model_path at module import time couples test discovery to the environment. Pass llm_models_root as a fixture argument and build the path with pathlib inside each test.

Apply this removal locally; follow-up diffs below adjust each test:

-from ..conftest import llm_models_root
-
-model_path = llm_models_root() + "/llama-models-v3/llama-v3-8b-instruct-hf"

46-55: Logging test: inject fixture, drop debug print, and use pathlib for model_path.

This aligns with a fixtures-first approach and keeps paths cross-platform.

-    def test_llm_args_logging(self, llm_root, llm_venv):
+    def test_llm_args_logging(self, llm_root, llm_venv, llm_models_root):
         # It should print the backend in the log
-        script_path = os.path.join(os.path.dirname(__file__),
-                                   "_run_llmapi_llm.py")
-        print(f"script_path: {script_path}")
+        import pathlib
+        script_path = os.path.join(os.path.dirname(__file__), "_run_llmapi_llm.py")
+        model_path = str(
+            pathlib.Path(llm_models_root) / "llama-models-v3" / "llama-v3-8b-instruct-hf"
+        )

16-25: Use fixtures, pathlib, and module-namespace imports per guidelines.

  • Inject llm_models_root fixture.
  • Use pathlib for cross-platform paths.
  • Keep module namespace on imports and update type references.
-    def test_llm_args_type_default(self, llm_root, llm_venv):
+    def test_llm_args_type_default(self, llm_root, llm_venv, llm_models_root):
         # Keep the complete example code here
-        from tensorrt_llm.llmapi import LLM, KvCacheConfig, TorchLlmArgs
+        import pathlib
+        import tensorrt_llm.llmapi as llmapi
-
-        kv_cache_config = KvCacheConfig(free_gpu_memory_fraction=0.4)
-        llm = LLM(model=model_path, kv_cache_config=kv_cache_config)
+        model_path = str(
+            pathlib.Path(llm_models_root) / "llama-models-v3" / "llama-v3-8b-instruct-hf"
+        )
+        kv_cache_config = llmapi.KvCacheConfig(free_gpu_memory_fraction=0.4)
+        llm = llmapi.LLM(model=model_path, kv_cache_config=kv_cache_config)
@@
-        assert llm.args.backend == "pytorch"
-        assert isinstance(llm.args, TorchLlmArgs)
+        assert llm.args.backend == "pytorch"
+        assert isinstance(llm.args, llmapi.TorchLlmArgs)
🧹 Nitpick comments (4)
tests/integration/defs/llmapi/test_llm_api_qa.py (4)

12-14: Docstring style: one line and end with punctuation (D200, D415).

Make the class docstring a single line ending with a period.

-    """
-    Check that the default backend is PyTorch for v1.0 breaking change
-    """
+    """Check that the default backend is PyTorch for the v1.0 breaking change."""

27-29: Avoid printing in tests; keep execution minimal.

Reduce runtime and log noise; just touch generate once.

-        for output in llm.generate(["Hello, world!"]):
-            print(output)
+        for _ in llm.generate(["Hello, world!"]):
+            break  # exercise the path without spamming CI logs
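
A minimal sketch of consuming a single result without printing, using a stand-in for `llm.generate`. The names `fake_generate` and `first_output` are hypothetical and the list return type is an assumption for illustration, not a statement about the TensorRT-LLM API. If the real call returns a fully materialized list, breaking out early only skips iteration; it would not shorten generation itself.

```python
from typing import Iterable, List


def fake_generate(prompts: List[str]) -> List[str]:
    """Hypothetical stand-in for llm.generate: one string per prompt."""
    return [f"echo: {prompt}" for prompt in prompts]


def first_output(outputs: Iterable[str]) -> str:
    """Return the first item, failing clearly if nothing was produced."""
    for item in outputs:
        return item
    raise AssertionError("generate produced no outputs")


result = first_output(fake_generate(["Hello, world!"]))
assert result  # the path was exercised and returned something non-empty
```

Asserting on the first output keeps CI logs quiet while still failing if generation returns nothing.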

43-45: Remove prints to keep tests clean.

Same rationale as the PyTorch test.

-        for output in llm.generate(["Hello, world!"]):
-            print(output)
+        for _ in llm.generate(["Hello, world!"]):
+            break

59-61: Break long assertion and relax message match to be less brittle (E501).

Keep under 120 chars and avoid hard-coding the full sentence; the exact wording can drift.

-        assert "Using LLM with PyTorch backend" in pytorch_output, f"Expected 'pytorch' in logs, got: {pytorch_output}"
+        assert (
+            "PyTorch backend" in pytorch_output
+        ), f"Expected 'PyTorch backend' in logs, got: {pytorch_output}"
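
One way to make the match even less sensitive to wording drift is a case-insensitive pattern. This is only a sketch: the helper `backend_logged` is a hypothetical name, and the sample line reuses the message quoted in the original assertion.

```python
import re


def backend_logged(log_text: str, backend: str) -> bool:
    """Return True if the log mentions '<backend> backend', ignoring case."""
    pattern = re.compile(
        rf"\b{re.escape(backend)}\s+backend\b", re.IGNORECASE
    )
    return pattern.search(log_text) is not None


sample_log = "Using LLM with PyTorch backend"
assert backend_logged(
    sample_log, "pytorch"
), f"Expected 'PyTorch backend' in logs, got: {sample_log}"
assert not backend_logged(sample_log, "tensorrt")
```

The trade-off is that a looser pattern can also match unrelated log lines, so the phrase should stay specific enough to identify the backend message.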
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 914df60 and eb6d588.

📒 Files selected for processing (4)
  • tensorrt_llm/llmapi/llm.py (2 hunks)
  • tests/integration/defs/llmapi/_run_llmapi_llm.py (1 hunks)
  • tests/integration/defs/llmapi/test_llm_api_qa.py (1 hunks)
  • tests/integration/test_lists/qa/llm_function_full.txt (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/integration/test_lists/qa/llm_function_full.txt
  • tensorrt_llm/llmapi/llm.py
  • tests/integration/defs/llmapi/_run_llmapi_llm.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the class docstring.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
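
The last two guidelines can be illustrated with a small sketch. The function `normalize_ids` is a hypothetical example, not code from the repository: the `try` body holds only the call that may fail, the `except` names a single error type, and the main logic lives in `else`.

```python
from typing import Iterable, List, Union


def normalize_ids(ids: Union[int, Iterable[int]]) -> List[int]:
    """Accept a single id or an iterable of ids and return a list of ints."""
    try:
        # Only the operation that can raise belongs in the try body.
        iterator = iter(ids)
    except TypeError:
        # Not iterable: treat the value as a single id.
        return [int(ids)]
    else:
        # Iterable: the real logic runs outside the try body.
        return [int(item) for item in iterator]


assert normalize_ids(5) == [5]
assert normalize_ids([1, 2]) == [1, 2]
```

Keeping the list comprehension out of the `try` body means a `TypeError` raised while converting an item is not mistaken for "the argument is not iterable".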

Files:

  • tests/integration/defs/llmapi/test_llm_api_qa.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tests/integration/defs/llmapi/test_llm_api_qa.py
🧠 Learnings (4)
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/integration/defs/llmapi/test_llm_api_qa.py
📚 Learning: 2025-08-06T13:58:07.506Z
Learnt from: galagam
PR: NVIDIA/TensorRT-LLM#6487
File: tests/unittest/_torch/auto_deploy/unit/singlegpu/test_ad_trtllm_bench.py:1-12
Timestamp: 2025-08-06T13:58:07.506Z
Learning: In TensorRT-LLM, test files (files under tests/ directories) do not require NVIDIA copyright headers, unlike production source code files. Test files typically start directly with imports, docstrings, or code.

Applied to files:

  • tests/integration/defs/llmapi/test_llm_api_qa.py
📚 Learning: 2025-08-01T15:14:45.673Z
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • tests/integration/defs/llmapi/test_llm_api_qa.py
🪛 Ruff (0.12.2)
tests/integration/defs/llmapi/test_llm_api_qa.py

12-13: One-line docstring should fit on one line

Reformat to one line

(D200)


12-13: First line should end with a period, question mark, or exclamation point

Add closing punctuation

(D415)


69-69: Line too long (123 > 120)

(E501)


@tensorrt-cicd
Collaborator

PR_Github #14707 [ run ] completed with state SUCCESS
/LLM/release-1.0/L0_MergeRequest_PR pipeline #46 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@Superjomn Superjomn enabled auto-merge (squash) August 11, 2025 00:02
@Superjomn Superjomn merged commit 21e4f51 into NVIDIA:release/1.0 Aug 11, 2025
4 checks passed
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Aug 13, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Aug 17, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Aug 18, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Aug 19, 2025
dominicshanshan pushed a commit to dominicshanshan/TensorRT-LLM that referenced this pull request Aug 20, 2025
4 participants