
Conversation


@ruodil ruodil commented Jul 23, 2025

Summary by CodeRabbit

  • Tests
    • Added new performance and sanity test cases for multiple models, including new variants and quantization types, across various GPU configurations.
    • Updated model paths and added new model entries to expand test coverage.
    • Improved organization and clarity of test lists by grouping entries by backend and model version, and adding descriptive comments.
    • Removed outdated or duplicate test cases for better maintainability.

Description

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is given. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages that don't match the specified backends. Only [pytorch, cpp, tensorrt, triton] are supported. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
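
For example, combining the options above, a typical invocation that disables fail-fast and runs only one test stage might look like the line below (an illustrative composition of the documented flags, not an officially prescribed recipe):

/bot run --disable-fail-fast --stage-list "A10-PyTorch-1"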

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.


coderabbitai bot commented Jul 23, 2025

📝 Walkthrough

"""

Walkthrough

New model variants and their paths were added to the MODEL_PATH_DICT in the test performance Python file, and one existing model path was updated. Several YAML test list files were reorganized and expanded with new test entries, backend clarifications, and comments, but no functional or logic changes were made. Additionally, some model pattern strings in the PyTorch model config were updated and extended.

Changes

File(s) and change summary:

Performance test definitions
  • tests/integration/defs/perf/test_perf.py: Added new entries and updated one path in MODEL_PATH_DICT; modified get_trtllm_bench_command to use the full path for the extra LLM API config YAML.

QA test list configurations
  • tests/integration/test_lists/qa/trt_llm_release_perf_cluster_test.yml: Removed two qwen_14b_chat tests (4+ GPU); added multiple new tests (mostly 8+ GPU) for various models, with comments.
  • tests/integration/test_lists/qa/trt_llm_release_perf_sanity_test.yml: Added comments to clarify TRT vs. PyTorch backend usage; replaced one test entry; reordered some tests.
  • tests/integration/test_lists/qa/trt_llm_release_perf_test.yml: Reorganized, grouped, and expanded test entries; added backend clarification comments; reinstated some tests; added multi-GPU cases.

PyTorch model configuration
  • tests/integration/defs/perf/pytorch_model_config.py: Updated and extended model pattern strings for DeepSeek R1 and Llama variants; removed quantization substrings from patterns; no logic changes.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • LarryXFly
  • Shixiaowei02
  • yilin-void

Poem

🐇
In the warren where the models grow,
New paths and tests now overflow.
Llama Scouts and Mavericks leap,
YAML lists get tidy and neat.
Backend notes and clusters align,
The rabbit approves—these changes are fine!
✨🐰
"""



📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 55d68d2 and e31fb67.

📒 Files selected for processing (4)
  • tests/integration/defs/perf/pytorch_model_config.py (4 hunks)
  • tests/integration/defs/perf/test_perf.py (3 hunks)
  • tests/integration/test_lists/qa/trt_llm_release_perf_cluster_test.yml (1 hunks)
  • tests/integration/test_lists/qa/trt_llm_release_perf_test.yml (15 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/integration/test_lists/qa/trt_llm_release_perf_cluster_test.yml
  • tests/integration/defs/perf/pytorch_model_config.py
  • tests/integration/test_lists/qa/trt_llm_release_perf_test.yml
  • tests/integration/defs/perf/test_perf.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ed62a06 and a959db5.

📒 Files selected for processing (4)
  • tests/integration/defs/perf/test_perf.py (2 hunks)
  • tests/integration/test_lists/qa/trt_llm_release_perf_cluster_test.yml (1 hunks)
  • tests/integration/test_lists/qa/trt_llm_release_perf_sanity_test.yml (8 hunks)
  • tests/integration/test_lists/qa/trt_llm_release_perf_test.yml (15 hunks)
🧠 Learnings (3)
tests/integration/test_lists/qa/trt_llm_release_perf_sanity_test.yml (1)

Learnt from: yiqingy0
PR: #5198
File: jenkins/mergeWaiveList.py:0-0
Timestamp: 2025-07-22T08:33:49.076Z
Learning: In the TensorRT-LLM waive list merging system, removed lines are always located at the end of the merge waive lists, which is why the mergeWaiveList.py script uses reverse traversal - it's an optimization for this specific domain constraint.

tests/integration/test_lists/qa/trt_llm_release_perf_cluster_test.yml (1)

Learnt from: yiqingy0
PR: #5198
File: jenkins/mergeWaiveList.py:0-0
Timestamp: 2025-07-22T08:33:49.076Z
Learning: In the TensorRT-LLM waive list merging system, removed lines are always located at the end of the merge waive lists, which is why the mergeWaiveList.py script uses reverse traversal - it's an optimization for this specific domain constraint.

tests/integration/test_lists/qa/trt_llm_release_perf_test.yml (1)

Learnt from: yiqingy0
PR: #5198
File: jenkins/mergeWaiveList.py:0-0
Timestamp: 2025-07-22T08:33:49.076Z
Learning: In the TensorRT-LLM waive list merging system, removed lines are always located at the end of the merge waive lists, which is why the mergeWaiveList.py script uses reverse traversal - it's an optimization for this specific domain constraint.


🔇 Additional comments (21)
tests/integration/defs/perf/test_perf.py (3)

58-59: LGTM! New model path added correctly.

The addition of the llama_v3.1_405b_instruct_fp8 model path is properly structured and follows the established naming convention. The path structure is consistent with other LLaMA v3.1 models.


76-79: LGTM! FP8 and FP4 model variants added appropriately.

The additions of llama_v4_scout_17b_16e_instruct_fp8 and llama_v4_scout_17b_16e_instruct_fp4 model paths are well-structured and consistent with the existing pattern for LLaMA v4 Scout models. The paths follow the established directory hierarchy.


83-83: LGTM! Model path update reflects organizational structure.

The path update for llama_v4_maverick_17b_128e_instruct_fp8 to include the nvidia/ subdirectory appears to reflect a change in the model repository organization. This is consistent with the AI summary indicating an organizational update.
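
To make the review notes above concrete, here is a minimal sketch of what such MODEL_PATH_DICT entries generally look like. The directory names below are illustrative assumptions, not the actual paths added in this PR:

    # Sketch only: MODEL_PATH_DICT maps a test model name to a checkpoint directory
    # relative to the shared model root; the directory names are assumed for illustration.
    MODEL_PATH_DICT = {
        "llama_v3.1_405b_instruct_fp8": "llama-3.1-model/Llama-3.1-405B-Instruct-FP8",
        "llama_v4_scout_17b_16e_instruct_fp8": "llama4-models/Llama-4-Scout-17B-16E-Instruct-FP8",
        "llama_v4_scout_17b_16e_instruct_fp4": "llama4-models/Llama-4-Scout-17B-16E-Instruct-FP4",
        # Updated path includes an nvidia/ subdirectory, per the organizational note above.
        "llama_v4_maverick_17b_128e_instruct_fp8": "llama4-models/nvidia/Llama-4-Maverick-17B-128E-Instruct-FP8",
    }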

tests/integration/test_lists/qa/trt_llm_release_perf_sanity_test.yml (6)

35-40: Good organization with backend distinction comments.

The addition of comments clearly distinguishing between TRT backend and PyTorch backend tests improves test configuration readability and maintainability. This makes it easier for developers to understand the test structure at a glance.


64-71: Consistent comment structure enhances clarity.

The continued use of backend distinction comments for the llama_v3.1_8b section maintains consistency with the previous section and further improves the organization of the test configuration.


86-95: Clear FP8 test organization.

The comments for llama_v3.1_8b_instruct_fp8 section provide good organization for FP8-specific tests, clearly separating TRT and PyTorch backend implementations. This helps maintainers understand the test coverage for different quantization approaches.


113-122: Consistent multi-GPU test organization.

The backend distinction comments for the 2+ GPU tests maintain the established pattern and help organize tests by backend type, making it easier to understand the multi-GPU test coverage.


143-143: Model variant change aligns with test organization.

The change from mixtral_8x7b_v0.1 to mixtral_8x7b_v0.1_instruct_fp8 with PyTorch float8 backend is consistent with the PR's focus on organizing tests by model variants and backend types. This provides better coverage of quantized model variants.


176-181: Excellent consistency in multi-GPU test comments.

The backend distinction comments for llama_v3.1_70b and llama_v3.3_70b_instruct_fp8 sections across different GPU count conditions (4+, 8+) maintain excellent consistency. This systematic approach makes the test configuration highly readable and maintainable.

Also applies to: 216-222, 243-250
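
For illustration, the backend-distinction comments described above typically group entries along the lines of the sketch below. The test IDs are placeholders following the general test_perf parameterization style, not lines copied from the file:

    # llama_v3.1_8b
    # trt backend
    - perf/test_perf.py::test_perf[llama_v3.1_8b-bench-float16-input_output_len:128,128]
    # pytorch backend
    - perf/test_perf.py::test_perf[llama_v3.1_8b-bench-pytorch-float16-input_output_len:128,128]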

tests/integration/test_lists/qa/trt_llm_release_perf_cluster_test.yml (3)

56-96: Comprehensive test coverage expansion for cluster testing.

The addition of extensive test cases for various model variants (llama_v3.3_nemotron_super_49b, llama_v3.3_70b_instruct_fp4, llama_v3.1_405b_instruct_fp4, llama_v4_scout_17b_16e_instruct_fp4, deepseek_r1_fp8, deepseek_r1_nvfp4) provides excellent coverage for cluster testing scenarios. The tests include:

  • Different quantization formats (fp4, fp8, nvfp4)
  • Various input/output length combinations
  • Multiple tensor parallelism configurations
  • Both regular and streaming modes
  • Latency and throughput focused tests (with helpful comments)

The organization with model-specific comment headers improves readability and maintenance.


80-81: Well-documented performance test scenarios.

The inline comments distinguishing "min latency test" and "max throughput test" provide valuable context for understanding the purpose of different test configurations. This documentation helps maintainers understand the performance testing strategy and expected behavior.

Also applies to: 84-87


86-86: Appropriate timeout configuration for resource-intensive tests.

The TIMEOUT (120) specification for the max throughput test with nvfp4 quantization recognizes that this configuration may require extended execution time due to the computational complexity. This prevents premature test failures while ensuring reasonable bounds.
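
As a reference for the shape of such an annotation, the timeout is attached inline to the entry it governs, roughly like the placeholder below (the test ID is illustrative, not the actual line from the file):

    - perf/test_perf.py::test_perf[deepseek_r1_nvfp4-bench-pytorch-float4-input_output_len:128,128-gpus:8] TIMEOUT (120)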

tests/integration/test_lists/qa/trt_llm_release_perf_test.yml (9)

26-35: Excellent organizational improvements with clear model and backend grouping.

The addition of comments like #llama_v3.1_8b_instruct, #trt backend, and #pytorch backend significantly improves the readability and maintainability of this test configuration file. This makes it much easier to understand which tests belong to which model variants and backend types.


86-91: Good addition of PyTorch backend tests for phi_4_mini_instruct.

The new PyTorch backend test cases provide good coverage for the phi_4_mini_instruct model with various input/output length combinations, complementing the existing TensorRT backend tests.


167-179: Well-structured FP8 test organization with comprehensive backend coverage.

The reorganization clearly separates TensorRT and PyTorch backend tests for llama_v3.1_8b models, providing comprehensive coverage of FP8 quantization scenarios across different input/output length configurations.


218-244: Comprehensive multi-GPU test coverage with clear model grouping.

The 2-GPU test section is well-organized with clear comments separating different models and backends. The test cases cover various scenarios including tensor parallelism, pipeline parallelism, and different batch configurations.


347-353: Good addition of 4-GPU tests for llama_v3.1_70b.

The new 4-GPU test cases for llama_v3.1_70b provide good coverage with both streaming and standard inference modes across different input/output configurations.


372-378: Excellent PyTorch backend test coverage for llama_v3.3_70b_instruct_fp8.

The new PyTorch backend tests provide comprehensive coverage for the FP8 quantized model with various input/output length combinations and multi-GPU configurations.


466-489: Comprehensive test coverage for new LLaMA v4 model variants.

The addition of test cases for llama_v3.1_405b_fp8, llama_v4_maverick_17b_128e_instruct_fp8, and llama_v4_scout_17b_16e_instruct_fp8 provides good coverage across different model sizes and configurations. The tests include various batch sizes, context lengths, and GPU configurations.


570-578: Excellent addition of FP4 quantization tests for GB chip variants.

The new test cases for llama_v3.3_70b_instruct_fp4 and deepseek_v3_lite_nvfp4 provide good coverage for FP4 quantization scenarios on GB chip architectures. The tests include both standard and streaming modes with various configurations.


107-112: Double-check stability of reinstated cppmanager-exe performance tests

I reviewed tests/integration/test_lists/qa/trt_llm_release_perf_test.yml (lines 107–112) for the llama_v3_8b_instruct-cppmanager-exe entries and found no historical OOM or MPI abort comments. Since these tests were previously disabled and are now always enabled, please manually verify that they:

  • Pass reliably across all supported GPU architectures
  • Don’t exhibit intermittent OOMs or MPI aborts under realistic load

If you observe instability, consider re-adding conditional skips or resource guards around these entries.

@ruodil ruodil force-pushed the user/ruodil/org_cases branch from a959db5 to ebde2b6 on July 25, 2025 07:31
@ruodil ruodil changed the title from "test: organize perf cases and add cases in perflab" to "test: organize perf cases and add missed perflab cases in qa test list" on Jul 25, 2025
@ruodil ruodil changed the title from "test: organize perf cases and add missed perflab cases in qa test list" to "test: organize perf cases and add missing perflab cases in qa test list" on Jul 25, 2025
@ruodil ruodil force-pushed the user/ruodil/org_cases branch from ebde2b6 to 385872a on July 25, 2025 08:19
@ruodil ruodil force-pushed the user/ruodil/org_cases branch from 3a18c15 to 55d68d2 on July 28, 2025 02:32
@ruodil ruodil force-pushed the user/ruodil/org_cases branch from 55d68d2 to e31fb67 on July 28, 2025 03:03
@coderabbitai coderabbitai bot requested a review from Shixiaowei02 July 28, 2025 03:04
@LarryXFly LarryXFly merged commit 03632a6 into NVIDIA:main Jul 28, 2025
2 checks passed
NVShreyas pushed a commit to NVShreyas/TensorRT-LLM that referenced this pull request Jul 28, 2025
Ransiki pushed a commit to Ransiki/TensorRT-LLM that referenced this pull request Jul 29, 2025
lancelly pushed a commit to lancelly/TensorRT-LLM that referenced this pull request Aug 6, 2025