test: organize perf cases and add missing perflab cases in qa test list #6283
Conversation
📝 Walkthrough
New model variants and their paths were added.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Actionable comments posted: 1
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
tests/integration/defs/perf/test_perf.py (2 hunks)
tests/integration/test_lists/qa/trt_llm_release_perf_cluster_test.yml (1 hunks)
tests/integration/test_lists/qa/trt_llm_release_perf_sanity_test.yml (8 hunks)
tests/integration/test_lists/qa/trt_llm_release_perf_test.yml (15 hunks)
🧠 Learnings (3)
tests/integration/test_lists/qa/trt_llm_release_perf_sanity_test.yml (1)
Learnt from: yiqingy0
PR: #5198
File: jenkins/mergeWaiveList.py:0-0
Timestamp: 2025-07-22T08:33:49.076Z
Learning: In the TensorRT-LLM waive list merging system, removed lines are always located at the end of the merge waive lists, which is why the mergeWaiveList.py script uses reverse traversal - it's an optimization for this specific domain constraint.
tests/integration/test_lists/qa/trt_llm_release_perf_cluster_test.yml (1)
Learnt from: yiqingy0
PR: #5198
File: jenkins/mergeWaiveList.py:0-0
Timestamp: 2025-07-22T08:33:49.076Z
Learning: In the TensorRT-LLM waive list merging system, removed lines are always located at the end of the merge waive lists, which is why the mergeWaiveList.py script uses reverse traversal - it's an optimization for this specific domain constraint.
tests/integration/test_lists/qa/trt_llm_release_perf_test.yml (1)
Learnt from: yiqingy0
PR: #5198
File: jenkins/mergeWaiveList.py:0-0
Timestamp: 2025-07-22T08:33:49.076Z
Learning: In the TensorRT-LLM waive list merging system, removed lines are always located at the end of the merge waive lists, which is why the mergeWaiveList.py script uses reverse traversal - it's an optimization for this specific domain constraint.
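As a side note, a minimal Python sketch of the reverse-traversal idea captured in these learnings follows; it is purely illustrative, assumes (as the learning states) that removed lines always sit at the end of the list, and is not the actual jenkins/mergeWaiveList.py code.

```python
def merge_waive_list(current: list[str], removed: set[str]) -> list[str]:
    """Illustrative only: drop removed entries from the tail of a waive list.

    Per the domain constraint above, removed lines always sit at the end of
    the merged list, so a reverse traversal can stop at the first kept line.
    """
    merged = list(current)
    for i in range(len(merged) - 1, -1, -1):
        if merged[i] in removed:
            merged.pop(i)
        else:
            break  # everything before this point is kept
    return merged

print(merge_waive_list(["keep_a", "keep_b", "waived_x", "waived_y"],
                       {"waived_x", "waived_y"}))  # -> ['keep_a', 'keep_b']
```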
🔇 Additional comments (21)
tests/integration/defs/perf/test_perf.py (3)
58-59: LGTM! New model path added correctly.
The addition of the llama_v3.1_405b_instruct_fp8 model path is properly structured and follows the established naming convention. The path structure is consistent with other LLaMA v3.1 models.
76-79: LGTM! FP8 and FP4 model variants added appropriately.
The additions of llama_v4_scout_17b_16e_instruct_fp8 and llama_v4_scout_17b_16e_instruct_fp4 model paths are well-structured and consistent with the existing pattern for LLaMA v4 Scout models. The paths follow the established directory hierarchy.
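For readers unfamiliar with the file, these "model paths" are entries in a name-to-path mapping; a hypothetical sketch of what such entries might look like follows (the key names appear in this review, but the directory values are assumptions, not copied from the repo).

```python
# Hypothetical shape of the model-path mapping in test_perf.py; directory
# values are invented for illustration only.
MODEL_PATHS = {
    "llama_v3.1_405b_instruct_fp8": "llama-3.1-model/Llama-3.1-405B-Instruct-FP8",
    "llama_v4_scout_17b_16e_instruct_fp8": "llama4-models/Llama-4-Scout-17B-16E-Instruct-FP8",
    "llama_v4_scout_17b_16e_instruct_fp4": "llama4-models/Llama-4-Scout-17B-16E-Instruct-FP4",
    # per the next comment, the maverick path now includes an nvidia/ subdirectory
    "llama_v4_maverick_17b_128e_instruct_fp8": "llama4-models/nvidia/Llama-4-Maverick-17B-128E-Instruct-FP8",
}

def model_path(name: str) -> str:
    """Resolve a short model name to its on-disk path."""
    return MODEL_PATHS[name]

print(model_path("llama_v3.1_405b_instruct_fp8"))
```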
83-83: LGTM! Model path update reflects organizational structure.
The path update for llama_v4_maverick_17b_128e_instruct_fp8 to include the nvidia/ subdirectory appears to reflect a change in the model repository organization. This is consistent with the AI summary indicating an organizational update.
tests/integration/test_lists/qa/trt_llm_release_perf_sanity_test.yml (6)
35-40: Good organization with backend distinction comments.
The addition of comments clearly distinguishing between TRT backend and PyTorch backend tests improves test configuration readability and maintainability. This makes it easier for developers to understand the test structure at a glance.
64-71: Consistent comment structure enhances clarity.
The continued use of backend distinction comments for the llama_v3.1_8b section maintains consistency with the previous section and further improves the organization of the test configuration.
86-95: Clear FP8 test organization.
The comments for the llama_v3.1_8b_instruct_fp8 section provide good organization for FP8-specific tests, clearly separating TRT and PyTorch backend implementations. This helps maintainers understand the test coverage for different quantization approaches.
113-122: Consistent multi-GPU test organization.
The backend distinction comments for the 2+ GPU tests maintain the established pattern and help organize tests by backend type, making it easier to understand the multi-GPU test coverage.
143-143: Model variant change aligns with test organization.
The change from mixtral_8x7b_v0.1 to mixtral_8x7b_v0.1_instruct_fp8 with PyTorch float8 backend is consistent with the PR's focus on organizing tests by model variants and backend types. This provides better coverage of quantized model variants.
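To make the organization pattern concrete, here is an invented excerpt in the style these comments describe (the entry names are made up, not copied from the YAML file), parsed with PyYAML to show that the comments do not affect the entries themselves.

```python
import yaml  # PyYAML, assumed available

# Invented test-list excerpt illustrating the model/backend comment style.
snippet = """
# llama_v3.1_8b
# trt backend
- perf/test_perf.py::test_perf[llama_v3.1_8b-bench-bfloat16-input_output_len:128,128]
# pytorch backend
- perf/test_perf.py::test_perf[llama_v3.1_8b-bench-pytorch-bfloat16-input_output_len:128,128]
"""

for entry in yaml.safe_load(snippet):
    print(entry)  # comments organize the file; only the entries are parsed
```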
176-181: Excellent consistency in multi-GPU test comments.
The backend distinction comments for the llama_v3.1_70b and llama_v3.3_70b_instruct_fp8 sections across different GPU count conditions (4+, 8+) maintain excellent consistency. This systematic approach makes the test configuration highly readable and maintainable.
Also applies to: 216-222, 243-250
tests/integration/test_lists/qa/trt_llm_release_perf_cluster_test.yml (3)
56-96: Comprehensive test coverage expansion for cluster testing.
The addition of extensive test cases for various model variants (llama_v3.3_nemotron_super_49b, llama_v3.3_70b_instruct_fp4, llama_v3.1_405b_instruct_fp4, llama_v4_scout_17b_16e_instruct_fp4, deepseek_r1_fp8, deepseek_r1_nvfp4) provides excellent coverage for cluster testing scenarios. The tests include:
- Different quantization formats (fp4, fp8, nvfp4)
- Various input/output length combinations
- Multiple tensor parallelism configurations
- Both regular and streaming modes
- Latency and throughput focused tests (with helpful comments)
The organization with model-specific comment headers improves readability and maintenance.
80-81: Well-documented performance test scenarios.
The inline comments distinguishing "min latency test" and "max throughput test" provide valuable context for understanding the purpose of different test configurations. This documentation helps maintainers understand the performance testing strategy and expected behavior.
Also applies to: 84-87
86-86: Appropriate timeout configuration for resource-intensive tests.
The TIMEOUT (120) specification for the max throughput test with nvfp4 quantization recognizes that this configuration may require extended execution time due to the computational complexity. This prevents premature test failures while ensuring reasonable bounds.
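As a small illustration of that suffix convention (the real test-list tooling may parse it differently), here is a sketch that splits an entry from its trailing TIMEOUT marker:

```python
import re

def split_timeout(entry):
    """Split a test-list entry from an optional trailing 'TIMEOUT (N)' marker."""
    m = re.search(r"\s*TIMEOUT \((\d+)\)\s*$", entry)
    if m:
        return entry[: m.start()], int(m.group(1))
    return entry, None

# Hypothetical entry; the "..." stands in for the full parameter string.
name, timeout = split_timeout(
    "perf/test_perf.py::test_perf[deepseek_r1_nvfp4-...] TIMEOUT (120)")
print(name, timeout)  # -> the bare test id and the numeric timeout value, 120
```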
tests/integration/test_lists/qa/trt_llm_release_perf_test.yml (9)
26-35: Excellent organizational improvements with clear model and backend grouping.
The addition of comments like #llama_v3.1_8b_instruct, #trt backend, and #pytorch backend significantly improves the readability and maintainability of this test configuration file. This makes it much easier to understand which tests belong to which model variants and backend types.
86-91: Good addition of PyTorch backend tests for phi_4_mini_instruct.
The new PyTorch backend test cases provide good coverage for the phi_4_mini_instruct model with various input/output length combinations, complementing the existing TensorRT backend tests.
167-179: Well-structured FP8 test organization with comprehensive backend coverage.
The reorganization clearly separates TensorRT and PyTorch backend tests for llama_v3.1_8b models, providing comprehensive coverage of FP8 quantization scenarios across different input/output length configurations.
218-244: Comprehensive multi-GPU test coverage with clear model grouping.
The 2-GPU test section is well-organized with clear comments separating different models and backends. The test cases cover various scenarios including tensor parallelism, pipeline parallelism, and different batch configurations.
347-353: Good addition of 4-GPU tests for llama_v3.1_70b.
The new 4-GPU test cases for llama_v3.1_70b provide good coverage with both streaming and standard inference modes across different input/output configurations.
372-378: Excellent PyTorch backend test coverage for llama_v3.3_70b_instruct_fp8.
The new PyTorch backend tests provide comprehensive coverage for the FP8 quantized model with various input/output length combinations and multi-GPU configurations.
466-489: Comprehensive test coverage for new LLaMA v4 model variants.
The addition of test cases for llama_v3.1_405b_fp8, llama_v4_maverick_17b_128e_instruct_fp8, and llama_v4_scout_17b_16e_instruct_fp8 provides good coverage across different model sizes and configurations. The tests include various batch sizes, context lengths, and GPU configurations.
570-578: Excellent addition of FP4 quantization tests for GB chip variants.
The new test cases for llama_v3.3_70b_instruct_fp4 and deepseek_v3_lite_nvfp4 provide good coverage for FP4 quantization scenarios on GB chip architectures. The tests include both standard and streaming modes with various configurations.
107-112: Double-check stability of reinstated cppmanager-exe performance tests.
I reviewed tests/integration/test_lists/qa/trt_llm_release_perf_test.yml (lines 107-112) for the llama_v3_8b_instruct-cppmanager-exe entries and found no historical OOM or MPI abort comments. Since these tests were previously disabled and are now always enabled, please manually verify that they:
- Pass reliably across all supported GPU architectures
- Don’t exhibit intermittent OOMs or MPI aborts under realistic load
If you observe instability, consider re-adding conditional skips or resource guards around these entries.
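Should instability surface, one option is a small resource guard; the sketch below is hypothetical pytest-style code (the test lists express such conditions in YAML, per the GPU-count conditions noted above, so this is an in-test alternative, not the project's existing mechanism).

```python
import pytest

def require_free_gpu_memory(min_gib: float) -> None:
    """Skip the calling test unless enough free GPU memory is available."""
    try:
        import torch
        free_bytes, _total = torch.cuda.mem_get_info()
    except Exception:
        pytest.skip("CUDA unavailable; skipping resource-heavy perf test")
    if free_bytes < min_gib * 1024**3:
        pytest.skip(f"needs at least {min_gib} GiB of free GPU memory")
```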
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user-friendly way for developers to interact with a Jenkins server.
Run /bot [-h|--help] to print this help message. See details below for each supported subcommand.
run
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]
Launch build/test pipelines. All previously running jobs will be killed.
- --reuse-test (optional)pipeline-id (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- --disable-reuse-test (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests are run regardless of previous successes.
- --disable-fail-fast (OPTIONAL): Disable fail fast on build/tests/infra failures.
- --skip-test (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- --stage-list "A10-PyTorch-1, xxx" (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- --gpu-type "A30, H100_PCIe" (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- --test-backend "pytorch, cpp" (OPTIONAL): Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- --only-multi-gpu-test (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- --disable-multi-gpu-test (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- --add-multi-gpu-test (OPTIONAL): Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
- --post-merge (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL): Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- --detailed-log (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- --debug (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.
For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.
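For example, a typical invocation combining the flags documented above would be a PR comment like /bot run --stage-list "A10-PyTorch-1" --disable-fail-fast, which runs only that stage without fail-fast (the stage name here is just the example value from the help text).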
kill
kill
Kill all running builds associated with the pull request.
skip
skip --comment COMMENT
Skip testing for latest commit on pull request.
--comment "Reason for skipping build/test"is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.reuse-pipeline
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.