Conversation

ruodil (Collaborator) commented Jul 8, 2025

PR title

Please write the PR title following this template:

[JIRA ticket link/nvbug link/github issue link][fix/feat/doc/infra/...] <summary of this PR>

For example, for a PR that adds support for a new cache manager feature under Jira ticket TRTLLM-1000, the title would be:

[TRTLLM-1000][feat] Support a new feature about cache manager

Description

Please briefly explain the issue and the solution.

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with the Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run the L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.
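
For example, a hypothetical invocation combining two of the options above (the stage name is a placeholder; see ci-overview.md for real stage names):

/bot run --disable-fail-fast --extra-stage "H100_PCIe-[Post-Merge]-1"

Per the flag descriptions above, this launches the ordinary L0 pre-merge pipeline plus the named extra stage, and does not fail fast on build or test failures.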

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can break the top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can break the top of tree.

Summary by CodeRabbit

  • New Features

    • Added new models and LoRA configurations for performance testing, including support for multimodal and FP8 quantized models.
    • Expanded the test suite with additional benchmark scenarios for newly supported models.
  • Tests

    • Introduced new test cases for multimodal and quantized models, as well as LoRA-enabled variants.
    • Updated test configurations for improved coverage and accuracy.

ruodil force-pushed the user/ruodil/add_cases branch from 0d863bc to 8f70aac on July 8, 2025 10:11
ruodil requested review from LarryXFly and venkywonka on July 8, 2025 10:13
ruodil force-pushed the user/ruodil/add_cases branch 2 times, most recently from e25f4c0 to 9e38c2c on July 9, 2025 05:42
ruodil force-pushed the user/ruodil/add_cases branch from 9e38c2c to 7ff755a on July 17, 2025 01:55
coderabbitai bot commented Jul 17, 2025

"""

Walkthrough

The changes introduce new model entries and LoRA configurations for multimodal and FP8-quantized models in the performance test suite. Conditional logic is added for configuring LoRA modules for specific models, and new benchmark tests are appended to the test list YAML to cover these models and configurations.

Changes

  • tests/integration/defs/perf/pytorch_model_config.py: Added conditional LoRA configuration for models labeled "phi_4_multimodal_instruct", including module mappings.
  • tests/integration/defs/perf/test_perf.py: Added new model and LoRA paths for multimodal and FP8 models in MODEL_PATH_DICT and LORA_MODEL_PATH. Modified the benchmark command to pass LoRA directories for the PyTorch backend.
  • tests/integration/test_lists/qa/trt_llm_release_perf_test.yml: Appended new performance test entries for multimodal, LoRA, and FP8 models; removed a redundant parameter.

Sequence Diagram(s)

sequenceDiagram
    participant TestRunner
    participant ModelConfig
    participant LoRAConfig

    TestRunner->>ModelConfig: Request config for model (e.g., phi_4_multimodal_instruct)
    ModelConfig->>LoRAConfig: Check if model label matches "phi_4_multimodal_instruct"
    alt If match
        LoRAConfig->>ModelConfig: Add lora_target_modules and module mappings
    end
    ModelConfig-->>TestRunner: Return merged model configuration
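
Read literally, the flow above amounts to something like the following minimal sketch. The function name, config shape, and module list are illustrative assumptions, not the actual pytorch_model_config.py code; only the "phi_4_multimodal_instruct" label check and the max_lora_rank value of 64 come from this PR.

def get_model_config(model_label: str) -> dict:
    """Sketch of the conditional LoRA merge described in the walkthrough."""
    config: dict = {}  # base configuration returned for every model (assumed)
    if "phi_4_multimodal_instruct" in model_label:
        # Attach LoRA target modules and their mappings for the Phi-4 multimodal variants.
        config["lora_config"] = {
            "lora_target_modules": ["attn_qkv"],  # placeholder module list
            "max_lora_rank": 64,                  # value discussed in the review comment below
        }
    return config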

Suggested reviewers

  • yilin-void
  • achartier

Poem

In the warren of code, new models appear,
Multimodal and FP8, the future is here!
LoRA paths mapped with a hop and a bound,
Benchmarks expanded, new tests all around.
🐇 With configs aligned and carrots in sight,
This bunny declares: "The tests run just right!"

"""


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5f72137 and a16d8c0.

📒 Files selected for processing (3)
  • tests/integration/defs/perf/pytorch_model_config.py (1 hunks)
  • tests/integration/defs/perf/test_perf.py (3 hunks)
  • tests/integration/test_lists/qa/trt_llm_release_perf_test.yml (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/integration/defs/perf/pytorch_model_config.py
  • tests/integration/test_lists/qa/trt_llm_release_perf_test.yml
  • tests/integration/defs/perf/test_perf.py

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.


coderabbitai bot (Contributor) left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/integration/test_lists/qa/trt_llm_release_perf_test.yml (1)

173-174: Extreme FP8 workloads – risk of exceeding memory budget

bielik_11b_v2.2_instruct_fp8 is scheduled at up to 2,000 × 2,000 tokens with con:250.
Although FP8 halves the KV-cache size, an 11B-parameter model at that sequence length and concurrency still exceeds 80 GB on H100 in practice. Consider:

-...input_output_len:2000,2000-con:250
+...input_output_len:2000,2000-reqs:8-con:1   # safer default

or gate the test behind gpu_memory.gt:160000.
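
A rough back-of-envelope supporting this concern; the layer count, KV-head count, and head dimension below are assumed values for an ~11B-class model, not the actual Bielik architecture:

# KV-cache estimate: 2 (K and V) * layers * tokens * kv_heads * head_dim * bytes * concurrency
layers, kv_heads, head_dim = 40, 8, 128   # assumed ~11B-class architecture (not verified)
tokens_per_seq = 2000 + 2000              # input + output length from the test entry
concurrency = 250                         # con:250 from the test entry
bytes_fp8 = 1

kv_bytes = 2 * layers * tokens_per_seq * kv_heads * head_dim * bytes_fp8 * concurrency
print(f"{kv_bytes / 2**30:.1f} GiB")      # ~76 GiB of KV cache alone, before ~11 GB of FP8 weights

Under these assumptions the workload lands above the 80 GB of a single H100, which is why a smaller request count or a gpu_memory gate looks prudent.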

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe070a0 and 7ff755a.

📒 Files selected for processing (3)
  • tests/integration/defs/perf/pytorch_model_config.py (1 hunks)
  • tests/integration/defs/perf/test_perf.py (2 hunks)
  • tests/integration/test_lists/qa/trt_llm_release_perf_test.yml (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/integration/defs/perf/pytorch_model_config.py (1)
tensorrt_llm/_torch/models/modeling_phi4mm.py (1)
  • lora_config (242-262)
🔇 Additional comments (5)
tests/integration/defs/perf/pytorch_model_config.py (1)

189-199: Validate the max_lora_rank for phi_4_multimodal_instruct

There’s a mismatch in the max_lora_rank value across configurations:

  • tests/integration/defs/perf/pytorch_model_config.py (L189-199) sets
    max_lora_rank = 64
  • tensorrt_llm/_torch/models/modeling_phi4mm.py uses
    max_lora_rank = 320 # Max rank for Phi4MM.
  • examples/llm-api/llm_multilora.py also uses
    max_lora_rank = 64 in its sample call.

Please confirm whether the lower rank (64) is intentional for faster performance testing, or if it should be aligned with the reference implementation’s value (320).
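
If the reduced rank is intentional, one hedged way to make that intent explicit in the perf config is sketched below; the constant name and dictionary shape are illustrative, not existing code in either file:

PHI4MM_REFERENCE_MAX_LORA_RANK = 320  # value used by tensorrt_llm/_torch/models/modeling_phi4mm.py

lora_config = {
    # Deliberately below the reference rank to keep perf runs fast (assumed intent).
    "max_lora_rank": 64,
}
assert lora_config["max_lora_rank"] <= PHI4MM_REFERENCE_MAX_LORA_RANK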

tests/integration/defs/perf/test_perf.py (2)

117-121: LGTM! Model path additions look correct.

The new model entries follow the established naming conventions and directory structure patterns. The multimodal variants appropriately share the same base path, and the FP8 quantized variant is clearly differentiated.


158-161: LGTM! LoRA path additions are consistent with the multimodal model structure.

The LoRA paths correctly point to the vision-lora and speech-lora directories for the respective variants, matching the expected structure for multimodal models as shown in the reference implementation.
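
For illustration only, the shape of the mapping being described might look roughly like this; the directory prefix is a placeholder, not the repository's actual path:

LORA_MODEL_PATH = {
    # Hypothetical entries; the real values live in tests/integration/defs/perf/test_perf.py.
    "phi_4_multimodal_instruct_image": "<phi-4-multimodal-base-dir>/vision-lora",
    "phi_4_multimodal_instruct_audio": "<phi-4-multimodal-base-dir>/speech-lora",
}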

tests/integration/test_lists/qa/trt_llm_release_perf_test.yml (2)

102-105: LoRA support for image/audio variants is present
Entries for phi_4_multimodal_instruct_image and phi_4_multimodal_instruct_audio are defined in both dictionaries in tests/integration/defs/perf/test_perf.py:

  • MODEL_PATH_DICT: lines 117–118
  • LORA_MODEL_PATH: lines 157–159

No further action required.


81-84: No action needed: dotted model keys are safe for path look-ups

  • The MODEL_PATH_DICT mapping correctly defines the key "bielik_11b_v2.2_instruct" and its FP8 sibling.
  • All filesystem paths are constructed with os.path.join(llm_models_root(), MODEL_PATH_DICT[...]), so the dot in the key never becomes a separator.
  • The mapping’s value (“Bielik-11B-v2.2-Instruct”) is used as the directory name; dots are valid characters in file and directory names on all target platforms.
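
A quick hedged demonstration of why the dot is harmless; llm_models_root() is stubbed with a placeholder directory, and only the key/value pair quoted above is taken from the PR:

import os

MODEL_PATH_DICT = {"bielik_11b_v2.2_instruct": "Bielik-11B-v2.2-Instruct"}  # pair cited above

def llm_models_root() -> str:
    return "/models"  # placeholder stand-in for the real helper

path = os.path.join(llm_models_root(), MODEL_PATH_DICT["bielik_11b_v2.2_instruct"])
print(path)  # /models/Bielik-11B-v2.2-Instruct -- the dot is never treated as a path separator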

ruodil force-pushed the user/ruodil/add_cases branch from 7ff755a to fe13310 on July 17, 2025 02:02
ruodil force-pushed the user/ruodil/add_cases branch 2 times, most recently from 5f72137 to a16d8c0 on July 18, 2025 05:34
LarryXFly merged commit 6a3c9f8 into NVIDIA:main on Jul 21, 2025 (2 checks passed)
reasonsolo pushed a commit to reasonsolo/TensorRT-LLM that referenced this pull request Jul 21, 2025
timlee0212 pushed a commit to timlee0212/TensorRT-LLM that referenced this pull request Jul 21, 2025
NVShreyas pushed a commit to NVShreyas/TensorRT-LLM that referenced this pull request Jul 28, 2025
Ransiki pushed a commit to Ransiki/TensorRT-LLM that referenced this pull request Jul 29, 2025