Conversation

ruodil (Collaborator) commented Jul 8, 2025

PR title

Please write the PR title following this template:

[JIRA ticket link/nvbug link/github issue link][fix/feat/doc/infra/...] <summary of this PR>

For example, for a PR that adds support for a new cache manager feature under Jira ticket TRTLLM-1000, the title would be:

[TRTLLM-1000][feat] Support a new feature about cache manager

Description

Please briefly explain the issue and the solution.

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with the Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests. Will also run the L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.
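
For example, a hypothetical invocation combining two of the options above (the stage name is a placeholder; see ci-overview.md for real stage names):

/bot run --disable-fail-fast --extra-stage "H100_PCIe-[Post-Merge]-1"

Per the flag descriptions above, this launches the ordinary L0 pre-merge pipeline plus the named extra stage, and does not fail fast on build or test failures.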

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can break the top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since a lack of user care and validation can break the top of tree.

Summary by CodeRabbit

  • New Features

    • Added new models and LoRA configurations for performance testing, including support for multimodal and FP8 quantized models.
    • Expanded the test suite with additional benchmark scenarios for newly supported models.
  • Tests

    • Introduced new test cases for multimodal and quantized models, as well as LoRA-enabled variants.
    • Updated test configurations for improved coverage and accuracy.

ruodil force-pushed the user/ruodil/add_cases branch from 0d863bc to 8f70aac on July 8, 2025 10:11
ruodil requested review from LarryXFly and venkywonka on July 8, 2025 10:13
ruodil force-pushed the user/ruodil/add_cases branch 2 times, most recently from e25f4c0 to 9e38c2c on July 9, 2025 05:42
ruodil force-pushed the user/ruodil/add_cases branch from 9e38c2c to 7ff755a on July 17, 2025 01:55
coderabbitai bot commented Jul 17, 2025

"""

Walkthrough

The changes introduce new model entries and LoRA configurations for multimodal and FP8-quantized models in the performance test suite. Conditional logic is added for configuring LoRA modules for specific models, and new benchmark tests are appended to the test list YAML to cover these models and configurations.

Changes

  • tests/integration/defs/perf/pytorch_model_config.py: Added conditional LoRA configuration for models labeled "phi_4_multimodal_instruct", including module mappings.
  • tests/integration/defs/perf/test_perf.py: Added new model and LoRA paths for multimodal and FP8 models in MODEL_PATH_DICT and LORA_MODEL_PATH. Modified the benchmark command to pass LoRA directories for the PyTorch backend.
  • tests/integration/test_lists/qa/trt_llm_release_perf_test.yml: Appended new performance test entries for multimodal, LoRA, and FP8 models; removed a redundant parameter.

Sequence Diagram(s)

sequenceDiagram
    participant TestRunner
    participant ModelConfig
    participant LoRAConfig

    TestRunner->>ModelConfig: Request config for model (e.g., phi_4_multimodal_instruct)
    ModelConfig->>LoRAConfig: Check if model label matches "phi_4_multimodal_instruct"
    alt If match
        LoRAConfig->>ModelConfig: Add lora_target_modules and module mappings
    end
    ModelConfig-->>TestRunner: Return merged model configuration
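
Read literally, the flow above amounts to something like the following minimal sketch. The function name, config shape, and module list are illustrative assumptions, not the actual pytorch_model_config.py code; only the "phi_4_multimodal_instruct" label check and the max_lora_rank value of 64 come from this PR.

def get_model_config(model_label: str) -> dict:
    """Sketch of the conditional LoRA merge described in the walkthrough."""
    config: dict = {}  # base configuration returned for every model (assumed)
    if "phi_4_multimodal_instruct" in model_label:
        # Attach LoRA target modules and their mappings for the Phi-4 multimodal variants.
        config["lora_config"] = {
            "lora_target_modules": ["attn_qkv"],  # placeholder module list
            "max_lora_rank": 64,                  # value discussed in the review comment below
        }
    return config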

Suggested reviewers

  • yilin-void
  • achartier

Poem

In the warren of code, new models appear,
Multimodal and FP8, the future is here!
LoRA paths mapped with a hop and a bound,
Benchmarks expanded, new tests all around.
🐇 With configs aligned and carrots in sight,
This bunny declares: "The tests run just right!"

"""


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5f72137 and a16d8c0.

📒 Files selected for processing (3)
  • tests/integration/defs/perf/pytorch_model_config.py (1 hunks)
  • tests/integration/defs/perf/test_perf.py (3 hunks)
  • tests/integration/test_lists/qa/trt_llm_release_perf_test.yml (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/integration/defs/perf/pytorch_model_config.py
  • tests/integration/test_lists/qa/trt_llm_release_perf_test.yml
  • tests/integration/defs/perf/test_perf.py

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.


coderabbitai bot (Contributor) left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/integration/test_lists/qa/trt_llm_release_perf_test.yml (1)

173-174: Extreme FP8 workloads – risk of exceeding memory budget

bielik_11b_v2.2_instruct_fp8 is scheduled at up to 2,000 × 2,000 tokens with con:250.
Although FP8 halves the KV-cache size, an 11B-parameter model at that sequence length and concurrency still exceeds 80 GB on H100 in practice. Consider:

-...input_output_len:2000,2000-con:250
+...input_output_len:2000,2000-reqs:8-con:1   # safer default

or gate the test behind gpu_memory.gt:160000.
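
A rough back-of-envelope supporting this concern; the layer count, KV-head count, and head dimension below are assumed values for an ~11B-class model, not the actual Bielik architecture:

# KV-cache estimate: 2 (K and V) * layers * tokens * kv_heads * head_dim * bytes * concurrency
layers, kv_heads, head_dim = 40, 8, 128   # assumed ~11B-class architecture (not verified)
tokens_per_seq = 2000 + 2000              # input + output length from the test entry
concurrency = 250                         # con:250 from the test entry
bytes_fp8 = 1

kv_bytes = 2 * layers * tokens_per_seq * kv_heads * head_dim * bytes_fp8 * concurrency
print(f"{kv_bytes / 2**30:.1f} GiB")      # ~76 GiB of KV cache alone, before ~11 GB of FP8 weights

Under these assumptions the workload lands above the 80 GB of a single H100, which is why a smaller request count or a gpu_memory gate looks prudent.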

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe070a0 and 7ff755a.

📒 Files selected for processing (3)
  • tests/integration/defs/perf/pytorch_model_config.py (1 hunks)
  • tests/integration/defs/perf/test_perf.py (2 hunks)
  • tests/integration/test_lists/qa/trt_llm_release_perf_test.yml (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tests/integration/defs/perf/pytorch_model_config.py (1)
tensorrt_llm/_torch/models/modeling_phi4mm.py (1)
  • lora_config (242-262)
🔇 Additional comments (5)
tests/integration/defs/perf/pytorch_model_config.py (1)

189-199: Validate the max_lora_rank for phi_4_multimodal_instruct

There’s a mismatch in the max_lora_rank value across configurations:

  • tests/integration/defs/perf/pytorch_model_config.py (L189-199) sets
    max_lora_rank = 64
  • tensorrt_llm/_torch/models/modeling_phi4mm.py uses
    max_lora_rank = 320 # Max rank for Phi4MM.
  • examples/llm-api/llm_multilora.py also uses
    max_lora_rank = 64 in its sample call.

Please confirm whether the lower rank (64) is intentional for faster performance testing, or if it should be aligned with the reference implementation’s value (320).
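
If the reduced rank is intentional, one hedged way to make that intent explicit in the perf config is sketched below; the constant name and dictionary shape are illustrative, not existing code in either file:

PHI4MM_REFERENCE_MAX_LORA_RANK = 320  # value used by tensorrt_llm/_torch/models/modeling_phi4mm.py

lora_config = {
    # Deliberately below the reference rank to keep perf runs fast (assumed intent).
    "max_lora_rank": 64,
}
assert lora_config["max_lora_rank"] <= PHI4MM_REFERENCE_MAX_LORA_RANK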

tests/integration/defs/perf/test_perf.py (2)

117-121: LGTM! Model path additions look correct.

The new model entries follow the established naming conventions and directory structure patterns. The multimodal variants appropriately share the same base path, and the FP8 quantized variant is clearly differentiated.


158-161: LGTM! LoRA path additions are consistent with the multimodal model structure.

The LoRA paths correctly point to the vision-lora and speech-lora directories for the respective variants, matching the expected structure for multimodal models as shown in the reference implementation.
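
For illustration only, the shape of the mapping being described might look roughly like this; the directory prefix is a placeholder, not the repository's actual path:

LORA_MODEL_PATH = {
    # Hypothetical entries; the real values live in tests/integration/defs/perf/test_perf.py.
    "phi_4_multimodal_instruct_image": "<phi-4-multimodal-base-dir>/vision-lora",
    "phi_4_multimodal_instruct_audio": "<phi-4-multimodal-base-dir>/speech-lora",
}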

tests/integration/test_lists/qa/trt_llm_release_perf_test.yml (2)

102-105: LoRA support for image/audio variants is present
Entries for phi_4_multimodal_instruct_image and phi_4_multimodal_instruct_audio are defined in both dictionaries in tests/integration/defs/perf/test_perf.py:

  • MODEL_PATH_DICT: lines 117–118
  • LORA_MODEL_PATH: lines 157–159

No further action required.


81-84: No action needed: dotted model keys are safe for path look-ups

  • The MODEL_PATH_DICT mapping correctly defines the key "bielik_11b_v2.2_instruct" and its FP8 sibling.
  • All filesystem paths are constructed with os.path.join(llm_models_root(), MODEL_PATH_DICT[...]), so the dot in the key never becomes a separator.
  • The mapping’s value (“Bielik-11B-v2.2-Instruct”) is used as the directory name; dots are valid characters in file and directory names on all target platforms.
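
A quick hedged demonstration of why the dot is harmless; llm_models_root() is stubbed with a placeholder directory, and only the key/value pair quoted above is taken from the PR:

import os

MODEL_PATH_DICT = {"bielik_11b_v2.2_instruct": "Bielik-11B-v2.2-Instruct"}  # pair cited above

def llm_models_root() -> str:
    return "/models"  # placeholder stand-in for the real helper

path = os.path.join(llm_models_root(), MODEL_PATH_DICT["bielik_11b_v2.2_instruct"])
print(path)  # /models/Bielik-11B-v2.2-Instruct -- the dot is never treated as a path separator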

ruodil force-pushed the user/ruodil/add_cases branch from 7ff755a to fe13310 on July 17, 2025 02:02
ruodil force-pushed the user/ruodil/add_cases branch 2 times, most recently from 5f72137 to a16d8c0 on July 18, 2025 05:34
LarryXFly merged commit 6a3c9f8 into NVIDIA:main on Jul 21, 2025 (2 checks passed)
reasonsolo pushed a commit to reasonsolo/TensorRT-LLM that referenced this pull request Jul 21, 2025
timlee0212 pushed a commit to timlee0212/TensorRT-LLM that referenced this pull request Jul 21, 2025
NVShreyas pushed a commit to NVShreyas/TensorRT-LLM that referenced this pull request Jul 28, 2025
Ransiki pushed a commit to Ransiki/TensorRT-LLM that referenced this pull request Jul 29, 2025