test: add phi-4 multimodel and bielik-11b-v2.2 models for perf test #5826
Conversation
""" WalkthroughThe changes introduce new model entries and LoRA configurations for multimodal and FP8-quantized models in the performance test suite. Conditional logic is added for configuring LoRA modules for specific models, and new benchmark tests are appended to the test list YAML to cover these models and configurations. Changes
Sequence Diagram(s)

sequenceDiagram
    participant TestRunner
    participant ModelConfig
    participant LoRAConfig
    TestRunner->>ModelConfig: Request config for model (e.g., phi_4_multimodal_instruct)
    ModelConfig->>LoRAConfig: Check if model label matches "phi_4_multimodal_instruct"
    alt If match
        LoRAConfig->>ModelConfig: Add lora_target_modules and module mappings
    end
    ModelConfig-->>TestRunner: Return merged model configuration
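The conditional merge in the diagram can be sketched as follows. This is a minimal illustration, not the actual test-suite code: the function name, the config-dict shape, and the module list are hypothetical stand-ins.

```python
# Hypothetical sketch of the conditional LoRA configuration flow shown above.
def get_model_config(model_label: str) -> dict:
    """Return a perf-test config, merging LoRA settings for matching models."""
    config = {"model_label": model_label, "backend": "pytorch"}
    # Only the Phi-4 multimodal variants get LoRA modules wired in.
    if "phi_4_multimodal_instruct" in model_label:
        config["lora_config"] = {
            "lora_target_modules": ["attn_qkv"],  # illustrative module list
            "max_lora_rank": 64,                  # rank used by the perf config
        }
    return config
```

Under this sketch, `get_model_config("phi_4_multimodal_instruct_image")` returns a dict containing a `lora_config`, while non-matching labels get the base config untouched.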
""" 📜 Recent review detailsConfiguration used: .coderabbit.yaml 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/integration/test_lists/qa/trt_llm_release_perf_test.yml (1)

173-174: Extreme FP8 workloads – risk of exceeding memory budget

bielik_11b_v2.2_instruct_fp8 is scheduled at up to 2000 × 2000 tokens with con:250. Although FP8 halves the KV-cache size, an 11B-parameter model at that sequence length still exceeds 80 GB on an H100 in practice. Consider:

-...input_output_len:2000,2000-con:250
+...input_output_len:2000,2000-reqs:8-con:1  # safer default

or gate the test behind gpu_memory.gt:160000.
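A back-of-envelope KV-cache estimate supports the concern. The architecture numbers below (layer count, KV heads, head dimension) are assumptions for an 11B-class model, not values read from the actual Bielik config:

```python
# Rough KV-cache sizing for the flagged scenario (assumed model shape).
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, concurrency: int, bytes_per_elem: int) -> int:
    # Factor of 2 covers the separate K and V tensors per layer.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * concurrency * bytes_per_elem)

total = kv_cache_bytes(
    num_layers=50, num_kv_heads=8, head_dim=128,  # assumed 11B-class shape
    seq_len=4000,                                 # 2000 input + 2000 output
    concurrency=250,                              # con:250 from the test entry
    bytes_per_elem=1,                             # FP8 KV cache
)
print(f"{total / 2**30:.0f} GiB")  # ~95 GiB under these assumptions
```

Even with FP8, the cache alone overshoots a single 80 GB H100 under these assumptions, before counting weights and activations.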
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (3)

tests/integration/defs/perf/pytorch_model_config.py (1 hunks)
tests/integration/defs/perf/test_perf.py (2 hunks)
tests/integration/test_lists/qa/trt_llm_release_perf_test.yml (3 hunks)
🧰 Additional context used

🧬 Code Graph Analysis (1)

tests/integration/defs/perf/pytorch_model_config.py (1)
  tensorrt_llm/_torch/models/modeling_phi4mm.py (1): lora_config (242-262)
🔇 Additional comments (5)

tests/integration/defs/perf/pytorch_model_config.py (1)

189-199: Validate the max_lora_rank for phi_4_multimodal_instruct

There's a mismatch in the max_lora_rank value across configurations:
- tests/integration/defs/perf/pytorch_model_config.py (L189-199) sets max_lora_rank = 64
- tensorrt_llm/_torch/models/modeling_phi4mm.py uses max_lora_rank = 320  # Max rank for Phi4MM.
- examples/llm-api/llm_multilora.py also uses max_lora_rank = 64 in its sample call.

Please confirm whether the lower rank (64) is intentional for faster performance testing, or if it should be aligned with the reference implementation's value (320).
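Why the mismatch matters: a runtime configured with a smaller max_lora_rank cannot host an adapter whose rank exceeds it. A toy check, using a hypothetical helper rather than TensorRT-LLM code:

```python
# Illustrative rank check (hypothetical helper, not TensorRT-LLM code).
def can_host_adapter(runtime_max_rank: int, adapter_rank: int) -> bool:
    """A LoRA adapter fits only if its rank is within the runtime's budget."""
    return adapter_rank <= runtime_max_rank

# With the reference value of 320, a rank-320 adapter fits;
# with the perf config's 64, it would not.
print(can_host_adapter(320, 320), can_host_adapter(64, 320))  # True False
```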
tests/integration/defs/perf/test_perf.py (2)

117-121: LGTM! Model path additions look correct.

The new model entries follow the established naming conventions and directory-structure patterns. The multimodal variants appropriately share the same base path, and the FP8-quantized variant is clearly differentiated.

158-161: LGTM! LoRA path additions are consistent with the multimodal model structure.

The LoRA paths correctly point to the vision-lora and speech-lora directories for the respective variants, matching the expected structure for multimodal models as shown in the reference implementation.
tests/integration/test_lists/qa/trt_llm_release_perf_test.yml (2)

102-105: LoRA support for image/audio variants is present

Entries for phi_4_multimodal_instruct_image and phi_4_multimodal_instruct_audio are defined in both dictionaries in tests/integration/defs/perf/test_perf.py:
- MODEL_PATH_DICT: lines 117–118
- LORA_MODEL_PATH: lines 157–159

No further action required.
81-84: No action needed: dotted model keys are safe for path look-ups

- The MODEL_PATH_DICT mapping correctly defines the key "bielik_11b_v2.2_instruct" and its FP8 sibling.
- All filesystem paths are constructed with os.path.join(llm_models_root(), MODEL_PATH_DICT[...]), so the dot in the key never becomes a separator.
- The mapping's value ("Bielik-11B-v2.2-Instruct") is used as the directory name; dots are valid characters in file and directory names on all target platforms.
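The dotted-key behavior is easy to confirm in isolation. The dictionary entries and root path below are illustrative stand-ins for the ones named in the review:

```python
import os

# Illustrative mirror of the MODEL_PATH_DICT look-up pattern described above.
MODEL_PATH_DICT = {
    "bielik_11b_v2.2_instruct": "Bielik-11B-v2.2-Instruct",
    "bielik_11b_v2.2_instruct_fp8": "Bielik-11B-v2.2-Instruct-FP8",  # assumed dir name
}

root = "/models"  # stand-in for llm_models_root()
path = os.path.join(root, MODEL_PATH_DICT["bielik_11b_v2.2_instruct"])
print(path)  # the dot stays a literal character in the final directory name
```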
Signed-off-by: ruodil <[email protected]>
…VIDIA#5826) Signed-off-by: ruodil <[email protected]> Co-authored-by: Larry <[email protected]>
…VIDIA#5826) Signed-off-by: ruodil <[email protected]> Co-authored-by: Larry <[email protected]> Signed-off-by: Shreyas Misra <[email protected]>
…VIDIA#5826) Signed-off-by: ruodil <[email protected]> Co-authored-by: Larry <[email protected]> Signed-off-by: Ransiki Zhang <[email protected]>
PR title

Please write the PR title following this template:

[JIRA ticket link/nvbug link/github issue link][fix/feat/doc/infra/...] <summary of this PR>

For example, for a PR that adds a new cache-manager feature tracked by Jira ticket TRTLLM-1000, the title would be:

[TRTLLM-1000][feat] Support a new feature about cache manager

Description

Please explain the issue and the solution briefly.
Test Coverage
GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message. See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL): Disable fail fast on build/tests/infra failures.
--skip-test (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: does NOT update GitHub check status.
--stage-list "A10-1, xxx" (OPTIONAL): Only run the specified test stages. Examples: "A10-1, xxx". Note: does NOT update GitHub check status.
--gpu-type "A30, H100_PCIe" (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: does NOT update GitHub check status.
--only-multi-gpu-test (OPTIONAL): Only run the multi-GPU tests. Note: does NOT update GitHub check status.
--disable-multi-gpu-test (OPTIONAL): Disable the multi-GPU tests. Note: does NOT update GitHub check status.
--add-multi-gpu-test (OPTIONAL): Force run the multi-GPU tests. Will also run the L0 pre-merge pipeline.
--post-merge (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL): Run the ordinary L0 pre-merge pipeline and the specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.