doc: update multimodal models on support-matrix.md #6431
Conversation
Signed-off-by: yechank <[email protected]>
📝 Walkthrough

The support matrix documentation for PyTorch backend models was updated to add several new models, revise the modality support for existing models, and expand the modality legend to explicitly define abbreviations for language, image, video, and audio.
Actionable comments posted: 0
🧹 Nitpick comments (3)
docs/source/reference/support-matrix.md (3)

Lines 14-16: Verify the new Gemma 3 / HCXVision rows aren't duplicating existing entries

We now list three Gemma-related rows (`Gemma/Gemma2` in the TRT section, and the two new `Gemma3*` rows here). Please make sure:

- There is no pre-existing `Gemma3ForCausalLM` / `Gemma3ForConditionalGeneration` elsewhere in the file.
- The ordering of architectures in this table (alphabetical by class name) still holds after insertion.

A quick grep across the repo (or the rendered docs) will catch accidental duplication; see the sketch below.
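A minimal sketch of such a check, assuming the matrix rows are markdown table rows whose first cell is a backtick-quoted architecture class name (the path and cell format are assumptions from the review context, not verified against the repo):

```python
import re
from collections import Counter

MATRIX = "docs/source/reference/support-matrix.md"  # assumed path

with open(MATRIX, encoding="utf-8") as f:
    text = f.read()

# First cell of each table row, e.g. "| `Gemma3ForCausalLM` | ...".
names = re.findall(r"^\|\s*`([A-Za-z0-9_]+)`", text, flags=re.MULTILINE)

# Any class name appearing more than once is a likely duplicate entry.
for name, count in Counter(names).items():
    if count > 1:
        print(f"duplicate architecture entry: {name} (x{count})")
```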
Lines 27-32: Consistency pass on new multimodal rows

`Phi4MM`, `Qwen2VL`, and `Qwen2_5_VL` use three-letter modality strings. Consider normalising the whitespace and ordering so every row is `L + I + V (+ A)` in the same left-to-right order (`L`, `I`, `V`, `A`) to avoid scanning errors; a normalisation sketch follows.
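As a sketch, that normalisation could be mechanised like this (a hypothetical helper, assuming modality cells are `+`-separated letter codes):

```python
# Canonical left-to-right order for the modality abbreviations.
CANONICAL = ["L", "I", "V", "A"]

def normalize_modalities(cell: str) -> str:
    """Rewrite a modality cell such as 'V+L + I' as 'L + I + V'."""
    present = {part.strip() for part in cell.split("+")}
    ordered = [m for m in CANONICAL if m in present]
    return " + ".join(ordered)

assert normalize_modalities("V+L + I") == "L + I + V"
assert normalize_modalities("L + A+I") == "L + I + A"
```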
Lines 37-40: Render the legend with a Sphinx admonition for uniform styling

The plain "Note:" header followed by a bullet list renders differently from the `{note}` admonitions used elsewhere (e.g. lines 175-177). Switching keeps the docs consistent and guarantees proper theming.

````diff
-Note:
-- L: Language
-- I: Image
-- V: Video
-- A: Audio
+```{note}
+Modality legend
+* **L** – Language
+* **I** – Image
+* **V** – Video
+* **A** – Audio
+```
````
🧰 Additional context used

🧠 Learnings (1)

docs/source/reference/support-matrix.md (2)

Learnt from: moraxu (PR #6303, tests/integration/test_lists/qa/examples_test_list.txt:494-494, 2025-07-28T17:06:08.621Z)
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Learnt from: yechank-nvidia (PR #6254, tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204, 2025-07-22T09:22:14.726Z)
Learning: In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()` is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call `strip_for_generation()` to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.
🔇 Additional comments (1)
docs/source/reference/support-matrix.md (1)
Lines 18-22: Cross-check modality changes for LLaVA-NeXT, Llama 4 & Mistral 3

The modality column was changed to:

- LLaVA-NeXT → `L + I`
- Llama 4 → `L + I + V`
- Mistral 3 → `L + I`

Before publishing, confirm that these labels match the current upstream model cards / configs; the support matrix is frequently consulted and discrepancies cause confusion. One way to surface the upstream configs for review is sketched below.
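One lightweight way to run that cross-check, assuming the cited checkpoints are public on the Hugging Face Hub (the repo list and field-name heuristics below are illustrative, and gated repos would need a token):

```python
# Sketch: fetch each checkpoint's config.json and surface fields hinting at
# image/video/audio support, for manual comparison against the matrix.
# Field names differ per architecture, so this assists review rather than
# deciding modality support automatically.
import json
from huggingface_hub import hf_hub_download

REPOS = ["llava-hf/llava-v1.6-mistral-7b-hf"]  # repo named in this review

for repo in REPOS:
    path = hf_hub_download(repo_id=repo, filename="config.json")
    with open(path, encoding="utf-8") as f:
        cfg = json.load(f)
    hints = {k: v for k, v in cfg.items()
             if any(w in k.lower() for w in ("vision", "video", "audio", "image"))}
    print(repo, "->", hints or cfg.get("architectures"))
```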
This PR distinguishes between the image and video support of multimodal models, which had previously been mixed together. However, multimodal capabilities vary across models; some support only images but not videos, so clarifying this in the support matrix will help readers. Thank you, @yechank-nvidia, for your continuous support in advancing multimodal support for TensorRT-LLM.
Signed-off-by: yechank <[email protected]>
Actionable comments posted: 0
🧹 Nitpick comments (1)
docs/source/reference/support-matrix.md (1)
Lines 14-17: Keep the "Architecture" column alphabetically sorted for easier scanning

The new Gemma 3 / EXAONE 4 / HCXVision rows look good, but their insertion breaks the previously alphabetical ordering of the table (e.g., Deepseek → DeciLM → Exaone → Gemma). Consider re-ordering the rows to preserve lexicographic order so users can locate architectures quickly; an ordering check is sketched below.
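In the same spirit as the duplication sketch above, and under the same path and table-format assumptions, an ordering check could look like this; note the matrix has separate TRT and PyTorch tables, so in practice it would need to run per table section:

```python
# Sketch: report adjacent table rows whose architecture class names are not
# in case-insensitive lexicographic order.
import re

with open("docs/source/reference/support-matrix.md", encoding="utf-8") as f:
    names = re.findall(r"^\|\s*`([A-Za-z0-9_]+)`", f.read(), flags=re.MULTILINE)

for prev, cur in zip(names, names[1:]):
    if prev.lower() > cur.lower():
        print(f"out of order: {prev} appears before {cur}")
```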
🔇 Additional comments (4)
docs/source/reference/support-matrix.md (4)
Lines 19-23: Double-check the updated modality flags & HF example links

LLaVA-NeXT, Llama 4, and Mistral 3 rows were updated/added with new modality combinations. Please verify:

- That the cited HF checkpoints indeed expose image (and video) inputs as declared.
- That the example repository names are still live and not superseded (e.g., "llava-hf/llava-v1.6-mistral-7b-hf").

A quick check avoids stale links or incorrect capability claims; one way to script it is sketched below.
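For the link half of the check, a simple liveness probe could look like this (assuming a 2xx/3xx response from huggingface.co means the repo is live; gated repos may return 401 and still exist):

```python
# Sketch: HEAD-request each cited Hugging Face repo URL.
import requests

EXAMPLE_REPOS = [
    "llava-hf/llava-v1.6-mistral-7b-hf",  # taken from the review comment
]

for repo in EXAMPLE_REPOS:
    resp = requests.head(f"https://huggingface.co/{repo}",
                         allow_redirects=True, timeout=10)
    status = "live" if resp.ok else f"check manually (HTTP {resp.status_code})"
    print(f"{repo}: {status}")
```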
Line 28: 👍 New Phi-4-multimodal entry is clear and consistent

The addition correctly reflects language + image + audio support and follows the existing format.
Lines 32-33: Confirm Qwen-VL modality expansion to include images

Both Qwen2-VL rows now show "L + I + V". Ensure that image support is truly available in the current backend implementation; otherwise, mark the rows as "L + V" to avoid misleading users.
Lines 38-41: Legend update looks good

Adding I, V, A clarifies the new modality abbreviations and keeps the table self-contained.
LGTM.
Signed-off-by: yechank <[email protected]>
/bot run --stage-list "A10-Build_Docs"

PR_Github #13506 [ run ] triggered by Bot

PR_Github #13506 [ run ] completed with state
Signed-off-by: yechank <[email protected]>
Signed-off-by: Lanyu Liao <[email protected]>
Signed-off-by: yechank <[email protected]>
Add multimodal models on support-matrix doc.