
Conversation

@yechank-nvidia (Collaborator) commented Jul 29, 2025

Add multimodal models to the support-matrix doc.

Summary by CodeRabbit

  • Documentation
    • Updated the support matrix to include new AI models and revised modality support for existing ones.
    • Expanded the modality legend to clearly define language, image, video, and audio capabilities.

@yechank-nvidia self-assigned this Jul 29, 2025
@coderabbitai bot (Contributor) commented Jul 29, 2025

📝 Walkthrough

"""

Walkthrough

The support matrix documentation for PyTorch backend models was updated to add several new models, revise the modality support for existing models, and expand the modality legend to explicitly define abbreviations for language, image, video, and audio.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| Support Matrix Documentation (`docs/source/reference/support-matrix.md`) | Added new models, updated modality support for existing models, and expanded the modality legend in the support matrix. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Possibly related PRs

  • [Doc][Qwen3] update qwen3 into support-matrix #6161: Both PRs update the same documentation file, support-matrix.md, to add new models and revise modality support, focusing on expanding the list of supported models and their modalities without changing any code.

Suggested labels

Documentation

Suggested reviewers

  • litaotju
  • amukkara
    """



📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7c93b74 and afb7657.

📒 Files selected for processing (1)
  • docs/source/reference/support-matrix.md (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/source/reference/support-matrix.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check


@coderabbitai bot requested a review from litaotju July 29, 2025 04:56
@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
docs/source/reference/support-matrix.md (3)

14-16: Verify new Gemma 3 / HCXVision rows aren’t duplicating existing entries

We now list three Gemma-related rows (Gemma/Gemma2 in the TRT section, and the two new Gemma3* rows here). Please make sure:

  1. There is no pre-existing Gemma3ForCausalLM/Gemma3ForConditionalGeneration elsewhere in the file.
  2. The ordering of architectures in this table (alphabetical by class name) still holds after insertion.

A quick grep across the repo (or the rendered docs) will catch accidental duplication.
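
For instance, a minimal Python sketch of such a duplicate check (assumptions: the path is the one named in this review, and the table keeps one architecture class name in the first column of each row):

```python
# Flag architecture names that appear in more than one table row.
# A sketch only: assumes Markdown rows like "| Gemma3ForCausalLM | ... |".
from collections import Counter
from pathlib import Path
import re

text = Path("docs/source/reference/support-matrix.md").read_text()
names = re.findall(r"^\|\s*`?(\w+)`?\s*\|", text, flags=re.MULTILINE)
dupes = [name for name, count in Counter(names).items() if count > 1]
print("duplicated architectures:", dupes or "none")
```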


27-32: Consistency pass on new multimodal rows

Phi4MM, Qwen2VL, and Qwen2_5_VL use three-letter modality strings.
Consider normalising the whitespace and ordering so every row is L + I + V (+ A) in the same left-to-right order (L, I, V, A) to avoid scanning errors.
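
A small sketch of what that normalisation could look like (the canonical L, I, V, A order is the one suggested above; the helper name is hypothetical):

```python
# Rewrite a modality cell into the canonical "L + I + V + A" order.
CANONICAL = ("L", "I", "V", "A")

def normalize_modality(cell: str) -> str:
    flags = {part.strip() for part in cell.split("+")}
    unknown = flags - set(CANONICAL)
    if unknown:
        raise ValueError(f"unexpected modality flag(s): {sorted(unknown)}")
    return " + ".join(flag for flag in CANONICAL if flag in flags)

assert normalize_modality("L + V + I") == "L + I + V"  # reordered
assert normalize_modality("L+I+A") == "L + I + A"      # whitespace fixed
```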


37-40: Render the legend with a Sphinx admonition for uniform styling

The plain “Note:” header followed by a bullet list renders differently from the {note} admonitions used elsewhere (e.g. lines 175-177). Switching keeps the docs consistent and guarantees proper theming.

````diff
-Note:
-- L: Language
-- I: Image
-- V: Video
-- A: Audio
+```{note}
+Modality legend
+* **L** – Language
+* **I** – Image
+* **V** – Video
+* **A** – Audio
+```
````
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e58afa5 and 1100595.

📒 Files selected for processing (1)
  • docs/source/reference/support-matrix.md (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
docs/source/reference/support-matrix.md (2)

Learnt from: moraxu
PR: #6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Learnt from: yechank-nvidia
PR: #6254
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204
Timestamp: 2025-07-22T09:22:14.726Z
Learning: In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using from_shared_tensor() is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call strip_for_generation() to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.
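
As an illustration only, the control flow this learning describes might look roughly like the sketch below; the request and handle attributes are invented for the example, and only the two method names come from the learning itself:

```python
# Schematic of the phase-dependent multimodal tensor handling (not the
# actual TensorRT-LLM implementation; types and signatures are assumed).
def prepare_multimodal(request):
    if request.is_context_phase:
        # Context phase: recover the tensor from the shared buffer once.
        request.mm_data = request.mm_handle.from_shared_tensor()
    else:
        # Generation phase: the tensor is already recovered, so only strip
        # the multimodal payload that generation no longer needs.
        request.mm_data = request.mm_data.strip_for_generation()
    return request
```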

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (1)
docs/source/reference/support-matrix.md (1)

18-22: Cross-check modality changes for LLaVA-NeXT, Llama 4 & Mistral 3

The modality column was changed to:

  • LLaVA-NeXT → L + I
  • Llama 4 → L + I + V
  • Mistral 3 → L + I

Before publishing, confirm that these labels match the current upstream model cards / configs; the support matrix is frequently consulted and discrepancies cause confusion.
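
One quick, partial way to do that check programmatically (a sketch: config inspection only indicates an image/vision tower, while video or audio support usually has to be confirmed from the model card; the checkpoint id is the LLaVA-NeXT example cited later in this thread):

```python
# Probe whether a checkpoint's config declares a vision tower,
# a rough proxy for image ("I") support.
from transformers import AutoConfig

for repo_id in ["llava-hf/llava-v1.6-mistral-7b-hf"]:
    cfg = AutoConfig.from_pretrained(repo_id)
    print(repo_id, "-> vision_config present:", hasattr(cfg, "vision_config"))
```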

@jaedeok-nvidia self-requested a review July 29, 2025 04:58
@jaedeok-nvidia (Collaborator) left a comment


This PR distinguishes between the image and video support of multimodal models, which had previously been mixed together. However, multimodal capabilities vary across models; some support only images but not videos. Clarifying this in the support matrix will therefore help readers. Thank you, @yechank-nvidia, for your continuous support in advancing multimodal support for TensorRT-LLM.

Signed-off-by: yechank <[email protected]>
@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
docs/source/reference/support-matrix.md (1)

14-17: Keep the “Architecture” column alphabetically sorted for easier scanning
The new Gemma 3 / EXAONE 4 / HCXVision rows look good, but their insertion breaks the previously alphabetical ordering of the table (e.g., Deepseek → DeciLM → Exaone → Gemma). Consider re-ordering the rows to preserve lexicographic order so users can locate architectures quickly.
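
A quick sortedness check along those lines (a sketch, under the same assumption of one class name per row in the first column):

```python
# Report adjacent Architecture entries that break alphabetical order.
from pathlib import Path
import re

text = Path("docs/source/reference/support-matrix.md").read_text()
names = re.findall(r"^\|\s*`?(\w+)`?\s*\|", text, flags=re.MULTILINE)
for prev, cur in zip(names, names[1:]):
    if prev.lower() > cur.lower():
        print(f"out of order: {prev!r} is listed before {cur!r}")
```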

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1100595 and 7c93b74.

📒 Files selected for processing (1)
  • docs/source/reference/support-matrix.md (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
docs/source/reference/support-matrix.md (2): the same two learnings (moraxu, PR #6303; yechank-nvidia, PR #6254) quoted in the review above.

🔇 Additional comments (4)
docs/source/reference/support-matrix.md (4)

19-23: Double-check the updated modality flags & HF example links
LLaVA-NeXT, Llama 4, and Mistral 3 rows were updated/added with new modality combinations. Please verify:
• That the cited HF checkpoints indeed expose image (and video) inputs as declared.
• That the example repository names are still live and not superseded (e.g., “llava-hf/llava-v1.6-mistral-7b-hf”).
A quick check avoids stale links or incorrect capability claims.
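
The liveness half of that check is easy to script (a sketch using huggingface_hub's HfApi.repo_exists; the repo id is the one cited in this comment, and further ids can be appended):

```python
# Verify that cited Hugging Face checkpoints still resolve.
from huggingface_hub import HfApi

api = HfApi()
for repo_id in ["llava-hf/llava-v1.6-mistral-7b-hf"]:
    status = "live" if api.repo_exists(repo_id) else "missing or renamed"
    print(f"{repo_id}: {status}")
```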


28-28: 👍 New Phi-4-multimodal entry is clear and consistent
The addition correctly reflects language + image + audio support and follows the existing format.


32-33: Confirm Qwen-VL modality expansion to include images
Both Qwen2-VL rows now show “L + I + V”. Ensure that image support is truly available in the current backend implementation; otherwise, mark as “L + V” to avoid misleading users.


38-41: Legend update looks good
Adding I, V, A clarifies the new modality abbreviations and keeps the table self-contained.

@brb-nv (Collaborator) left a comment


LGTM.

Signed-off-by: yechank <[email protected]>
@coderabbitai bot requested reviews from amukkara and litaotju July 30, 2025 01:52
@yechank-nvidia (Collaborator, Author) commented:

/bot run --stage-list "A10-Build_Docs"

@tensorrt-cicd (Collaborator) commented:

PR_Github #13506 [ run ] triggered by Bot

@tensorrt-cicd (Collaborator) commented:
PR_Github #13506 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10118 (Partly Tested) completed with status: 'SUCCESS'

@litaotju merged commit 83621e4 into NVIDIA:main Jul 31, 2025
2 of 3 checks passed
lancelly pushed a commit to lancelly/TensorRT-LLM that referenced this pull request Aug 6, 2025
jain-ria pushed a commit to jain-ria/TensorRT-LLM that referenced this pull request Aug 7, 2025