[AutoDeploy] Merge feat/ad_2025_06_13 feature branch #5454

lucaslie · 2025-06-24T23:01:10Z

Mass integration of the https://github.com/nv-auto-deploy/TensorRT-LLM/tree/feat/ad_2025_06_13 feature branch.

See the individual PRs for details.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

run [--disable-fail-fast --skip-test --stage-list "A10-1, xxx" --gpu-type "A30, H100_PCIe" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-[Post-Merge]-1, xxx"]

Launch build/test pipelines. All previously running jobs will be killed.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force-run the multi-GPU tests. This also runs the L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-[Post-Merge]-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-[Post-Merge]-1, xxx".

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md.
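For illustration only, the `run` options above map naturally onto a standard `argparse` parser. This is a hypothetical sketch, not the bot's actual implementation; the helper names `split_list` and `build_run_parser`, and the choice to store comma-separated values as lists of stage or GPU names, are assumptions.

```python
import argparse


def split_list(value: str) -> list[str]:
    """Split a comma-separated option such as "A10-1, H100_PCIe-1" into names."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_run_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented `/bot run` flags.
    parser = argparse.ArgumentParser(prog="/bot run")
    parser.add_argument("--disable-fail-fast", action="store_true")
    parser.add_argument("--skip-test", action="store_true")
    parser.add_argument("--stage-list", type=split_list, default=[])
    parser.add_argument("--gpu-type", type=split_list, default=[])
    parser.add_argument("--add-multi-gpu-test", action="store_true")
    parser.add_argument("--only-multi-gpu-test", action="store_true")
    parser.add_argument("--disable-multi-gpu-test", action="store_true")
    parser.add_argument("--post-merge", action="store_true")
    parser.add_argument("--extra-stage", type=split_list, default=[])
    return parser


args = build_run_parser().parse_args(
    ["--stage-list", "A10-1, H100_PCIe-1", "--disable-fail-fast"]
)
print(args.stage_list)  # ['A10-1', 'H100_PCIe-1']
```

Parsing the comma-separated values into lists up front keeps whitespace such as `"A10-1, xxx"` from leaking into stage names.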

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

* Fixed TP strong scaling if TP > num_kv_heads
  Signed-off-by: greg-kwasniewski1 <[email protected]>
* Fixed test_graph_sharding. Attention uses simple_shard
  Signed-off-by: greg-kwasniewski1 <[email protected]>
* head_dim inferred from the graph directly. Fixed test_graph_sharding.py
  Signed-off-by: greg-kwasniewski1 <[email protected]>
* changed GQA block to fit column_row_shard heuristic requirements
  Signed-off-by: greg-kwasniewski1 <[email protected]>
* Fixed parameter count
  Signed-off-by: greg-kwasniewski1 <[email protected]>
---------
Signed-off-by: greg-kwasniewski1 <[email protected]>

* auto_deploy: rename torch and triton custom ops
  Signed-off-by: Neta Zmora <[email protected]>
* auto_deploy: rename flashinfer custom ops
  Signed-off-by: Neta Zmora <[email protected]>
* auto_deploy: rename moe custom ops
  Signed-off-by: Neta Zmora <[email protected]>
* auto_deploy: rename distribution custom ops
  Signed-off-by: Neta Zmora <[email protected]>
* auto_deploy: rename distribution custom ops
  Signed-off-by: Neta Zmora <[email protected]>
* auto_deploy: rename linear custom ops
  Signed-off-by: Neta Zmora <[email protected]>
* auto_deploy: rename rope custom ops
  Signed-off-by: Neta Zmora <[email protected]>
* auto_deploy: rename mla custom ops
  Signed-off-by: Neta Zmora <[email protected]>
* auto_deploy: bug fixes
  Signed-off-by: Neta Zmora <[email protected]>
* auto_deploy: rename mla custom ops (2)
  Signed-off-by: Neta Zmora <[email protected]>
* auto_deploy: rename triton rope custom ops
  Signed-off-by: Neta Zmora <[email protected]>
* auto_deploy: rename torch rope custom ops
  Signed-off-by: Neta Zmora <[email protected]>
* auto_deploy: rename triton attention custom ops
  Signed-off-by: Neta Zmora <[email protected]>
* auto_deploy: rename torch quantization custom ops
  Signed-off-by: Neta Zmora <[email protected]>
* auto_deploy: fix failing multi-gpu tests
  Signed-off-by: Neta Zmora <[email protected]>
* pre-commit formatting fixes
  Signed-off-by: Neta Zmora <[email protected]>
* pre-commit formatting fixes
  Signed-off-by: Neta Zmora <[email protected]>
* add a README for the custom ops
  Signed-off-by: Neta Zmora <[email protected]>
* rename custom_ops/rope.py to custom_ops/triton_rope.py
  Signed-off-by: Neta Zmora <[email protected]>
* separate moe custom backends to different files
  Signed-off-by: Neta Zmora <[email protected]>
* pre-commit formatting fixes
  Signed-off-by: Neta Zmora <[email protected]>
---------
Signed-off-by: Neta Zmora <[email protected]>
Signed-off-by: Neta Zmora <[email protected]>

In #61 I accidentally removed fused_moe.py. This PR fixes that and completes my intended change: splitting fused_moe.py into two files.

* add fused_moe.py which I accidentally removed
  Signed-off-by: Neta Zmora <[email protected]>
* separate torch MOE custom ops to a separate file
  Signed-off-by: Neta Zmora <[email protected]>
* separate trtllm MOE custom ops to a separate file
  Signed-off-by: Neta Zmora <[email protected]>
* remove unused imports
  Signed-off-by: Neta Zmora <[email protected]>
---------
Signed-off-by: Neta Zmora <[email protected]>

…64), fixes NVIDIA#4841, fixes NVIDIA#5254 Signed-off-by: Frida Hou <[email protected]>

Signed-off-by: Lucas Liebenwein <[email protected]>

lucaslie · 2025-06-24T23:03:28Z

/bot run

Copilot

Pull Request Overview

This PR consolidates custom operator usage under the torch.ops.auto_deploy namespace across the codebase and extends the multi-GPU sharding tests with a new GQA‐aware block.

Refactored all Torch operator calls (rope, attention, quant, moe, dist, etc.) to use torch.ops.auto_deploy.*.
Updated unit tests and transformation logic to reference the new auto_deploy namespace.
Added GQA_Block in multigpu graph sharding tests with head‐dimension–aware sharding support.
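As a hedged illustration of what the namespace consolidation implies for call sites, a small script can flag any `torch.ops.<namespace>.<op>` reference that does not live under `auto_deploy`. The allowlist (`auto_deploy`, `aten`, `trtllm`), the function name `find_legacy_op_refs`, and the op names in the sample are assumptions for this sketch, not part of the PR.

```python
import re

# Matches `torch.ops.<namespace>.<op>` and captures the namespace and op name.
OP_REF = re.compile(r"torch\.ops\.(\w+)\.(\w+)")

# Hypothetical allowlist of namespaces still expected after the rename.
ALLOWED_NAMESPACES = {"auto_deploy", "aten", "trtllm"}


def find_legacy_op_refs(source: str) -> list[tuple[int, str]]:
    """Return (line_number, reference) pairs for ops outside the allowed namespaces."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in OP_REF.finditer(line):
            if match.group(1) not in ALLOWED_NAMESPACES:
                hits.append((lineno, match.group(0)))
    return hits


# Placeholder op names; only the namespace matters for the check.
example = """
y = torch.ops.auto_deploy.some_rope_op(x)
z = torch.ops.rope.some_rope_op(x)
"""
print(find_legacy_op_refs(example))  # [(3, 'torch.ops.rope.some_rope_op')]
```

A check like this is purely textual, so it also catches references in tests (such as the `torch.ops.rope.` calls replaced in test_rope_transformation.py) but cannot see ops resolved dynamically via `getattr`.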

Reviewed Changes

Copilot reviewed 50 out of 50 changed files in this pull request and generated 1 comment.


File	Description
tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_rope_transformation.py	Replaced `torch.ops.rope.` with `torch.ops.auto_deploy.` for RoPE variants.
tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_quantization.py	Updated quantization operators to `torch.ops.auto_deploy.torch_quant_fp*`.
tests/unittest/_torch/auto_deploy/unit/multigpu/transformations/library/test_graph_sharding.py	Introduced `GQA_Block` and adapted sharding checks to use `auto_deploy` ops.
tensorrt_llm/_torch/auto_deploy/transformations/library/sharding.py	Enhanced `_insert_sharded_matmul` with `min_local_shape` and switched to `auto_deploy` distributed ops.
tensorrt_llm/_torch/auto_deploy/custom_ops/*	Registered and renamed all custom ops under the `auto_deploy::` namespace.

Comments suppressed due to low confidence (4)

tests/unittest/_torch/auto_deploy/unit/multigpu/transformations/library/test_graph_sharding.py:18

[nitpick] Class names should follow PEP8 CamelCase conventions; consider renaming GQA_Block to GQABlock.

class GQA_Block(nn.Module):

tensorrt_llm/_torch/auto_deploy/shim/demollm.py:322

TorchSampler is not imported in this module; add the appropriate import (e.g., from tensorrt_llm._torch.pyexecutor.sampler import TorchSampler) to avoid undefined symbol errors.

        # set request id if necessary

tensorrt_llm/_torch/auto_deploy/transformations/library/sharding.py:74

The docstring should be updated to document the new min_local_shape parameter, explaining how it constrains tensor splitting based on head dimension.

def _insert_sharded_matmul(
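To make the `min_local_shape` constraint concrete, here is a minimal sketch assuming the sharded dimension is split into at most `dim_size // min_local_shape` chunks, so no shard is smaller than one head (`min_local_shape = head_dim`), and that when the TP world size exceeds that count (e.g. TP > num_kv_heads) consecutive ranks share, i.e. replicate, the same chunk. The helper `local_slice` is hypothetical and returns index ranges instead of slicing real tensors; it is not the code in sharding.py.

```python
import math


def local_slice(dim_size: int, rank: int, world_size: int, min_local_shape: int = 1) -> range:
    """Index range along the sharded dim owned by `rank` (hypothetical helper)."""
    # Never create shards smaller than min_local_shape (e.g. one attention head).
    max_chunks = dim_size // min_local_shape
    assert max_chunks >= 1, "dimension is smaller than min_local_shape"
    if world_size > max_chunks:
        # More ranks than chunks: groups of consecutive ranks replicate one chunk.
        num_chunks = max_chunks
        chunk = rank // math.ceil(world_size / max_chunks)
    else:
        num_chunks = world_size
        chunk = rank
    assert dim_size % num_chunks == 0, "sketch assumes an even split"
    chunk_size = dim_size // num_chunks
    return range(chunk * chunk_size, (chunk + 1) * chunk_size)


# K projection with num_kv_heads=2 and head_dim=64 -> 128 output features, TP=4.
for rank in range(4):
    print(rank, local_slice(128, rank, world_size=4, min_local_shape=64))
```

Ranks 0 and 1 both own features 0 to 63 and ranks 2 and 3 both own 64 to 127, so every rank holds one whole KV head. With the default `min_local_shape=1`, the same call would give each rank 32 features and cut each KV head in half.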

tests/unittest/_torch/auto_deploy/unit/multigpu/transformations/library/test_graph_sharding.py:83

Add a dedicated test case for GQA_Block to verify that verify_local_weight_sizes correctly validates head-dimension sharding constraints under GQA scenarios.

    batch_size = 4
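A dedicated GQA test could compare the sharded weights against precomputed per-rank sizes. The sketch below computes those sizes for column-sharded Q/K/V projections, assuming shards always contain whole heads and that KV heads are replicated (one per rank) when TP > num_kv_heads. `expected_qkv_local_features` is a hypothetical helper, not the `verify_local_weight_sizes` used in the test file.

```python
def expected_qkv_local_features(
    num_heads: int, num_kv_heads: int, head_dim: int, world_size: int
) -> dict[str, int]:
    """Per-rank output features of column-sharded Q/K/V projections (hypothetical)."""
    assert num_heads % num_kv_heads == 0, "GQA: query heads are grouped per KV head"
    assert num_heads % world_size == 0, "sketch assumes query heads split evenly"
    q = num_heads // world_size * head_dim
    if world_size <= num_kv_heads:
        assert num_kv_heads % world_size == 0, "sketch assumes KV heads split evenly"
        kv = num_kv_heads // world_size * head_dim
    else:
        # TP > num_kv_heads: each rank keeps one whole (replicated) KV head.
        kv = head_dim
    return {"q": q, "k": kv, "v": kv}


print(expected_qkv_local_features(num_heads=8, num_kv_heads=2, head_dim=64, world_size=4))
# {'q': 128, 'k': 64, 'v': 64}
```

Parametrizing a test over world sizes on both sides of num_kv_heads (e.g. 1, 2, and 4 for two KV heads) would exercise both the plain split and the replication path.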

tests/unittest/_torch/auto_deploy/unit/multigpu/transformations/library/test_graph_sharding.py

tensorrt-cicd · 2025-06-24T23:10:30Z

PR_Github #9764 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-25T01:29:59Z

PR_Github #9764 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #7198 completed with status: 'SUCCESS'

Signed-off-by: Grzegorz Kwasniewski <[email protected]>
Signed-off-by: Neta Zmora <[email protected]>
Signed-off-by: Frida Hou <[email protected]>
Signed-off-by: Lucas Liebenwein <[email protected]>
Co-authored-by: Grzegorz Kwasniewski <[email protected]>
Co-authored-by: Neta Zmora <[email protected]>
Co-authored-by: Frida Hou <[email protected]>

greg-kwasniewski1 and others added 5 commits June 24, 2025 15:27

use mix sampler for trtllm runtime, use trtllm topk impl for demollm (#…

65a6cb4

…64), fixes NVIDIA#4841, fixes NVIDIA#5254 Signed-off-by: Frida Hou <[email protected]>

update usage of trtllm moe op in AutoDeploy

795c8ff

Signed-off-by: Lucas Liebenwein <[email protected]>

lucaslie requested a review from a team as a code owner June 24, 2025 23:01

lucaslie requested a review from suyoggupta June 24, 2025 23:01

github-project-automation bot added this to AutoDeploy Board Jun 24, 2025

github-project-automation bot moved this to Backlog in AutoDeploy Board Jun 24, 2025

lucaslie requested review from Copilot and removed request for suyoggupta June 24, 2025 23:01

lucaslie linked an issue Jun 24, 2025 that may be closed by this pull request

[AutoDeploy] Investigate DemoLLM Token Generation #4841

Closed

lucaslie enabled auto-merge (squash) June 24, 2025 23:05

Copilot AI reviewed Jun 24, 2025

tests/unittest/_torch/auto_deploy/unit/multigpu/transformations/library/test_graph_sharding.py (resolved)

suyoggupta approved these changes Jun 25, 2025


github-project-automation bot moved this from Backlog to In review in AutoDeploy Board Jun 25, 2025

lucaslie merged commit 5cffb7e into NVIDIA:main Jun 25, 2025
4 checks passed

github-project-automation bot moved this from In review to Done in AutoDeploy Board Jun 25, 2025
