fix: fix cuda graph padding for spec decoding #4853

lfr-0531 · 2025-06-03T06:17:23Z

Description

Root cause:

Previously, when the overlap scheduler was enabled, we identified dummy requests with if request.py_batch_idx is None. After the CUDA Graph padding changes in [TRTLLM-5516] perf: replicate dummy request for cuda graph padding #4729, the same dummy request is reused across all model forward passes, so its request.py_batch_idx is no longer None and the check no longer recognizes it as a dummy request. This leads to an error.
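
A minimal sketch of this failure mode, not the actual TensorRT-LLM code. FakeRequest and both helper functions are hypothetical; only the attribute names py_batch_idx and is_dummy come from this PR, and the point at which the batch index gets assigned is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a scheduled request (not the real class)."""
    is_dummy: bool = False
    py_batch_idx: Optional[int] = None


def is_dummy_old(request: FakeRequest) -> bool:
    # Old heuristic: a request without a batch index is treated as a dummy.
    return request.py_batch_idx is None


def is_dummy_new(request: FakeRequest) -> bool:
    # New check: rely on the explicit flag instead of the batch index.
    return request.is_dummy


# A single dummy request object that is reused across forward passes.
dummy = FakeRequest(is_dummy=True)
first_pass = (is_dummy_old(dummy), is_dummy_new(dummy))

# Assumption: the reused object picks up a batch index during the first pass.
dummy.py_batch_idx = 0
second_pass = (is_dummy_old(dummy), is_dummy_new(dummy))
```

In this sketch both checks agree on the first pass, but only the flag-based check still identifies the reused dummy on the second pass.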

In this PR:

- Use if request.is_dummy to identify dummy requests.
- Move the dummy extend requests to the end of extend_requests, to align with how dummy requests are handled in generation_requests.
- Add a test for CUDA Graph padding combined with speculative decoding.
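
The reordering step can be sketched as follows. This is an illustration under stated assumptions, not the code from this PR: FakeRequest, request_id, and move_dummies_to_end are hypothetical names, and only is_dummy and extend_requests appear in the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a scheduled request (not the real class)."""
    request_id: int
    is_dummy: bool = False


def move_dummies_to_end(extend_requests: List[FakeRequest]) -> List[FakeRequest]:
    # Stable partition: real requests keep their relative order,
    # and dummy (padding) requests are appended after them.
    real = [r for r in extend_requests if not r.is_dummy]
    dummies = [r for r in extend_requests if r.is_dummy]
    return real + dummies


ordered = move_dummies_to_end(
    [FakeRequest(0), FakeRequest(99, is_dummy=True), FakeRequest(1)]
)
ordered_ids = [r.request_id for r in ordered]
```

Keeping the padding entries at the tail means the real requests occupy a contiguous prefix of the list, which is the layout the description says generation_requests already uses.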

Test Coverage

Add new tests:
- accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_cuda_graph_padding[mtp_nextn=2]
- accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_cuda_graph_padding_4gpus[attention_dp=True-mtp_nextn=0]
- accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_cuda_graph_padding_4gpus[attention_dp=True-mtp_nextn=2]

lfr-0531 · 2025-06-03T06:18:23Z

/bot run

tensorrt-cicd · 2025-06-03T06:23:56Z

PR_Github #7289 [ run ] triggered by Bot

QiJune

LGTM

tensorrt-cicd · 2025-06-03T07:47:08Z

PR_Github #7289 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5281 completed with status: 'FAILURE'

lfr-0531 · 2025-06-03T08:08:03Z

/bot run

tensorrt-cicd · 2025-06-03T08:14:15Z

PR_Github #7307 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-03T09:57:01Z

PR_Github #7307 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5295 completed with status: 'FAILURE'

lfr-0531 · 2025-06-03T11:34:39Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-06-03T11:40:39Z

PR_Github #7344 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-03T14:57:30Z

PR_Github #7344 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5323 completed with status: 'FAILURE'

lfr-0531 · 2025-06-04T01:47:25Z

/bot run

tensorrt-cicd · 2025-06-04T01:52:47Z

PR_Github #7413 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-04T03:19:31Z

PR_Github #7413 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5378 completed with status: 'FAILURE'

lfr-0531 · 2025-06-04T05:01:43Z

/bot run

tensorrt-cicd · 2025-06-04T05:07:09Z

PR_Github #7441 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-04T07:21:22Z

PR_Github #7441 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5402 completed with status: 'FAILURE'

lfr-0531 · 2025-06-04T08:10:49Z

/bot run

tensorrt-cicd · 2025-06-04T08:16:25Z

PR_Github #7469 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-04T11:48:16Z

PR_Github #7469 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5422 completed with status: 'FAILURE'

lfr-0531 · 2025-06-04T15:25:38Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-06-04T15:31:38Z

PR_Github #7542 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-04T19:43:35Z

PR_Github #7542 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5472 completed with status: 'FAILURE'

lfr-0531 · 2025-06-05T09:51:41Z

/bot kill

tensorrt-cicd · 2025-06-05T09:52:09Z

PR_Github #7691 [ kill ] triggered by Bot

tensorrt-cicd · 2025-06-05T09:52:11Z

PR_Github #7688 [ run ] completed with state ABORTED

tensorrt-cicd · 2025-06-05T09:52:41Z

PR_Github #7691 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit 08fa91c

lfr-0531 · 2025-06-05T09:58:24Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-06-05T10:04:40Z

PR_Github #7699 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-05T15:30:03Z

PR_Github #7699 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5585 completed with status: 'FAILURE'

lfr-0531 · 2025-06-05T16:02:38Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-06-05T16:08:29Z

PR_Github #7773 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-06T00:31:14Z

PR_Github #7773 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5629 completed with status: 'FAILURE'

lfr-0531 · 2025-06-06T03:27:54Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-06-06T03:33:52Z

PR_Github #7831 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-06T03:35:00Z

PR_Github #7831 [ run ] completed with state FAILURE

lfr-0531 · 2025-06-06T06:36:56Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-06-06T06:43:37Z

PR_Github #7849 [ run ] triggered by Bot

tensorrt-cicd · 2025-06-06T14:21:39Z

PR_Github #7849 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5668 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

lfr-0531 requested a review from a team as a code owner June 3, 2025 06:17

lfr-0531 requested a review from mikeiovine June 3, 2025 06:17

lfr-0531 force-pushed the user/fanrongl/fix_spec_cuda_graph_pad branch from 05bd143 to 3e2a4ca June 3, 2025 06:18

lfr-0531 requested review from QiJune and nv-yilinf June 3, 2025 06:30

QiJune approved these changes Jun 3, 2025

nv-yilinf approved these changes Jun 3, 2025

lfr-0531 force-pushed the user/fanrongl/fix_spec_cuda_graph_pad branch from 3e2a4ca to 34d8474 June 4, 2025 05:00

lfr-0531 force-pushed the user/fanrongl/fix_spec_cuda_graph_pad branch from 34d8474 to ebf52ff June 4, 2025 15:25

lfr-0531 force-pushed the user/fanrongl/fix_spec_cuda_graph_pad branch from 08fa91c to 270c7c2 June 5, 2025 09:57

lfr-0531 enabled auto-merge (squash) June 5, 2025 09:58

lfr-0531 force-pushed the user/fanrongl/fix_spec_cuda_graph_pad branch from 270c7c2 to b7a6c8c June 5, 2025 16:02

Tabrizian reviewed Jun 5, 2025

tests/integration/defs/accuracy/test_llm_api_pytorch.py (outdated, resolved)

lfr-0531 force-pushed the user/fanrongl/fix_spec_cuda_graph_pad branch from b7a6c8c to 48e1bf2 June 6, 2025 03:27

lfr-0531 added 2 commits June 6, 2025 14:36

fix cuda graph padding for spec decoding.

2345c38

Signed-off-by: Fanrong Li <[email protected]>

add attention dp + cuda graph padding tests.

e48a027

Signed-off-by: Fanrong Li <[email protected]>

lfr-0531 force-pushed the user/fanrongl/fix_spec_cuda_graph_pad branch from 48e1bf2 to e48a027 June 6, 2025 06:36

lfr-0531 merged commit 75d020c into NVIDIA:main Jun 6, 2025
3 checks passed

lfr-0531 added a commit to lfr-0531/TensorRT-LLM that referenced this pull request Jun 8, 2025

fix: fix cuda graph padding for spec decoding (NVIDIA#4853)

ec8dadc

Signed-off-by: Fanrong Li <[email protected]>

lfr-0531 deleted the user/fanrongl/fix_spec_cuda_graph_pad branch June 27, 2025 12:43
