lfr-0531
Collaborator

@lfr-0531 lfr-0531 commented Jun 3, 2025

Description

Root cause:

  • Previously, we used if request.py_batch_idx is None to identify dummy requests when the overlap scheduler was enabled. However, after the CUDA Graph padding changes in #4729 ([TRTLLM-5516] perf: replicate dummy request for cuda graph padding), the same dummy request is reused across all model forward passes, so its request.py_batch_idx is no longer None. The old check then fails to recognize the request as a dummy, which leads to an error.

In this PR:

  • Use if request.is_dummy to identify dummy requests.
  • Move the dummy extend requests to the end of extend_requests, matching how dummy requests are ordered in generation_requests.
  • Add a test for CUDA Graph padding combined with speculative decoding.
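
A minimal sketch of the first two changes, for illustration only. The `Request` dataclass, the helper names, and the assumption that `py_batch_idx` is set once a request has gone through a forward pass are hypothetical stand-ins, not the actual TensorRT-LLM code; only the attribute names `is_dummy`, `py_batch_idx`, and `extend_requests` come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    """Hypothetical stand-in for the real request object."""
    request_id: int
    is_dummy: bool = False
    # Assumption: populated after the request has been through a forward pass.
    py_batch_idx: Optional[int] = None


def is_dummy_old(request: Request) -> bool:
    # Old heuristic: treats "no batch index yet" as "dummy request".
    return request.py_batch_idx is None


def is_dummy_new(request: Request) -> bool:
    # New check: rely on the explicit flag, which survives request reuse.
    return request.is_dummy


def order_extend_requests(extend_requests: List[Request]) -> List[Request]:
    """Keep real requests in their original order; move dummies to the end."""
    real = [r for r in extend_requests if not r.is_dummy]
    dummy = [r for r in extend_requests if r.is_dummy]
    return real + dummy


# A reused padding request already carries a batch index from an earlier pass.
padding = Request(request_id=-1, is_dummy=True, py_batch_idx=3)
batch = [
    padding,
    Request(request_id=7, py_batch_idx=0),
    Request(request_id=8, py_batch_idx=1),
]
ordered_ids = [r.request_id for r in order_extend_requests(batch)]
```

In this sketch the old heuristic misses the reused padding request because its batch index is already set, while the flag-based check still identifies it, and the reordering leaves the dummy last.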

Test Coverage

  • Add new tests:
    • accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_cuda_graph_padding[mtp_nextn=2]
    • accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_cuda_graph_padding_4gpus[attention_dp=True-mtp_nextn=0]
    • accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_fp8_block_scales_cuda_graph_padding_4gpus[attention_dp=True-mtp_nextn=2]

@lfr-0531 lfr-0531 requested a review from a team as a code owner June 3, 2025 06:17
@lfr-0531 lfr-0531 requested a review from mikeiovine June 3, 2025 06:17
@lfr-0531 lfr-0531 force-pushed the user/fanrongl/fix_spec_cuda_graph_pad branch from 05bd143 to 3e2a4ca Compare June 3, 2025 06:18
@lfr-0531
Collaborator Author

lfr-0531 commented Jun 3, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #7289 [ run ] triggered by Bot

@lfr-0531 lfr-0531 requested review from QiJune and nv-yilinf June 3, 2025 06:30
Collaborator

@QiJune QiJune left a comment

LGTM

@tensorrt-cicd
Collaborator

PR_Github #7289 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5281 completed with status: 'FAILURE'

@lfr-0531
Collaborator Author

lfr-0531 commented Jun 3, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #7307 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7307 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5295 completed with status: 'FAILURE'

@lfr-0531
Collaborator Author

lfr-0531 commented Jun 3, 2025

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #7344 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7344 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5323 completed with status: 'FAILURE'

@lfr-0531
Collaborator Author

lfr-0531 commented Jun 4, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #7413 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7413 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5378 completed with status: 'FAILURE'

@lfr-0531 lfr-0531 force-pushed the user/fanrongl/fix_spec_cuda_graph_pad branch from 3e2a4ca to 34d8474 Compare June 4, 2025 05:00
@lfr-0531
Collaborator Author

lfr-0531 commented Jun 4, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #7441 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7441 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5402 completed with status: 'FAILURE'

@lfr-0531
Collaborator Author

lfr-0531 commented Jun 4, 2025

/bot run

@tensorrt-cicd
Collaborator

PR_Github #7469 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7469 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5422 completed with status: 'FAILURE'

@lfr-0531 lfr-0531 force-pushed the user/fanrongl/fix_spec_cuda_graph_pad branch from 34d8474 to ebf52ff Compare June 4, 2025 15:25
@lfr-0531
Collaborator Author

lfr-0531 commented Jun 4, 2025

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #7542 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7542 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5472 completed with status: 'FAILURE'

@lfr-0531
Collaborator Author

lfr-0531 commented Jun 5, 2025

/bot kill

@tensorrt-cicd
Collaborator

PR_Github #7691 [ kill ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7688 [ run ] completed with state ABORTED

@tensorrt-cicd
Collaborator

PR_Github #7691 [ kill ] completed with state SUCCESS
Successfully killed previous jobs for commit 08fa91c

@lfr-0531 lfr-0531 force-pushed the user/fanrongl/fix_spec_cuda_graph_pad branch from 08fa91c to 270c7c2 Compare June 5, 2025 09:57
@lfr-0531
Collaborator Author

lfr-0531 commented Jun 5, 2025

/bot run --disable-fail-fast

@lfr-0531 lfr-0531 enabled auto-merge (squash) June 5, 2025 09:58
@tensorrt-cicd
Collaborator

PR_Github #7699 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7699 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5585 completed with status: 'FAILURE'

@lfr-0531 lfr-0531 force-pushed the user/fanrongl/fix_spec_cuda_graph_pad branch from 270c7c2 to b7a6c8c Compare June 5, 2025 16:02
@lfr-0531
Collaborator Author

lfr-0531 commented Jun 5, 2025

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #7773 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7773 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5629 completed with status: 'FAILURE'

@lfr-0531 lfr-0531 force-pushed the user/fanrongl/fix_spec_cuda_graph_pad branch from b7a6c8c to 48e1bf2 Compare June 6, 2025 03:27
@lfr-0531
Collaborator Author

lfr-0531 commented Jun 6, 2025

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #7831 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7831 [ run ] completed with state FAILURE

@lfr-0531 lfr-0531 force-pushed the user/fanrongl/fix_spec_cuda_graph_pad branch from 48e1bf2 to e48a027 Compare June 6, 2025 06:36
@lfr-0531
Collaborator Author

lfr-0531 commented Jun 6, 2025

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #7849 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #7849 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #5668 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@lfr-0531 lfr-0531 merged commit 75d020c into NVIDIA:main Jun 6, 2025
3 checks passed
lfr-0531 added a commit to lfr-0531/TensorRT-LLM that referenced this pull request Jun 8, 2025
@lfr-0531 lfr-0531 deleted the user/fanrongl/fix_spec_cuda_graph_pad branch June 27, 2025 12:43