[nvbugs/5333742] fix MTP illegal memory access in cuda graph warmup #5947
Conversation
Signed-off-by: Fanrong Li <[email protected]>
/bot run
PR_Github #11644 [ run ] triggered by Bot
PR_Github #11644 [ run ] completed with state
/bot run
PR_Github #11662 [ run ] triggered by Bot
PR_Github #11662 [ run ] completed with state
/bot run
PR_Github #11702 [ run ] triggered by Bot
PR_Github #11702 [ run ] completed with state
…VIDIA#5947) Signed-off-by: Fanrong Li <[email protected]>
@lfr-0531 can you please add an MR description and PR title?
Can you fill in the description and add a title? Thanks.
…VIDIA#5947) Signed-off-by: Fanrong Li <[email protected]> Signed-off-by: Wanli Jiang <[email protected]>
Description

We found an illegal memory access issue in the CUDA graph warmup. The root cause:

1. When the overlap scheduler is enabled, generation requests should use device tensors from the last iteration to prepare their inputs. But in warmup, the dummy generation requests don't have a previous `new_tensors_device`, so the tensors `previous_pos_id_offsets_cuda` and `previous_kv_lens_offsets_cuda` are not set to zeros (TensorRT-LLM/tensorrt_llm/_torch/pyexecutor/model_engine.py, line 1441 in fbb4cc7).
2. Then, in `_preprocess_inputs`, `kv_lens_cuda` is set to a wrong value (TensorRT-LLM/tensorrt_llm/_torch/pyexecutor/model_engine.py, line 1172 in fbb4cc7).
3. As a result, the `applyMLARopeAndAssignQKVKernelGeneration` kernel performs an illegal memory access.

Test Coverage

A new test is added to L0:
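The failure mode above can be sketched in miniature. This is an illustrative model only, using plain Python lists in place of CUDA tensors; the function and buffer names are hypothetical and simplified from the actual `model_engine.py` logic. It shows why the offset buffers must be explicitly zeroed on the warmup path, where no previous-iteration tensors exist:

```python
def prepare_inputs(new_tensors_device, previous_pos_id_offsets, previous_kv_lens_offsets):
    """Illustrative sketch (not the real TensorRT-LLM code): zero the
    offset buffers when there are no device tensors from a previous
    iteration, i.e. the CUDA graph warmup case with dummy requests."""
    if new_tensors_device is None:
        # Warmup path: dummy generation requests have no previous
        # new_tensors_device. Without this explicit zeroing, stale
        # offsets flow into kv_lens and yield out-of-range indices
        # inside the attention kernel.
        for buf in (previous_pos_id_offsets, previous_kv_lens_offsets):
            for i in range(len(buf)):
                buf[i] = 0
    else:
        # Overlap-scheduler path: derive offsets from the previous
        # iteration's device tensors (details elided in this sketch).
        for i in range(len(previous_kv_lens_offsets)):
            previous_kv_lens_offsets[i] = new_tensors_device["kv_lens_offsets"][i]


def compute_kv_lens(base_kv_lens, kv_lens_offsets):
    """kv_lens is the base length plus a per-request offset; a stale
    nonzero offset makes it exceed the real KV-cache extent."""
    return [base + off for base, off in zip(base_kv_lens, kv_lens_offsets)]
```

For example, if warmup leaves stale values `[3, 3]` in the KV-length offset buffer, `compute_kv_lens([16, 16], ...)` would report lengths past the cache end; after `prepare_inputs(None, ...)` zeroes the buffers, the computed lengths match the base lengths.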