lfr-0531
Collaborator

lfr-0531 commented Jul 11, 2025

[nvbugs/5333742] fix MTP illegal memory access in cuda graph warmup

Description

We found an illegal memory access during CUDA graph warmup. The root cause is as follows:

When the overlap scheduler is enabled, generation requests use the device tensors from the previous iteration (new_tensors_device) to prepare their inputs. During warmup, however, the dummy generation requests have no previous new_tensors_device, so previous_pos_id_offsets_cuda and previous_kv_lens_offsets_cuda are never reset to zero by:

self.previous_pos_id_offsets_cuda *= 0
As a result, _preprocess_inputs sets kv_lens_cuda to an incorrect value:
inputs['attn_metadata'].kv_lens_cuda[
Then, when the applyMLARopeAndAssignQKVKernelGeneration kernel writes keys/values to the KV cache using this incorrect length, it performs an illegal memory access.
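The failure mode above can be illustrated with a small, simplified Python sketch. It is not the TensorRT-LLM implementation: the class OverlapInputPrep, its fix_enabled flag, the plain list standing in for previous_kv_lens_offsets_cuda, the subtraction used to apply the offset, and the write_index_in_bounds check are all hypothetical. It only shows how a persistent offset buffer that is not reset when dummy warmup requests have no previous tensors can yield an out-of-range KV length, and how zeroing it avoids that.

```python
from typing import List, Optional


class OverlapInputPrep:
    """Hypothetical, simplified model of a persistent offset buffer.

    previous_kv_lens_offsets stands in for previous_kv_lens_offsets_cuda:
    it survives across iterations and is only refreshed when the previous
    iteration's device tensors are available.
    """

    def __init__(self, max_batch: int, fix_enabled: bool) -> None:
        self.previous_kv_lens_offsets = [0] * max_batch
        self.fix_enabled = fix_enabled

    def kv_lens_for_step(self, kv_lens: List[int],
                         prev_offsets: Optional[List[int]]) -> List[int]:
        n = len(kv_lens)
        if prev_offsets is not None:
            # Normal overlap-scheduler path: offsets come from the last iteration.
            self.previous_kv_lens_offsets[:n] = prev_offsets[:n]
        elif self.fix_enabled:
            # Warmup path (dummy requests, no previous tensors): reset the
            # buffer, analogous to `self.previous_pos_id_offsets_cuda *= 0`.
            self.previous_kv_lens_offsets = [0] * len(self.previous_kv_lens_offsets)
        # Without the fix, the branch above is skipped and stale offsets leak in.
        # Assumption: the offset is subtracted from the KV length.
        offsets = self.previous_kv_lens_offsets[:n]
        return [kv - off for kv, off in zip(kv_lens, offsets)]


def write_index_in_bounds(kv_len: int, capacity: int) -> bool:
    """Hypothetical check: the new token is written at position kv_len - 1."""
    return 0 <= kv_len - 1 < capacity


if __name__ == "__main__":
    capacity = 8
    for fix in (False, True):
        prep = OverlapInputPrep(max_batch=2, fix_enabled=fix)
        # A real iteration leaves non-zero offsets behind in the buffer.
        prep.kv_lens_for_step([7, 7], prev_offsets=[3, 3])
        # Warmup: dummy generation requests have no previous tensors.
        lens = prep.kv_lens_for_step([1, 1], prev_offsets=None)
        in_bounds = all(write_index_in_bounds(kv_len, capacity) for kv_len in lens)
        print(fix, lens, in_bounds)
    # Prints:
    # False [-2, -2] False
    # True [1, 1] True
```

Without the reset, the warmup step inherits the offsets left by the earlier iteration and the computed write position goes out of range; with the reset, the dummy requests see zero offsets and stay in bounds.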

Test Coverage

Added a new test to L0:

accuracy/test_llm_api_pytorch.py::TestDeepSeekV3Lite::test_bfloat16_4gpus[tp4-mtp_nextn=2-attention_dp=False-cuda_graph=True-overlap_scheduler=True-torch_compile=False]

lfr-0531 requested a review from a team as a code owner July 11, 2025 07:11
lfr-0531 requested a review from achartier July 11, 2025 07:11
Signed-off-by: Fanrong Li <[email protected]>
lfr-0531 requested a review from a team as a code owner July 11, 2025 08:44
@lfr-0531
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #11644 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #11644 [ run ] completed with state FAILURE

@lfr-0531
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #11662 [ run ] triggered by Bot

schetlur-nv requested a review from mikeiovine July 11, 2025 17:20
@tensorrt-cicd
Collaborator

PR_Github #11662 [ run ] completed with state SUCCESS
/LLM/release-0.21/L0_MergeRequest_PR pipeline #231 completed with status: 'FAILURE'

@lfr-0531
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #11702 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #11702 [ run ] completed with state SUCCESS
/LLM/release-0.21/L0_MergeRequest_PR pipeline #237 completed with status: 'SUCCESS'

lfr-0531 enabled auto-merge (squash) July 12, 2025 13:44
lfr-0531 merged commit 4905cac into NVIDIA:release/0.21 Jul 12, 2025
3 checks passed
k-l-lambda pushed a commit to k-l-lambda/TensorRT-LLM that referenced this pull request Jul 14, 2025
lfr-0531 added a commit to lfr-0531/TensorRT-LLM that referenced this pull request Jul 14, 2025
lfr-0531 added a commit to lfr-0531/TensorRT-LLM that referenced this pull request Jul 14, 2025
@schetlur-nv
Collaborator

@lfr-0531 can you please add a PR description and a PR title?

lfr-0531 added a commit to lfr-0531/TensorRT-LLM that referenced this pull request Jul 15, 2025
lfr-0531 added a commit that referenced this pull request Jul 15, 2025
Signed-off-by: Fanrong Li <[email protected]>
@pcastonguay
Collaborator

Can you fill the description and add a title? Thanks.

evezhier pushed a commit to evezhier/TensorRT-LLM that referenced this pull request Jul 16, 2025
Wanli-Jiang pushed a commit to Wanli-Jiang/TensorRT-LLM that referenced this pull request Jul 17, 2025
dc3671 pushed a commit to dc3671/TensorRT-LLM that referenced this pull request Jul 21, 2025
lfr-0531 deleted the user/fanrongl/fix_mtp_illegal_memory_access branch July 29, 2025 01:27
8 participants