[nvbugs/5333742] fix MTP illegal memory access in cuda graph warmup #5947
Conversation
Signed-off-by: Fanrong Li <[email protected]>
/bot run
PR_Github #11644 [ run ] triggered by Bot
PR_Github #11644 [ run ] completed with state
/bot run
PR_Github #11662 [ run ] triggered by Bot
PR_Github #11662 [ run ] completed with state
/bot run
PR_Github #11702 [ run ] triggered by Bot
PR_Github #11702 [ run ] completed with state
…VIDIA#5947) Signed-off-by: Fanrong Li <[email protected]>
@lfr-0531 can you please add an MR description and PR title?
Can you fill in the description and add a title? Thanks.
…VIDIA#5947) Signed-off-by: Fanrong Li <[email protected]> Signed-off-by: Wanli Jiang <[email protected]>
Description

We found an illegal memory access issue in the CUDA graph warmup. The root cause:

1. When the overlap scheduler is enabled, generation requests should use device tensors from the last iteration to prepare their inputs. But in warmup, the dummy generation requests don't have a previous `new_tensors_device`, so the tensors `previous_pos_id_offsets_cuda` and `previous_kv_lens_offsets_cuda` are not set to zeros (TensorRT-LLM/tensorrt_llm/_torch/pyexecutor/model_engine.py, line 1441 in fbb4cc7).
2. Then, in `_preprocess_inputs`, `kv_lens_cuda` is set to a wrong value (TensorRT-LLM/tensorrt_llm/_torch/pyexecutor/model_engine.py, line 1172 in fbb4cc7).
3. As a result, the `applyMLARopeAndAssignQKVKernelGeneration` kernel performs an illegal memory access.

Test Coverage

A new test is added to L0:
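The failure mode above can be sketched in miniature. This is an illustrative model only, using plain Python lists in place of CUDA tensors; the function and buffer names are hypothetical and simplified from the actual `model_engine.py` logic. It shows why the offset buffers must be explicitly zeroed on the warmup path, where no previous-iteration tensors exist:

```python
def prepare_inputs(new_tensors_device, previous_pos_id_offsets, previous_kv_lens_offsets):
    """Illustrative sketch (not the real TensorRT-LLM code): zero the
    offset buffers when there are no device tensors from a previous
    iteration, i.e. the CUDA graph warmup case with dummy requests."""
    if new_tensors_device is None:
        # Warmup path: dummy generation requests have no previous
        # new_tensors_device. Without this explicit zeroing, stale
        # offsets flow into kv_lens and yield out-of-range indices
        # inside the attention kernel.
        for buf in (previous_pos_id_offsets, previous_kv_lens_offsets):
            for i in range(len(buf)):
                buf[i] = 0
    else:
        # Overlap-scheduler path: derive offsets from the previous
        # iteration's device tensors (details elided in this sketch).
        for i in range(len(previous_kv_lens_offsets)):
            previous_kv_lens_offsets[i] = new_tensors_device["kv_lens_offsets"][i]


def compute_kv_lens(base_kv_lens, kv_lens_offsets):
    """kv_lens is the base length plus a per-request offset; a stale
    nonzero offset makes it exceed the real KV-cache extent."""
    return [base + off for base, off in zip(base_kv_lens, kv_lens_offsets)]
```

For example, if warmup leaves stale values `[3, 3]` in the KV-length offset buffer, `compute_kv_lens([16, 16], ...)` would report lengths past the cache end; after `prepare_inputs(None, ...)` zeroes the buffers, the computed lengths match the base lengths.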