Commit d49c9e5

lfr-0531 authored and Ransiki committed
fix: fix index out of bounds error in spec decoding (NVIDIA#5954)
Signed-off-by: Ransiki Zhang <[email protected]>
1 parent d5be2c3 commit d49c9e5

1 file changed: +2 -1 lines changed

tensorrt_llm/_torch/pyexecutor/model_engine.py

Lines changed: 2 additions & 1 deletion
@@ -1216,7 +1216,8 @@ def _prepare_tp_inputs(
             if next_draft_tokens_device is None or request.is_dummy or request.py_batch_idx is None:
                 # get token ids, including input token ids and draft token ids. For these dummy requests,
                 # no need to copy the token ids.
-                if not request.is_dummy:
+                if not (request.is_attention_dp_dummy
+                        or request.is_cuda_graph_dummy):
                     input_ids.append(request.get_last_tokens(0))
                     input_ids.extend(request.py_draft_tokens)
                     draft_tokens.extend(request.py_draft_tokens)
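
The sketch below illustrates how the old and new conditions in this hunk can disagree. It is not the TensorRT-LLM implementation. `FakeRequest`, `is_other_dummy`, and both helper functions are hypothetical names. The definition of `is_dummy` as a union of several dummy kinds is an assumption, since the commit does not show it. Under that assumption, a request that is a dummy for some other reason skipped the token copy before the change and performs it after the change.

```python
from dataclasses import dataclass


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a request object; not the TensorRT-LLM class."""
    is_attention_dp_dummy: bool = False
    is_cuda_graph_dummy: bool = False
    # Assumption: a further kind of dummy request exists that makes is_dummy
    # true without setting either of the two specific flags above.
    is_other_dummy: bool = False

    @property
    def is_dummy(self) -> bool:
        # Assumed definition: true if any dummy flag is set.
        return (self.is_attention_dp_dummy or self.is_cuda_graph_dummy
                or self.is_other_dummy)


def copies_tokens_before(request: FakeRequest) -> bool:
    # Condition on the removed line of the diff.
    return not request.is_dummy


def copies_tokens_after(request: FakeRequest) -> bool:
    # Condition on the added lines of the diff.
    return not (request.is_attention_dp_dummy or request.is_cuda_graph_dummy)


if __name__ == "__main__":
    for req in (FakeRequest(),
                FakeRequest(is_attention_dp_dummy=True),
                FakeRequest(is_cuda_graph_dummy=True),
                FakeRequest(is_other_dummy=True)):
        print(req, copies_tokens_before(req), copies_tokens_after(req))
```

Only the last case differs between the two predicates. Whether that case is the one behind the index out of bounds error is not stated in the commit.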
