Description
Your current environment
docker pull vllm/vllm-openai:latest
docker run -d --restart=always \
--runtime=nvidia \
--gpus '"device=1"' \
--shm-size=10.24gb \
-p 5001:5001 \
-e NCCL_IGNORE_DISABLED_P2P=1 \
-e VLLM_NCCL_SO_PATH=/usr/local/lib/python3.10/dist-packages/nvidia/nccl/lib/libnccl.so.2 \
-e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-u `id -u`:`id -g` \
-v "${HOME}"/.cache:$HOME/.cache/ -v "${HOME}"/.config:$HOME/.config/ -v "${HOME}"/.triton:$HOME/.triton/ \
--network host \
--name phi3mini \
vllm/vllm-openai:latest \
--port=5001 \
--host=0.0.0.0 \
--model=microsoft/Phi-3-mini-128k-instruct \
--seed 1234 \
--trust-remote-code \
--tensor-parallel-size=1 \
--max-num-batched-tokens=131072 --max-log-len=100 \
--max-model-len=131072 \
--max-num-seqs=17 \
--use-v2-block-manager \
--num-speculative-tokens=5 \
--ngram-prompt-lookup-max=4 \
--speculative-model="[ngram]" \
--download-dir=$HOME/.cache/huggingface/hub &>> logs.vllm_server.phi3.txt
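For reference, the same speculative-decoding configuration can be exercised without Docker through vLLM's offline API. This is a minimal sketch, assuming the LLM constructor forwards these keyword arguments to the engine the same way the server CLI flags above do (as it did in vLLM releases of this period); the model, seed, and ngram settings are taken from the command above, and the prompt is the one from this report.

from vllm import LLM, SamplingParams

# Sketch: mirrors the server flags above via vLLM's offline engine arguments.
llm = LLM(
    model="microsoft/Phi-3-mini-128k-instruct",
    trust_remote_code=True,
    seed=1234,
    max_model_len=131072,
    max_num_seqs=17,
    use_v2_block_manager=True,
    speculative_model="[ngram]",
    num_speculative_tokens=5,
    ngram_prompt_lookup_max=4,
)
outputs = llm.generate(["Who are you?"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)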
🐛 Describe the bug
ERROR 08-01 21:27:03 async_llm_engine.py:56] File "/usr/local/lib/python3.10/dist-packages/vllm/spec_decode/spec_decode_worker.py", line 375, in execute_model
ERROR 08-01 21:27:03 async_llm_engine.py:56] return self._run_speculative_decoding_step(execute_model_req,
ERROR 08-01 21:27:03 async_llm_engine.py:56] File "/usr/lib/python3.10/contextlib.py", line 79, in inner
ERROR 08-01 21:27:03 async_llm_engine.py:56] return func(*args, **kwds)
ERROR 08-01 21:27:03 async_llm_engine.py:56] File "/usr/local/lib/python3.10/dist-packages/vllm/spec_decode/spec_decode_worker.py", line 538, in _run_speculative_decoding_step
ERROR 08-01 21:27:03 async_llm_engine.py:56] accepted_token_ids, target_logprobs = self._verify_tokens(
ERROR 08-01 21:27:03 async_llm_engine.py:56] File "/usr/lib/python3.10/contextlib.py", line 79, in inner
ERROR 08-01 21:27:03 async_llm_engine.py:56] return func(*args, **kwds)
ERROR 08-01 21:27:03 async_llm_engine.py:56] File "/usr/local/lib/python3.10/dist-packages/vllm/spec_decode/spec_decode_worker.py", line 609, in _verify_tokens
ERROR 08-01 21:27:03 async_llm_engine.py:56] accepted_token_ids = self.spec_decode_sampler(
ERROR 08-01 21:27:03 async_llm_engine.py:56] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
ERROR 08-01 21:27:03 async_llm_engine.py:56] return self._call_impl(*args, **kwargs)
ERROR 08-01 21:27:03 async_llm_engine.py:56] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
ERROR 08-01 21:27:03 async_llm_engine.py:56] return forward_call(*args, **kwargs)
ERROR 08-01 21:27:03 async_llm_engine.py:56] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/rejection_sampler.py", line 82, in forward
ERROR 08-01 21:27:03 async_llm_engine.py:56] self._batch_modified_rejection_sampling(
ERROR 08-01 21:27:03 async_llm_engine.py:56] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/rejection_sampler.py", line 119, in _batch_modified_rejection_sampling
ERROR 08-01 21:27:03 async_llm_engine.py:56] accepted = self._get_accepted(target_probs, draft_probs,
ERROR 08-01 21:27:03 async_llm_engine.py:56] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/rejection_sampler.py", line 190, in _get_accepted
ERROR 08-01 21:27:03 async_llm_engine.py:56] uniform_rand[idx, :] = torch.rand(1,
ERROR 08-01 21:27:03 async_llm_engine.py:56] IndexError: index 0 is out of bounds for dimension 0 with size 0
With the very first message to the model, "Who are you?", I got back "I" and then the server died.
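For context on the IndexError at the bottom of the traceback: _get_accepted in rejection_sampler.py fills a uniform_rand tensor row by row, and the error means the tensor's batch dimension is 0 while the loop still tries to write row 0. A standalone sketch of that failure mode (the shapes here are illustrative, not copied from vLLM):

import torch

# Illustrative shapes: zero sequences left in the batch, k=5 speculative tokens.
uniform_rand = torch.empty(0, 5)       # batch dimension collapsed to size 0
uniform_rand[0, :] = torch.rand(1, 5)  # IndexError: index 0 is out of bounds
                                       # for dimension 0 with size 0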