Description
Your current environment
docker pull vllm/vllm-openai:latest
docker run -d --restart=always \
--runtime=nvidia \
--gpus '"device=1"' \
--shm-size=10.24gb \
-p 5001:5001 \
-e NCCL_IGNORE_DISABLED_P2P=1 \
-e VLLM_NCCL_SO_PATH=/usr/local/lib/python3.10/dist-packages/nvidia/nccl/lib/libnccl.so.2 \
-e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-u `id -u`:`id -g` \
-v "${HOME}"/.cache:$HOME/.cache/ -v "${HOME}"/.config:$HOME/.config/ -v "${HOME}"/.triton:$HOME/.triton/ \
--network host \
--name phi3mini \
vllm/vllm-openai:latest \
--port=5001 \
--host=0.0.0.0 \
--model=microsoft/Phi-3-mini-128k-instruct \
--seed 1234 \
--trust-remote-code \
--tensor-parallel-size=1 \
--max-num-batched-tokens=131072 --max-log-len=100 \
--max-model-len=131072 \
--max-num-seqs=17 \
--use-v2-block-manager \
--num-speculative-tokens=5 \
--ngram-prompt-lookup-max=4 \
--speculative-model="[ngram]" \
--download-dir=$HOME/.cache/huggingface/hub &>> logs.vllm_server.phi3.txt
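For reference, the same speculative-decoding configuration can be exercised without Docker through vLLM's offline API. This is a minimal sketch, assuming the LLM constructor forwards these keyword arguments to the engine the same way the server CLI flags above do (as it did in vLLM releases of this period); the model, seed, and ngram settings are taken from the command above, and the prompt is the one from this report.

from vllm import LLM, SamplingParams

# Sketch: mirrors the server flags above via vLLM's offline engine arguments.
llm = LLM(
    model="microsoft/Phi-3-mini-128k-instruct",
    trust_remote_code=True,
    seed=1234,
    max_model_len=131072,
    max_num_seqs=17,
    use_v2_block_manager=True,
    speculative_model="[ngram]",
    num_speculative_tokens=5,
    ngram_prompt_lookup_max=4,
)
outputs = llm.generate(["Who are you?"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)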
🐛 Describe the bug
ERROR 08-01 21:27:03 async_llm_engine.py:56] File "/usr/local/lib/python3.10/dist-packages/vllm/spec_decode/spec_decode_worker.py", line 375, in execute_model
ERROR 08-01 21:27:03 async_llm_engine.py:56] return self._run_speculative_decoding_step(execute_model_req,
ERROR 08-01 21:27:03 async_llm_engine.py:56] File "/usr/lib/python3.10/contextlib.py", line 79, in inner
ERROR 08-01 21:27:03 async_llm_engine.py:56] return func(*args, **kwds)
ERROR 08-01 21:27:03 async_llm_engine.py:56] File "/usr/local/lib/python3.10/dist-packages/vllm/spec_decode/spec_decode_worker.py", line 538, in _run_speculative_decoding_step
ERROR 08-01 21:27:03 async_llm_engine.py:56] accepted_token_ids, target_logprobs = self._verify_tokens(
ERROR 08-01 21:27:03 async_llm_engine.py:56] File "/usr/lib/python3.10/contextlib.py", line 79, in inner
ERROR 08-01 21:27:03 async_llm_engine.py:56] return func(*args, **kwds)
ERROR 08-01 21:27:03 async_llm_engine.py:56] File "/usr/local/lib/python3.10/dist-packages/vllm/spec_decode/spec_decode_worker.py", line 609, in _verify_tokens
ERROR 08-01 21:27:03 async_llm_engine.py:56] accepted_token_ids = self.spec_decode_sampler(
ERROR 08-01 21:27:03 async_llm_engine.py:56] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
ERROR 08-01 21:27:03 async_llm_engine.py:56] return self._call_impl(*args, **kwargs)
ERROR 08-01 21:27:03 async_llm_engine.py:56] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
ERROR 08-01 21:27:03 async_llm_engine.py:56] return forward_call(*args, **kwargs)
ERROR 08-01 21:27:03 async_llm_engine.py:56] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/rejection_sampler.py", line 82, in forward
ERROR 08-01 21:27:03 async_llm_engine.py:56] self._batch_modified_rejection_sampling(
ERROR 08-01 21:27:03 async_llm_engine.py:56] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/rejection_sampler.py", line 119, in _batch_modified_rejection_sampling
ERROR 08-01 21:27:03 async_llm_engine.py:56] accepted = self._get_accepted(target_probs, draft_probs,
ERROR 08-01 21:27:03 async_llm_engine.py:56] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/rejection_sampler.py", line 190, in _get_accepted
ERROR 08-01 21:27:03 async_llm_engine.py:56] uniform_rand[idx, :] = torch.rand(1,
ERROR 08-01 21:27:03 async_llm_engine.py:56] IndexError: index 0 is out of bounds for dimension 0 with size 0
With the very first message to the model, "Who are you?", I got back "I" and then the server died.
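For context on the IndexError at the bottom of the traceback: _get_accepted in rejection_sampler.py fills a uniform_rand tensor row by row, and the error means the tensor's batch dimension is 0 while the loop still tries to write row 0. A standalone sketch of that failure mode (the shapes here are illustrative, not copied from vLLM):

import torch

# Illustrative shapes: zero sequences left in the batch, k=5 speculative tokens.
uniform_rand = torch.empty(0, 5)       # batch dimension collapsed to size 0
uniform_rand[0, :] = torch.rand(1, 5)  # IndexError: index 0 is out of bounds
                                       # for dimension 0 with size 0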