Closed
Labels: bug (Something isn't working)
Description
Your current environment
vLLM 0.6.3.post1 (Docker)
8*A100
docker pull vllm/vllm-openai:latest
docker stop qwen25_72b ; docker remove qwen25_72b
docker run -d --restart=always \
--runtime=nvidia \
--gpus '"device=4,5,6,7"' \
--shm-size=10.24gb \
-p 5001:5001 \
-e NCCL_IGNORE_DISABLED_P2P=1 \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-u `id -u`:`id -g` \
-e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
-v "${HOME}"/.cache:$HOME/.cache/ -v "${HOME}"/.config:$HOME/.config/ -v "${HOME}"/.triton:$HOME/.triton/ \
--network host \
--name qwen25_72b \
vllm/vllm-openai:latest \
--port=5001 \
--host=0.0.0.0 \
--model=Qwen/Qwen2.5-72B-Instruct \
--tensor-parallel-size=4 \
--seed 1234 \
--trust-remote-code \
--max-model-len=32768 \
--max-num-batched-tokens 131072 \
--max-log-len=100 \
--api-key=EMPTY \
--download-dir=$HOME/.cache/huggingface/hub &>> logs.vllm_server.qwen25_72b.txt
Model Input Dumps
No response
🐛 Describe the bug
No such issues with prior vLLM 0.6.2.
Trivial queries work:
from openai import OpenAI

client = OpenAI(base_url='FILL ME', api_key='FILL ME')

messages = [
    {
        "role": "user",
        "content": "Who are you?",
    }
]
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-72B-Instruct",
    messages=messages,
    temperature=0.0,
    max_tokens=4096,
)
print(response.choices[0])
But longer inputs produce nonsense, and only with the new vLLM. A long-input request gives:
Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='A\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text\n</text>\n\n</text>\n\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text>\n\n</text\n</text>\n\n</text\n</text\n\n</text\n</text>\n\n</text\n</text\n</text\n</text>\n\n</text\n</text\n</text>\n\n</text>\n\n</text\n\n</text\n</text\n</text\n</text>\n\n</text>\n\n</text\n</text>\n\n</text\n\n\n</text>\n\n</text\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text\n</text\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text\n</text</text</text\n</text</text\n</text</text</text\n</text\n</text\n</text>\n\n</text</text</text</text>\n\n</text>\n\n</text</text>\n\n</text>\n\n</text\n</text</text\n</text\n</text>\n\n</text\n</text>\n\n</text\n</text>\n\n</text\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text</text>\n\n</text</text</text</text</text</text>\n\n</text</text</text</text</text</text</text</text</text</text</text>\n\n</text>\n\n</text</text>\n\n</text</text</text</text</text>\n\n</text</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text\n</text>\n\n</text\n</text>\n\n</text>\n\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text>\n', refusal=None, role='assistant', function_call=None, tool_calls=[]), stop_reason=None)
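For reference, a long-input repro can be sketched like this. The filler text, the target length, and the hypothetical `long_context` padding are assumptions for illustration (the report does not include the actual failing prompt); the request itself mirrors the trivial example above, with the base URL and API key left as placeholders.

```python
# Hypothetical repro sketch: pad the user message well past the point where
# short prompts behave, while staying under --max-model-len=32768.
FILLER = "The quick brown fox jumps over the lazy dog. "
TARGET_CHARS = 60_000  # very roughly ~15k tokens at ~4 chars/token (rough assumption)

long_context = (FILLER * (TARGET_CHARS // len(FILLER) + 1))[:TARGET_CHARS]
messages = [
    {
        "role": "user",
        "content": long_context + "\n\nSummarize the text above in one sentence.",
    }
]

if __name__ == "__main__":
    # Same call shape as the trivial query; base_url/api_key are placeholders.
    from openai import OpenAI

    client = OpenAI(base_url="FILL ME", api_key="FILL ME")
    response = client.chat.completions.create(
        model="Qwen/Qwen2.5-72B-Instruct",
        messages=messages,
        temperature=0.0,
        max_tokens=4096,
    )
    print(response.choices[0])
```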
Full logs from that running state are attached; the server had simply been running overnight and serving some benchmarks.
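When triaging runs like this, a quick heuristic can flag degenerate completions automatically. The `detect_degenerate` function and its 0.5 threshold below are hypothetical helpers, not part of vLLM or the OpenAI client; the broken completion above is dominated by repeated `</text` fragments, which this heuristic catches.

```python
from collections import Counter


def detect_degenerate(text: str, max_token_share: float = 0.5) -> bool:
    """Hypothetical heuristic: flag output where one whitespace-separated
    token accounts for more than max_token_share of all tokens."""
    tokens = text.split()
    if len(tokens) < 10:
        return False  # too short to judge
    _, top_count = Counter(tokens).most_common(1)[0]
    return top_count / len(tokens) > max_token_share


# A stand-in resembling the broken completion above:
broken = "A\n" + "</text\n" * 40 + "</text>\n\n" * 20
print(detect_degenerate(broken))  # → True
print(detect_degenerate(
    "I am Qwen, a large language model created by Alibaba Cloud. "
    "How can I help you today?"
))  # → False
```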
Related or not? #9732
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.