Closed
Labels: bug (Something isn't working)
Description
Your current environment
vLLM 0.6.3.post1 (Docker)
8*A100
docker pull vllm/vllm-openai:latest
docker stop qwen25_72b ; docker remove qwen25_72b
docker run -d --restart=always \
--runtime=nvidia \
--gpus '"device=4,5,6,7"' \
--shm-size=10.24gb \
-p 5001:5001 \
-e NCCL_IGNORE_DISABLED_P2P=1 \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-u `id -u`:`id -g` \
-e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
-v "${HOME}"/.cache:$HOME/.cache/ -v "${HOME}"/.config:$HOME/.config/ -v "${HOME}"/.triton:$HOME/.triton/ \
--network host \
--name qwen25_72b \
vllm/vllm-openai:latest \
--port=5001 \
--host=0.0.0.0 \
--model=Qwen/Qwen2.5-72B-Instruct \
--tensor-parallel-size=4 \
--seed 1234 \
--trust-remote-code \
--max-model-len=32768 \
--max-num-batched-tokens 131072 \
--max-log-len=100 \
--api-key=EMPTY \
--download-dir=$HOME/.cache/huggingface/hub &>> logs.vllm_server.qwen25_72b.txt
Model Input Dumps
No response
🐛 Describe the bug
No such issues with prior vLLM 0.6.2.
Trivial queries work:
from openai import OpenAI

client = OpenAI(base_url='FILL ME', api_key='FILL ME')

messages = [
    {
        "role": "user",
        "content": "Who are you?",
    }
]
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-72B-Instruct",
    messages=messages,
    temperature=0.0,
    max_tokens=4096,
)
print(response.choices[0])
But longer inputs produce nonsense, and only with the new vLLM. A long-input request gives:
Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='A\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text\n</text>\n\n</text>\n\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text>\n\n</text\n</text>\n\n</text\n</text\n\n</text\n</text>\n\n</text\n</text\n</text\n</text>\n\n</text\n</text\n</text>\n\n</text>\n\n</text\n\n</text\n</text\n</text\n</text>\n\n</text>\n\n</text\n</text>\n\n</text\n\n\n</text>\n\n</text\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text\n</text\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text\n</text</text</text\n</text</text\n</text</text</text\n</text\n</text\n</text>\n\n</text</text</text</text>\n\n</text>\n\n</text</text>\n\n</text>\n\n</text\n</text</text\n</text\n</text>\n\n</text\n</text>\n\n</text\n</text>\n\n</text\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text</text>\n\n</text</text</text</text</text</text>\n\n</text</text</text</text</text</text</text</text</text</text</text>\n\n</text>\n\n</text</text>\n\n</text</text</text</text</text>\n\n</text</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text>\n\n</text\n</text>\n\n</text\n</text>\n\n</text>\n\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text\n</text>\n', refusal=None, role='assistant', function_call=None, tool_calls=[]), stop_reason=None)
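For reference, a long-input repro can be sketched like this. The filler text, the target length, and the hypothetical `long_context` padding are assumptions for illustration (the report does not include the actual failing prompt); the request itself mirrors the trivial example above, with the base URL and API key left as placeholders.

```python
# Hypothetical repro sketch: pad the user message well past the point where
# short prompts behave, while staying under --max-model-len=32768.
FILLER = "The quick brown fox jumps over the lazy dog. "
TARGET_CHARS = 60_000  # very roughly ~15k tokens at ~4 chars/token (rough assumption)

long_context = (FILLER * (TARGET_CHARS // len(FILLER) + 1))[:TARGET_CHARS]
messages = [
    {
        "role": "user",
        "content": long_context + "\n\nSummarize the text above in one sentence.",
    }
]

if __name__ == "__main__":
    # Same call shape as the trivial query; base_url/api_key are placeholders.
    from openai import OpenAI

    client = OpenAI(base_url="FILL ME", api_key="FILL ME")
    response = client.chat.completions.create(
        model="Qwen/Qwen2.5-72B-Instruct",
        messages=messages,
        temperature=0.0,
        max_tokens=4096,
    )
    print(response.choices[0])
```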
Full logs from that running state are attached; the server had simply been running overnight and serving some benchmarks.
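When triaging runs like this, a quick heuristic can flag degenerate completions automatically. The `detect_degenerate` function and its 0.5 threshold below are hypothetical helpers, not part of vLLM or the OpenAI client; the broken completion above is dominated by repeated `</text` fragments, which this heuristic catches.

```python
from collections import Counter


def detect_degenerate(text: str, max_token_share: float = 0.5) -> bool:
    """Hypothetical heuristic: flag output where one whitespace-separated
    token accounts for more than max_token_share of all tokens."""
    tokens = text.split()
    if len(tokens) < 10:
        return False  # too short to judge
    _, top_count = Counter(tokens).most_common(1)[0]
    return top_count / len(tokens) > max_token_share


# A stand-in resembling the broken completion above:
broken = "A\n" + "</text\n" * 40 + "</text>\n\n" * 20
print(detect_degenerate(broken))  # → True
print(detect_degenerate(
    "I am Qwen, a large language model created by Alibaba Cloud. "
    "How can I help you today?"
))  # → False
```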
Related or not? #9732
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.