
Conversation

@pawel-olejniczak commented Sep 26, 2025

Purpose

Fix critical torch.dynamo compatibility issues that prevent Qwen2.5 models from running on vllm-gaudi.

This PR addresses three interconnected issues identified during investigation:

  1. Dispatch key set mismatch: make_tensor_with_pad used torch.from_numpy(), creating tensors incompatible with torch.dynamo compilation
  2. Builtin function compilation failure: torch.dynamo couldn't compile max(..., default=0) syntax
  3. AttributeError on None inputs: Penalty application logic wasn't robust to None inputs during warmup scenarios
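Issues 1 and 2 can be illustrated with a minimal sketch of a pure-PyTorch padding helper. This is a simplified signature for illustration, not the actual vLLM diff: building the tensor with `torch.full`/`torch.tensor` instead of `torch.from_numpy` keeps the dispatch key set consistent with dynamo's guards, and a conditional expression stands in for the uncompilable `max(..., default=0)`.

```python
from typing import Optional

import torch


def make_tensor_with_pad(
    data: list[list[int]],
    pad_value: int,
    dtype: torch.dtype,
    device: Optional[torch.device] = None,
) -> torch.Tensor:
    """Pad variable-length rows into a rectangular tensor (sketch).

    Built directly with torch ops: torch.from_numpy produces tensors
    whose dispatch key set fails torch.dynamo guard checks.
    """
    # Conditional expression instead of max(..., default=0), which
    # torch.dynamo's BuiltinVariable cannot compile.
    max_len = max(len(row) for row in data) if data else 0
    out = torch.full((len(data), max_len), pad_value,
                     dtype=dtype, device=device)
    for i, row in enumerate(data):
        if row:
            out[i, : len(row)] = torch.tensor(row, dtype=dtype, device=device)
    return out
```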

Error Messages Fixed:

  • AssertionError: Guard check failed! tensor_from_numpy(stack[0]): dispatch key set mismatch
  • incorrect arg count <bound method BuiltinVariable._call_min_max of BuiltinVariable(max)> got an unexpected keyword argument 'default'
  • AttributeError: 'NoneType' object has no attribute 'device'

Test Plan

Local Testing Environment

  • Hardware: Intel Gaudi3 accelerator via vllm-gaudi
  • Model: Qwen/Qwen2.5-14B-Instruct
  • Command:
    vllm serve Qwen/Qwen2.5-14B-Instruct --max-model-len 4096 --max-num-seqs 32
  • Test Request:
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-14B-Instruct",
    "prompt": "Give me a short introduction to large language model.",
    "max_tokens": 100
  }'

Test Result

Before Fix

AssertionError: Guard check failed!
tensor_from_numpy(stack[0]): dispatch key set mismatch.
expected=DispatchKeySet(CPU, BackendSelect, ADInplaceOrView),
actual=DispatchKeySet(CPU, BackendSelect)

Server crashed on first inference request.

After Fix

{
"id":"cmpl-421571a6c96049ad91448cfa679c34bc",
"object":"text_completion",
"created":1758727612,
"model":"Qwen/Qwen2.5-14B-Instruct",
"choices":[{
"index":0,
"text":" \"A large language model (LLM) is an advanced...",
"logprobs":null,
"finish_reason":"length"
}],
"usage":{"prompt_tokens":16,"total_tokens":116,"completion_tokens":100}
}

Changes Summary

  1. Refactored make_tensor_with_pad (vllm/utils/__init__.py): Pure-PyTorch implementation eliminates NumPy interop issues
  2. Enhanced penalty logic robustness (vllm/model_executor/layers/utils.py): Added None-safety to prevent AttributeErrors
  3. Fixed compiler compatibility: Replaced max(..., default=0) with conditional expression
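The None-safety change (item 2) can be sketched as an early-return guard. The function name and signature below are hypothetical, simplified from the penalty logic in vllm/model_executor/layers/utils.py for illustration:

```python
import torch


def apply_penalty_safely(
    logits: torch.Tensor,
    token_mask: "torch.Tensor | None",
    penalty: float,
) -> torch.Tensor:
    """Apply a repetition-style penalty only when the mask exists.

    During warmup the token mask may still be None; guarding here
    avoids AttributeError: 'NoneType' object has no attribute 'device'.
    """
    if token_mask is None:
        return logits
    # Standard repetition-penalty shape: divide positive logits,
    # multiply negative ones, only at masked positions.
    penalized = torch.where(logits > 0, logits / penalty, logits * penalty)
    return torch.where(token_mask, penalized, logits)
```

Returning the logits unchanged when the mask is absent matches warmup semantics, where no real tokens have been generated yet.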

Essential Elements Checklist:

  • Purpose: Fix torch.dynamo compatibility for Qwen models on vllm-gaudi
  • Test plan: Local testing with Qwen/Qwen2.5-14B-Instruct on vllm-gaudi
  • Test results: Before/after comparison showing successful inference
  • Documentation: Not required - internal compatibility fix
  • Release notes: Not required - bug fix improving existing functionality

- Replace NumPy-based make_tensor_with_pad with pure-PyTorch implementation
- Add None-safety to penalty application functions
- Fix max() builtin compilation issues with torch.dynamo
- Enable Qwen2.5-14B-Instruct to run successfully on vllm-gaudi

Fixes compatibility issues where torch.dynamo guard failures occurred
due to dispatch key set mismatches and AttributeErrors when applying
repetition penalties to logits.

Signed-off-by: Paweł Olejniczak <[email protected]>

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify bot added the qwen (Related to Qwen models) and v1 labels on Sep 26, 2025