
Conversation

@pawel-olejniczak commented Sep 26, 2025

Purpose

Fix critical torch.dynamo compatibility issues that prevent Qwen2.5 models from running on vllm-gaudi.

This PR addresses three interconnected issues identified during investigation:

  1. Dispatch key set mismatch: make_tensor_with_pad used torch.from_numpy(), creating tensors incompatible with torch.dynamo compilation
  2. Builtin function compilation failure: torch.dynamo couldn't compile max(..., default=0) syntax
  3. AttributeError on None inputs: Penalty application logic wasn't robust to None inputs during warmup scenarios
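Issues 1 and 2 can be illustrated with a minimal sketch of a pure-PyTorch padding helper. This is a simplified signature for illustration, not the actual vLLM diff: building the tensor with `torch.full`/`torch.tensor` instead of `torch.from_numpy` keeps the dispatch key set consistent with dynamo's guards, and a conditional expression stands in for the uncompilable `max(..., default=0)`.

```python
from typing import Optional

import torch


def make_tensor_with_pad(
    data: list[list[int]],
    pad_value: int,
    dtype: torch.dtype,
    device: Optional[torch.device] = None,
) -> torch.Tensor:
    """Pad variable-length rows into a rectangular tensor (sketch).

    Built directly with torch ops: torch.from_numpy produces tensors
    whose dispatch key set fails torch.dynamo guard checks.
    """
    # Conditional expression instead of max(..., default=0), which
    # torch.dynamo's BuiltinVariable cannot compile.
    max_len = max(len(row) for row in data) if data else 0
    out = torch.full((len(data), max_len), pad_value,
                     dtype=dtype, device=device)
    for i, row in enumerate(data):
        if row:
            out[i, : len(row)] = torch.tensor(row, dtype=dtype, device=device)
    return out
```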

Error Messages Fixed:

  • AssertionError: Guard check failed! tensor_from_numpy(stack[0]): dispatch key set mismatch
  • incorrect arg count <bound method BuiltinVariable._call_min_max of BuiltinVariable(max)> got an unexpected keyword argument 'default'
  • AttributeError: 'NoneType' object has no attribute 'device'

Test Plan

Local Testing Environment

  • Hardware: Intel Gaudi3 accelerator via vllm-gaudi
  • Model: Qwen/Qwen2.5-14B-Instruct
  • Command:
    vllm serve Qwen/Qwen2.5-14B-Instruct --max-model-len 4096 --max-num-seqs 32
  • Test Request:
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-14B-Instruct",
    "prompt": "Give me a short introduction to large language model.",
    "max_tokens": 100
  }'

Test Result

Before Fix

AssertionError: Guard check failed!
tensor_from_numpy(stack[0]): dispatch key set mismatch.
expected=DispatchKeySet(CPU, BackendSelect, ADInplaceOrView),
actual=DispatchKeySet(CPU, BackendSelect)

Server crashed on first inference request.

After Fix

{
"id":"cmpl-421571a6c96049ad91448cfa679c34bc",
"object":"text_completion",
"created":1758727612,
"model":"Qwen/Qwen2.5-14B-Instruct",
"choices":[{
"index":0,
"text":" \"A large language model (LLM) is an advanced...",
"logprobs":null,
"finish_reason":"length"
}],
"usage":{"prompt_tokens":16,"total_tokens":116,"completion_tokens":100}
}

Changes Summary

  1. Refactored make_tensor_with_pad (vllm/utils/__init__.py): Pure-PyTorch implementation eliminates NumPy interop issues
  2. Enhanced penalty logic robustness (vllm/model_executor/layers/utils.py): Added None-safety to prevent AttributeErrors
  3. Fixed compiler compatibility: Replaced max(..., default=0) with conditional expression
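The None-safety change (item 2) can be sketched as an early-return guard. The function name and signature below are hypothetical, simplified from the penalty logic in vllm/model_executor/layers/utils.py for illustration:

```python
import torch


def apply_penalty_safely(
    logits: torch.Tensor,
    token_mask: "torch.Tensor | None",
    penalty: float,
) -> torch.Tensor:
    """Apply a repetition-style penalty only when the mask exists.

    During warmup the token mask may still be None; guarding here
    avoids AttributeError: 'NoneType' object has no attribute 'device'.
    """
    if token_mask is None:
        return logits
    # Standard repetition-penalty shape: divide positive logits,
    # multiply negative ones, only at masked positions.
    penalized = torch.where(logits > 0, logits / penalty, logits * penalty)
    return torch.where(token_mask, penalized, logits)
```

Returning the logits unchanged when the mask is absent matches warmup semantics, where no real tokens have been generated yet.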

Essential Elements Checklist:

  • Purpose: Fix torch.dynamo compatibility for Qwen models on vllm-gaudi
  • Test plan: Local testing with Qwen/Qwen2.5-14B-Instruct on vllm-gaudi
  • Test results: Before/after comparison showing successful inference
  • Documentation: Not required - internal compatibility fix
  • Release notes: Not required - bug fix improving existing functionality

- Replace NumPy-based make_tensor_with_pad with pure-PyTorch implementation
- Add None-safety to penalty application functions
- Fix max() builtin compilation issues with torch.dynamo
- Enable Qwen2.5-14B-Instruct to run successfully on vllm-gaudi

Fixes compatibility issues where torch.dynamo guard failures occurred
due to dispatch key set mismatches and AttributeErrors when applying
repetition penalties to logits.

Signed-off-by: Paweł Olejniczak <[email protected]>

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify bot added the qwen (Related to Qwen models) and v1 labels on Sep 26, 2025