
Conversation

@tjtanaa (Collaborator) commented Oct 2, 2025

Purpose

A code refactor caused the ViT flash attention dispatcher logic to enter the wrong code path on the ROCm platform, importing `flash_attn_varlen_func` from `vllm.vllm_flash_attn`.

This PR also fixes the incorrect usage of `aiter.flash_attn_varlen_func` in the `MultiHeadAttention` class introduced in #23978.
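
For reference, the corrected dispatch order looks roughly like the sketch below. This is an illustration only, not the actual dispatcher code in vLLM: the helper name `select_vit_flash_attn_func` is hypothetical, the upstream `flash_attn` fallback is assumed, and the real logic also consults backend enums and environment toggles.

```python
# Illustrative sketch of the corrected ViT flash-attention dispatch order;
# select_vit_flash_attn_func is a hypothetical helper, not the vLLM function.
from vllm.platforms import current_platform


def select_vit_flash_attn_func():
    if current_platform.is_rocm():
        # On ROCm, prefer the aiter kernel when it can be imported and fall
        # back to upstream flash-attn otherwise. vllm.vllm_flash_attn must not
        # be imported on this path, which was the bug this PR fixes.
        try:
            from aiter import flash_attn_varlen_func
        except ImportError:
            from flash_attn import flash_attn_varlen_func
        return flash_attn_varlen_func
    # On CUDA, the bundled vLLM flash-attention build is the intended path.
    from vllm.vllm_flash_attn import flash_attn_varlen_func
    return flash_attn_varlen_func
```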

Test Plan

Evaluate the accuracy of all models that use this ViT flash attention dispatcher logic on the ChartQA dataset.
NOTE: These accuracy numbers by no means indicate the actual model performance on the benchmark, and they were not evaluated with the same procedure used in the official releases.

The `MultiHeadAttention` bugfix is validated with OpenGVLab/InternVL3_5-8B.
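
As a rough illustration of that validation, a minimal multimodal smoke test like the one below exercises the ViT attention path. This is not the exact command used for the ChartQA runs; the prompt template and the `VLLM_ROCM_USE_AITER` toggle in the comments are assumptions.

```python
# Minimal multimodal smoke test (not the full ChartQA evaluation): feed one
# synthetic image through the model so the ViT attention path is exercised.
# Assumed backend toggle: VLLM_ROCM_USE_AITER=0/1 python smoke_vit_attn.py
from PIL import Image

from vllm import LLM, SamplingParams

llm = LLM(model="OpenGVLab/InternVL3_5-8B", trust_remote_code=True)

# Placeholder image; the real runs use ChartQA charts.
image = Image.new("RGB", (448, 448), color="white")
# Prompt template is an assumption; use the model's chat template in practice.
prompt = "<image>\nDescribe this image in one sentence."

outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```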

Test Result

Flash Attention Backend Comparison: AIter vs Non-AIter

| Model | Backend | Explicit Prompt Relaxed Correctness | Anywhere in Answer Relaxed Correctness |
|---|---|---|---|
| Qwen/Qwen2.5-VL-72B-Instruct | AIter | 0.8672 | 0.8860 |
| Qwen/Qwen2.5-VL-72B-Instruct | No AIter | 0.8624 | 0.8848 |
| Qwen/Qwen3-VL-235B-A22B-Instruct | No AIter | 0.8648 | 0.8656 |
| Qwen/Qwen3-VL-235B-A22B-Instruct | AIter | 0.8656 | 0.8680 |
| zai-org/GLM-4.5V-FP8 | No AIter | 0.5088 | 0.5716 |
| zai-org/GLM-4.5V-FP8 | AIter | 0.4952 | 0.5580 |
| baidu/ERNIE-4.5-VL-28B-A3B-PT | No AIter | 0.8424 | 0.8828 |
| baidu/ERNIE-4.5-VL-28B-A3B-PT | AIter | 0.8444 | 0.8768 |
| AIDC-AI/Ovis2.5-9B | No AIter | 0.8656 | 0.8764 |
| AIDC-AI/Ovis2.5-9B | AIter | 0.8652 | 0.8784 |
| OpenGVLab/InternVL3_5-8B | No AIter | 0.892 | 0.892 |
| OpenGVLab/InternVL3_5-8B | AIter | 0.8964 | 0.8964 |

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added qwen Related to Qwen models rocm Related to AMD ROCm labels Oct 2, 2025
@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request refactors the Vision Transformer (ViT) flash attention dispatcher logic to centralize it and fix a bug on the ROCm platform. The changes are consistent across multiple model files, replacing duplicated logic with a call to a new utility function maybe_get_vit_flash_attn_backend. This is a good improvement for maintainability. However, I've found a critical issue in the implementation of check_upstream_fa_availability which could lead to runtime errors.
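
For context, an availability probe of this kind usually has to guard against the package being absent. The sketch below is illustrative only and is not the `check_upstream_fa_availability` implementation under review; the dtype gate shown is an assumption.

```python
# Illustrative sketch only -- not the check_upstream_fa_availability code under
# review. The point is that the probe must not raise when flash_attn is absent.
import importlib.util

import torch


def check_upstream_fa_availability(dtype: torch.dtype) -> bool:
    # The dtype gate is an assumption; upstream flash-attn supports fp16/bf16.
    if dtype not in (torch.float16, torch.bfloat16):
        return False
    # find_spec avoids importing the package just to test for its presence.
    return importlib.util.find_spec("flash_attn") is not None
```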

Signed-off-by: tjtanaa <[email protected]>
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 2, 2025 16:32
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 2, 2025
@DarkLight1337
Copy link
Member

cc @Isotr0py

auto-merge was automatically disabled October 2, 2025 17:17

Head branch was pushed to by a user without write access

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 2, 2025 17:23
Signed-off-by: tjtanaa <[email protected]>
auto-merge was automatically disabled October 3, 2025 04:00

Head branch was pushed to by a user without write access

@vllm-bot vllm-bot merged commit 9c5ee91 into vllm-project:main Oct 3, 2025
50 of 55 checks passed
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
tomeras91 pushed a commit to tomeras91/vllm that referenced this pull request Oct 6, 2025
karan pushed a commit to karan/vllm that referenced this pull request Oct 6, 2025
southfreebird pushed a commit to southfreebird/vllm that referenced this pull request Oct 7, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
bradleyhd added a commit to bradleyhd/vllm that referenced this pull request Oct 17, 2025
Summary:
In vllm-project#26104, changes to layer.py resulted in always trying to switch to the FA backend for ViT, even when `VLLM_ATTENTION_BACKEND` is set.

This broke Meta's internal AMD pipelines, as it is neither desired nor expected behavior. With this change, the models touched by the offending PR can explicitly opt in to this behavior.

Differential Revision: D84946967
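
For context, the opt-in behavior described in this commit amounts to letting an explicit `VLLM_ATTENTION_BACKEND` setting win over the automatic ViT flash-attn switch. The sketch below is a hypothetical illustration, not the code in the referenced commit; the function name and the `TORCH_SDPA` fallback are assumptions.

```python
# Hypothetical illustration of the opt-in gating described above; neither the
# function name nor the TORCH_SDPA fallback comes from the referenced commit.
import os


def resolve_vit_attn_backend(model_opts_in: bool, auto_choice: str) -> str:
    # An explicitly configured backend always wins.
    explicit = os.environ.get("VLLM_ATTENTION_BACKEND")
    if explicit:
        return explicit
    # Only models that opt in get the automatic switch to flash-attn.
    return auto_choice if model_opts_in else "TORCH_SDPA"


# Example: resolve_vit_attn_backend(model_opts_in=True, auto_choice="FLASH_ATTN")
```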
bradleyhd added a commit to bradleyhd/vllm that referenced this pull request Oct 17, 2025
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
bradleyhd added a commit to bradleyhd/vllm that referenced this pull request Oct 22, 2025
Summary:
Pull Request resolved: vllm-project#27124

In vllm-project#26104, changes to layer.py resulted in always trying to switch to the FA backend for ViT, even when `VLLM_ATTENTION_BACKEND` is set.

This broke Meta's internal AMD pipelines, as it is neither desired nor expected behavior. With this change, the models touched by the offending PR can explicitly opt in to this behavior.

Reviewed By: Prowindy

Differential Revision: D84946967
bradleyhd added a commit to bradleyhd/vllm that referenced this pull request Oct 22, 2025
bradleyhd added a commit to bradleyhd/vllm that referenced this pull request Oct 22, 2025
bradleyhd added a commit to bradleyhd/vllm that referenced this pull request Oct 22, 2025
bradleyhd added a commit to bradleyhd/vllm that referenced this pull request Oct 22, 2025
bradleyhd added a commit to bradleyhd/vllm that referenced this pull request Oct 23, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025
