[ROCm] [VL] [Bugfix] Fix vit flash attn dispatcher logic for ROCm #26104
Conversation
Signed-off-by: tjtanaa <[email protected]>
Code Review
This pull request refactors the Vision Transformer (ViT) flash attention dispatcher logic to centralize it and fix a bug on the ROCm platform. The changes are consistent across multiple model files, replacing duplicated logic with a call to a new utility function maybe_get_vit_flash_attn_backend. This is a good improvement for maintainability. However, I've found a critical issue in the implementation of check_upstream_fa_availability which could lead to runtime errors.
Signed-off-by: tjtanaa <[email protected]>
Signed-off-by: tjtanaa <[email protected]>
cc @Isotr0py
Signed-off-by: tjtanaa <[email protected]>
Head branch was pushed to by a user without write access
Signed-off-by: tjtanaa <[email protected]>
Head branch was pushed to by a user without write access
Summary: Pull Request resolved: vllm-project#27124. In vllm-project#26104, changes to layer.py resulted in always trying to switch to the FA backend for ViT, even when `VLLM_ATTENTION_BACKEND` is set. This broke Meta's internal AMD pipelines, as it is neither desired nor expected behavior. With this change, the models that were changed in the offending PR can explicitly opt in to this behavior. Reviewed By: Prowindy. Differential Revision: D84946967
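For illustration only, one shape the described opt-in could take is an explicit constructor flag that defaults to respecting `VLLM_ATTENTION_BACKEND`; the class and flag name below are hypothetical, not the actual #27124 diff.

```python
import os


class VisionAttention:
    """Hypothetical sketch of the opt-in described above (not the real diff).

    Models touched by the original PR would pass try_vit_flash_attn=True;
    every other model keeps whatever VLLM_ATTENTION_BACKEND selects.
    """

    def __init__(self, try_vit_flash_attn: bool = False) -> None:
        forced = os.getenv("VLLM_ATTENTION_BACKEND")
        if forced and not try_vit_flash_attn:
            # Respect the explicit override instead of silently switching.
            self.backend_name = forced
        else:
            self.backend_name = self._resolve_vit_flash_attn_backend()

    def _resolve_vit_flash_attn_backend(self) -> str:
        # Placeholder for the dispatch sketched earlier in the thread.
        return "FLASH_ATTN"
```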
Purpose
- The refactoring of the code caused the ViT flash attn dispatcher logic to enter the wrong code path and import `from vllm.vllm_flash_attn import flash_attn_varlen_func` on the ROCm platform.
- Fix incorrect usage of `aiter.flash_attn_varlen_func` in the `MultiHeadAttention` class introduced in #23978.

Test Plan
Evaluate the accuracy of all models that use this ViT flash attn dispatcher logic on the ChartQA dataset.
NOTE: The accuracy by no means indicates the actual model performance on the benchmark, and it is not evaluated through the same procedure used in the official release.
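This is not the harness used to produce the comparison below; as a minimal sanity check, one could query a locally served checkpoint (for example the `OpenGVLab/InternVL3_5-8B` model mentioned in this test plan, started with `vllm serve OpenGVLab/InternVL3_5-8B --trust-remote-code`) with a single ChartQA-style question over the OpenAI-compatible API. The image URL below is a placeholder.

```python
# Minimal smoke check against a locally running vLLM server; the image URL
# is a placeholder and this is not the ChartQA evaluation procedure itself.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="OpenGVLab/InternVL3_5-8B",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is the highest value shown in this chart?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
    max_tokens=64,
)
print(response.choices[0].message.content)
```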
The bugfix of the `MultiHeadAttention` class is validated through `OpenGVLab/InternVL3_5-8B`.

Test Result
Flash Attention Backend Comparison: AIter vs Non-AIter
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.