Your current environment
The output of python collect_env.py
Your output of `python collect_env.py` here
 
🐛 Describe the bug
We hit an exception when running llama4 models with the latest code on ROCm with the V1 engine:
(VllmWorker rank=2 pid=267) ERROR 06-19 01:00:39 [multiproc_executor.py:488] TypeError: AiterFlashAttentionImpl.__init__() got multiple values for argument 'use_irope'
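For context, Python raises this kind of TypeError when a parameter is bound twice: once positionally and once by keyword. The sketch below reproduces the same message with a toy class. `ToyAttentionImpl` and its parameters (other than the name `use_irope`) are hypothetical and are not vLLM's actual signature. The real call site that passes `use_irope` twice has not been identified here.

```python
class ToyAttentionImpl:
    """Hypothetical stand-in for an attention backend; not vLLM code."""

    def __init__(self, num_heads: int, scale: float, use_irope: bool = False) -> None:
        self.num_heads = num_heads
        self.scale = scale
        self.use_irope = use_irope


def reproduce() -> str:
    """Trigger the double-binding error and return its message."""
    try:
        # The third positional argument already binds to `use_irope`,
        # so supplying it again by keyword raises TypeError.
        ToyAttentionImpl(8, 0.125, True, use_irope=True)
    except TypeError as exc:
        return str(exc)
    return ""


message = reproduce()
print(message)
```

A mismatch like this typically appears when a caller and an `__init__` signature drift apart, for example when a new positional parameter is added to one backend but the caller still passes `use_irope` by keyword.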
Current work-around:
Turn off AITER_MHA by setting VLLM_ROCM_USE_AITER_MHA=0.
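When vLLM is launched from a Python script rather than a shell, the same workaround can be applied by setting the variable in the process environment. This is a minimal sketch, assuming the variable must be set before vLLM is imported (not verified against the current code). The helper name `disable_aiter_mha` is made up for illustration.

```python
import os


def disable_aiter_mha() -> None:
    """Set the workaround flag from this issue in the current process."""
    os.environ["VLLM_ROCM_USE_AITER_MHA"] = "0"


disable_aiter_mha()

# Import vllm and construct the engine only after this point, so that the
# setting is already visible when the attention backend is selected.
print(os.environ["VLLM_ROCM_USE_AITER_MHA"])
```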
Proposal:
Add an end-to-end test for a small version of the llama4 models. The motivation is that llama4 models have been broken in the past because no such test existed.