
Commit a5f767c
Bugfix in the Triton flash attention call in the MLA backend: the function takes no keyword arguments, so the arguments are now passed positionally.
Signed-off-by: vllmellm <[email protected]>
1 parent fe742ae commit a5f767c
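
The commit message points at a Python calling-convention issue: the Triton flash attention entry point rejects keyword arguments, so splatting the backend's generic **kwargs dict into it fails at call time. Below is a minimal, self-contained sketch of that failure mode and of the positional fix; the stand-in function and its parameter names are illustrative assumptions, not vLLM's actual Triton API.

# Stand-in for a callable that cannot be invoked with this kwargs dict;
# here the rejection is simulated by parameter names that differ from the
# dict's keys ("max_seqlens_q" vs "max_seqlen_q"). Illustrative only.
def triton_attention_stub(q, k, v, o, cu_seqlens_q, cu_seqlens_k,
                          max_seqlens_q, max_seqlens_k, causal, sm_scale, bias):
    return "ok"  # kernel launch elided

q = k = v = object()  # placeholder tensors
softmax_scale = 0.1
kwargs = {"cu_seqlens_q": [0], "cu_seqlens_k": [0],
          "max_seqlen_q": 1, "max_seqlen_k": 1, "causal": True}

# Before the fix: the dict is splatted into the call, so its key names
# must match the callee's parameter names. They do not, so this raises:
#   TypeError: ... got an unexpected keyword argument 'max_seqlen_q'
try:
    triton_attention_stub(q, k, v, None, **kwargs,
                          sm_scale=softmax_scale, bias=None)
except TypeError as exc:
    print(exc)

# After the fix: each value is read out of the dict and passed by
# position, so the callee's parameter names no longer matter.
triton_attention_stub(q, k, v, None,
                      kwargs["cu_seqlens_q"], kwargs["cu_seqlens_k"],
                      kwargs["max_seqlen_q"], kwargs["max_seqlen_k"],
                      kwargs["causal"], softmax_scale, None)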

File tree: 1 file changed (+8, -1 lines)


vllm/attention/backends/mla/common.py

Lines changed: 8 additions & 1 deletion
@@ -1076,7 +1076,14 @@ def _flash_attn_varlen_diff_headdims(self, q, k, v, softmax_scale,
             q,
             k,
             maybe_padded_v,
-            **kwargs,
+            None,  # output
+            kwargs["cu_seqlens_q"],
+            kwargs["cu_seqlens_k"],
+            kwargs["max_seqlen_q"],
+            kwargs["max_seqlen_k"],
+            kwargs["causal"],
+            softmax_scale,
+            None,  # bias
         )
         if is_vllm_fa:
             attn_out = self.flash_attn_varlen_func(
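
Design note: rather than changing the shared _flash_attn_varlen_diff_headdims signature, the fix keeps the generic kwargs dict at the call site and unpacks each entry into the Triton function's positional slots, with None supplied for the output and bias slots (per the inline comments in the diff); the separate vLLM flash-attention branch guarded by is_vllm_fa is left untouched.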
