Support CP with query length larger than 1 #93

minosfuture · 2025-09-17T01:43:34Z

This PR implements the causal mask for interleaved context parallelism (CP) so that query lengths > 1 are supported.

The solution follows the discussion between @LucasWilkinson, @youkaichao, and @youzhedian on Slack.

Key illustration made by @LucasWilkinson:

Normal:

k_toks >   0 1 2 3 4 5
q_toks v  _____________
       2 | 1 1 1
       3 | 1 1 1 1
       4 | 1 1 1 1 1
       5 | 1 1 1 1 1 1


DCP Rank 0:

k_toks >   0 2 4
q_toks v  _______
       2 | 1 1
       3 | 1 1
       4 | 1 1 1
       5 | 1 1 1 


DCP Rank 1:

k_toks >   1 3 5
q_toks v   ______
       2 | 1
       3 | 1 1
       4 | 1 1
       5 | 1 1 1

In the DCP case, the k/v tokens are distributed across ranks in an interleaved fashion; see vllm-project/vllm#23734.
Therefore, in the example above, rank 0 holds KV tokens 0, 2, 4 and rank 1 holds KV tokens 1, 3, 5. The mask shape is no longer a bottom-right triangle.
FA therefore needs to be aware of the CP world size and CP rank in order to determine the causal mask.
The block tiling implementation also needs to be updated. As illustrated below, we now need to process block tile (0,1) in the CP case, whereas it could previously be skipped in the normal case.
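The interleaved mask rule described above can be sketched in a few lines of host-side C++. This is only an illustration under stated assumptions, not the kernel code from this PR: the helper names (`dcp_causal_visible`, `dcp_visible_keys`, `dcp_mask_row_lengths`) are hypothetical, and the sketch assumes that local key index `j` on rank `r` holds global token `j * cp_world_size + r` and that the queries occupy the last `seqlen_q` positions of the full sequence.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of the PR): true when a query token at global
// position q_pos may attend to the key stored at local index k_local on rank
// cp_rank. Assumes interleaved placement: local index j on rank r holds the
// global token j * cp_world_size + r.
inline bool dcp_causal_visible(int q_pos, int k_local, int cp_rank, int cp_world_size) {
    assert(cp_world_size >= 1 && cp_rank >= 0 && cp_rank < cp_world_size);
    int const k_pos = k_local * cp_world_size + cp_rank;  // global key position
    return k_pos <= q_pos;                                // ordinary causal rule
}

// Number of local keys on this rank that q_pos can see, i.e. the number of 1s
// in the corresponding row of the diagrams.
inline int dcp_visible_keys(int q_pos, int cp_rank, int cp_world_size) {
    assert(cp_world_size >= 1 && cp_rank >= 0 && cp_rank < cp_world_size);
    if (q_pos < cp_rank) { return 0; }  // first key on this rank is still in the future
    return (q_pos - cp_rank) / cp_world_size + 1;
}

// Row lengths of the per-rank mask for seqlen_q queries placed at the end of a
// sequence with tot_seqlen_k keys in total (summed over all ranks).
inline std::vector<int> dcp_mask_row_lengths(int seqlen_q, int tot_seqlen_k,
                                             int cp_rank, int cp_world_size) {
    std::vector<int> rows;
    for (int i = 0; i < seqlen_q; ++i) {
        int const q_pos = tot_seqlen_k - seqlen_q + i;  // global query position
        rows.push_back(dcp_visible_keys(q_pos, cp_rank, cp_world_size));
    }
    return rows;
}
```

For the example in the diagrams (6 keys in total, 4 queries, world size 2), `dcp_mask_row_lengths(4, 6, 0, 2)` gives 2, 2, 3, 3 and `dcp_mask_row_lengths(4, 6, 1, 2)` gives 1, 2, 2, 3, matching the DCP Rank 0 and DCP Rank 1 rows. With a world size of 1 the rule reduces to the normal bottom-right triangle.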

Tests

Added and passed unit tests for CP.

Signed-off-by: Ming Yang <[email protected]>

LucasWilkinson · 2025-09-30T15:08:26Z

hopper/flash_api.cpp

-    bool const packgqa_override = params.arch >= 90 && (params.h / params.h_k) == 8 && 
-                                  params.is_local && 
+    bool const packgqa_override = params.arch >= 90 && (params.h / params.h_k) == 8 &&
+                                  params.is_local &&


Do you mind removing the unrelated formatting changes? I'm trying to stay as close to upstream as possible.

LucasWilkinson · 2025-09-30T15:15:43Z

hopper/mainloop_fwd_sm90_tma_gmma_ws.hpp

                : std::max(n_block_min,
-                           cute::ceil_div(m_idx_max + seqlen_k - seqlen_q - params.window_size_left, kBlockN));
+                           cute::ceil_div(m_idx_max +
+                                          params.cp_world_size * seqlen_k -


Can we use cp_tot_seqlen_k to skip the mul here? Should we branch in the non-CP case to save the mul?

We could make cp_tot_seqlen_k == seqlen_k in the params.cp_world_size == 1 case.
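A minimal sketch of that suggestion, assuming a simplified stand-in for the per-batch sequence-length info: `SeqlenSketch` and `make_seqlen_sketch` are hypothetical names, not the actual struct in hopper/seqlen.h, and the fallback rule is only what the comment above proposes. The idea is that when `cp_world_size == 1` the total key length aliases the local one, so downstream code can read `cp_tot_seqlen_k` unconditionally without a multiply or an extra branch.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-batch sequence-length info; field names
// mirror the diff, but this is not the real struct from hopper/seqlen.h.
struct SeqlenSketch {
    int seqlen_k;         // keys held locally on this rank
    int cp_world_size;    // number of context-parallel ranks
    int cp_tot_seqlen_k;  // keys across all ranks
};

inline SeqlenSketch make_seqlen_sketch(int seqlen_k, int cp_world_size,
                                       int const* cp_tot_seqused_k, int bidb) {
    assert(cp_world_size >= 1);
    // Assumption: with CP enabled the caller must supply the per-batch totals.
    assert(cp_world_size == 1 || cp_tot_seqused_k != nullptr);
    // Without CP the total equals the local length, so no multiply is needed.
    int const tot = (cp_world_size == 1) ? seqlen_k : cp_tot_seqused_k[bidb];
    return SeqlenSketch{seqlen_k, cp_world_size, tot};
}
```

With this default, any expression written in terms of `cp_tot_seqlen_k` evaluates to the same value as the original `seqlen_k` expression when CP is disabled.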

LucasWilkinson · 2025-09-30T15:16:58Z

hopper/seqlen.h

+        , cp_world_size(cp_world_size)
+        , cp_tot_seqlen_k(cp_tot_seqused_k == nullptr
+                          ? 0
+                          : cp_tot_seqused_k[bidb])


ref: https://github.com/vllm-project/flash-attention/pull/93/files#r2391986149

LucasWilkinson

Awesome work! Left a few comments, but it's looking really good!

Signed-off-by: Ming Yang <[email protected]>

LucasWilkinson

LGTM

minosfuture force-pushed the dcp_up branch 2 times, most recently from b8792e9 to 54be252 Compare September 17, 2025 01:55

minosfuture mentioned this pull request Sep 17, 2025

[Attention][DCP] Support DCP with query length > 1 (MTP) with FA3 vllm-project/vllm#25049

Merged

5 tasks

minosfuture force-pushed the dcp_up branch from c41471f to bd4bfcf Compare September 17, 2025 07:34

minosfuture added 4 commits September 28, 2025 19:23

Support CP with query len larger than 1

558c5b2

Signed-off-by: Ming Yang <[email protected]>

update tests

5c9b9ae

Signed-off-by: Ming Yang <[email protected]>

update vllm interface

f264b28

Signed-off-by: Ming Yang <[email protected]>

fix n_block_max calculation

e3f796f

Signed-off-by: Ming Yang <[email protected]>

minosfuture force-pushed the dcp_up branch from aff9e72 to e3f796f Compare September 29, 2025 02:24

add cp_tot_seqused_k to calc mask and block boundary

efc45c0

Signed-off-by: Ming Yang <[email protected]>

LucasWilkinson reviewed Sep 30, 2025

minosfuture added 3 commits September 30, 2025 16:24

remove format; simplify tot_seqlen_k handling

ce37406

Signed-off-by: Ming Yang <[email protected]>

update test

1e955be

Signed-off-by: Ming Yang <[email protected]>

cleanup and add comments

2cd9f3f

Signed-off-by: Ming Yang <[email protected]>

minosfuture force-pushed the dcp_up branch from 7430deb to 2cd9f3f Compare October 1, 2025 17:42

Merge branch 'main' into dcp_up

dee8f85

LucasWilkinson approved these changes Oct 5, 2025

LucasWilkinson merged commit 8f468e7 into vllm-project:main Oct 5, 2025
1 check passed

LucasWilkinson mentioned this pull request Oct 17, 2025

Fix local attention #102

Merged
