vllm/vllm/attention/backends/flash_attn.py, line 282 at commit 99caa49:
I noticed that in the flash-attn backend, forward_prefix and forward_decode seem to be executed serially. Does forward_decode wait for forward_prefix to finish before running? If so, can this still take advantage of the performance benefit that chunked prefill is supposed to provide? I mean the case where the prefill tokens are in the same batch as the decode tokens:
```python
if prefill_meta := attn_metadata.prefill_metadata:
    output[:num_prefill_tokens] = PagedAttention.forward_prefix(...)
if decode_meta := attn_metadata.decode_metadata:
    output[num_prefill_tokens:] = PagedAttention.forward_decode(...)
```
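For clarity, here is a minimal, self-contained sketch of what I understand the chunked-prefill batching to look like. This is not vLLM's actual code; `attention_forward`, `fake_prefill_kernel`, and `fake_decode_kernel` are hypothetical placeholders. The point is that prefill and decode tokens share one flat token batch, and only the attention layer splits them into two kernel launches (which run one after the other on the same stream), while every other layer processes the fused batch.

```python
# Hypothetical sketch, NOT vLLM's implementation.
import torch


def fake_prefill_kernel(q: torch.Tensor) -> torch.Tensor:
    # Placeholder standing in for PagedAttention.forward_prefix(...)
    return q * 2.0


def fake_decode_kernel(q: torch.Tensor) -> torch.Tensor:
    # Placeholder standing in for PagedAttention.forward_decode(...)
    return q * 3.0


def attention_forward(query: torch.Tensor, num_prefill_tokens: int) -> torch.Tensor:
    """Split the flat token batch: the first num_prefill_tokens rows go through
    the prefix-attention path, the remaining rows through the decode path.
    Both calls are issued sequentially, mirroring the two `if` branches above."""
    output = torch.empty_like(query)
    if num_prefill_tokens > 0:
        output[:num_prefill_tokens] = fake_prefill_kernel(query[:num_prefill_tokens])
    if num_prefill_tokens < query.shape[0]:
        output[num_prefill_tokens:] = fake_decode_kernel(query[num_prefill_tokens:])
    return output


# One chunked-prefill step: 5 prefill tokens from one request plus 3 decode
# tokens from three other requests, packed into a single flat batch.
num_prefill_tokens, num_decode_tokens, head_dim = 5, 3, 8
query = torch.randn(num_prefill_tokens + num_decode_tokens, head_dim)
out = attention_forward(query, num_prefill_tokens)
print(out.shape)  # torch.Size([8, 8]): one output row per token, prefill + decode
```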