Reading https://arxiv.org/abs/2311.04934 and wondering if I would gain anything from a prompt cache.
My use case is prompts with overlapping prefixes (mostly a few big ones), and I already use vLLM's paged attention.
Assume I would only want to cache KV states for prefixes (not for segments positioned anywhere in the prompt, like in the paper).
Would there be any gain in caching prefix attention states, or is paged attention in vLLM indeed already doing this?
From the paper:

> Paged attention also demonstrates simple prefix sharing, where different prompts with an identical prefix share KV cache.
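For context on what that sharing looks like mechanically, here is a minimal toy sketch of refcounted block tables in the spirit of paged attention (the names `ref_count`, `block_tables`, `map_prompt` are made up for illustration, not vLLM internals):

```python
from typing import Dict, List

# Physical KV-cache blocks can be referenced by several requests; track refcounts.
ref_count: Dict[int, int] = {}
# Per-request logical -> physical block table.
block_tables: Dict[str, List[int]] = {}

def map_prompt(request_id: str, shared_prefix_blocks: List[int], own_blocks: List[int]) -> None:
    """Point a request's block table at the shared prefix blocks plus its own blocks."""
    block_tables[request_id] = shared_prefix_blocks + own_blocks
    for b in block_tables[request_id]:
        ref_count[b] = ref_count.get(b, 0) + 1

def free_request(request_id: str) -> None:
    """On completion, drop references; a block is only reclaimable once its refcount hits 0."""
    for b in block_tables.pop(request_id):
        ref_count[b] -= 1

# Two prompts with an identical prefix point at the same physical blocks 0-3.
map_prompt("request 1", shared_prefix_blocks=[0, 1, 2, 3], own_blocks=[4, 5])
map_prompt("request 2", shared_prefix_blocks=[0, 1, 2, 3], own_blocks=[6])
assert ref_count[0] == 2  # the prefix blocks are referenced by both requests
```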
Goal:
```
                                    shared inputs with prompt 1
                                                 |
                                                 |
+---------------------------------+       +-----+------+--------------------+
|                                 |  ...  | ////|///// |                    |
+---------------------------------+       +------------+--------------------+
             prompt 1                                 prompt 2
             request 1                                request 2
```
- store prefix -> kvs
- on request:
  - find shared inputs
  - assert_kv_cache(prefix-kvs) (see the sketch below)
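A toy Python sketch of those steps, with made-up names (`prefix_cache`, `run_prefill`, the block size, etc. are illustrative, not vLLM APIs):

```python
from typing import Dict, List, Tuple

BLOCK = 16                    # toy KV-cache block size in tokens
PrefixKey = Tuple[int, ...]   # token ids of a cached prefix
KVBlocks = List[int]          # ids of physical KV-cache blocks (paged-attention style)

prefix_cache: Dict[PrefixKey, KVBlocks] = {}
_next_block = 0

def run_prefill(suffix: List[int], past_kv_blocks: KVBlocks) -> KVBlocks:
    """Stand-in for real prefill: pretend to allocate one block per BLOCK suffix tokens."""
    global _next_block
    n = (len(suffix) + BLOCK - 1) // BLOCK
    blocks = list(range(_next_block, _next_block + n))
    _next_block += n
    return blocks

def handle_request(token_ids: List[int]) -> KVBlocks:
    # 1. find the longest cached prefix of this prompt (the "shared inputs")
    best_key: PrefixKey = ()
    for key in prefix_cache:
        if len(key) > len(best_key) and tuple(token_ids[: len(key)]) == key:
            best_key = key
    cached = prefix_cache.get(best_key, [])

    # 2. run prefill attention only over the un-cached suffix tokens
    suffix = token_ids[len(best_key):]
    all_blocks = cached + run_prefill(suffix, past_kv_blocks=cached)

    # 3. register full-block prefixes of this prompt for later requests
    for i in range(1, len(token_ids) // BLOCK + 1):
        prefix_cache[tuple(token_ids[: i * BLOCK])] = all_blocks[:i]
    return all_blocks

shared = [1, 2, 3, 4] * 8                # 32-token shared prefix
handle_request(shared + [99])            # request 1: full prefill
handle_request(shared + [77, 78])        # request 2: only the 2-token suffix is prefilled
```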
Any gain from this idea?
So does paged attention already skip recomputing the KVs for the shared inputs across requests, or is there anything to be gained from additionally caching prefix KVs?
If it already caches across requests, what is the mechanism that keeps KV-cache entries from being evicted?
Wondering if there are still potential tweaks to make sure certain prefixes stay in the KV cache.
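For the last point, the kind of tweak I have in mind is pinning: an eviction policy that never drops the blocks of a few selected prefixes. Purely illustrative sketch, not vLLM's actual eviction logic:

```python
from collections import OrderedDict
from typing import List, Optional, Set, Tuple

PrefixKey = Tuple[int, ...]

class PrefixLRU:
    """Toy prefix -> KV-block cache with LRU eviction that never drops pinned prefixes."""

    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.entries: OrderedDict = OrderedDict()   # prefix -> list of block ids
        self.pinned: Set[PrefixKey] = set()

    def put(self, prefix: PrefixKey, blocks: List[int], pin: bool = False) -> None:
        self.entries[prefix] = blocks
        self.entries.move_to_end(prefix)
        if pin:
            self.pinned.add(prefix)                 # e.g. the few big shared prefixes
        self._evict()

    def get(self, prefix: PrefixKey) -> Optional[List[int]]:
        if prefix in self.entries:
            self.entries.move_to_end(prefix)        # refresh recency on a hit
            return self.entries[prefix]
        return None

    def _used_blocks(self) -> int:
        return sum(len(b) for b in self.entries.values())

    def _evict(self) -> None:
        # Evict least-recently-used entries, skipping pinned prefixes.
        while self._used_blocks() > self.capacity:
            victim = next((k for k in self.entries if k not in self.pinned), None)
            if victim is None:
                break                               # only pinned prefixes remain
            del self.entries[victim]
```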