[Tracking Issue]: Prefix Caching for Hybrid Models

### 🚀 The feature, motivation and pitch

This issue tracks follow-up work for the [Mamba2 Automatic Prefix Caching](https://github.com/vllm-project/vllm/pull/25752) PR. Below is a non-exhaustive list of the work items identified so far:

- [x] Address comments left over from review (#26222)
- [ ] Implement policy for freeing mamba blocks to fix performance in throughput benchmarks
- [ ] Relax constraint that mamba block size must be multiple of chunk size
- [x] Give user flexibility to set mamba caching granularity (https://github.com/vllm-project/vllm/pull/27289)
- [ ] Support mamba prefix caching together with speculative decoding
- [ ] Fuse logic for SSM state writing into kernels (https://github.com/vllm-project/vllm/pull/26235)
- [ ] Test TP>1 behaviour
- [ ] Cache metadata builds across KV cache groups (https://github.com/vllm-project/vllm/pull/22788)
- [ ] Additional cleanup in causal_conv1d kernels (e.g., strip out unused logic)
- [x] Enable prefix caching for Mamba1 (https://github.com/vllm-project/vllm/pull/26377)
- [ ] Enable prefix caching for ShortConv
- [ ] Enable prefix caching for LinearAttention
- [ ] Enable prefix caching for GDN (https://github.com/vllm-project/vllm/pull/26807)
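
For anyone picking up the block-size item above, here is a minimal sketch of what the current constraint means in practice. The function names are hypothetical and are not part of vLLM; the second helper also assumes that state is only cached at block boundaries, which is an illustration and may not match the actual implementation.

```python
def check_block_size(block_size: int, chunk_size: int) -> None:
    """Raise ValueError unless block_size is a positive multiple of chunk_size.

    Hypothetical helper: mirrors the constraint named in the list above,
    not an actual vLLM function.
    """
    if block_size <= 0 or chunk_size <= 0:
        raise ValueError("block_size and chunk_size must be positive")
    if block_size % chunk_size != 0:
        raise ValueError(
            f"mamba block size {block_size} is not a multiple of "
            f"chunk size {chunk_size}"
        )


def cacheable_prefix_len(num_tokens: int, block_size: int) -> int:
    """Return the longest block-aligned prefix of num_tokens tokens.

    Assumption: state is saved only at block boundaries, so any tokens
    past the last full block cannot be reused from the cache.
    """
    return (num_tokens // block_size) * block_size


if __name__ == "__main__":
    check_block_size(block_size=512, chunk_size=256)  # accepted
    print(cacheable_prefix_len(num_tokens=1000, block_size=512))
```

Relaxing the constraint would mean accepting block sizes that fail the first check, which is why a smaller block size (finer caching granularity) is currently tied to the chunk size.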

cc @tdoublep @s3woz 

### Alternatives

_No response_

### Additional context

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
