Q: Support for session-aware cache eviction? #2265

@creatorrr

Some inference engines (such as lmdeploy) support passing session_ids at inference time. The id hints to the engine how to manage the cache for that request, which suits persistent sessions where the earlier messages in the context stay fixed for the lifetime of the session.

I think the need for this will grow over time, and it would be very useful to have built-in support for it in vllm. That said, the interface will be tricky and needs a lot of thought.
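
To make the idea concrete, here is a minimal, engine-independent sketch of what session-aware eviction could look like: cache blocks are tracked per session, and the least recently used session is evicted first when space runs out. Every name in it (`SessionCache`, `touch`, `end_session`, `capacity_blocks`) is hypothetical and only for illustration; none of this is an existing vllm or lmdeploy API.

```python
from collections import OrderedDict
from typing import List


class SessionCache:
    """Toy session-keyed cache with LRU eviction (hypothetical, not a real engine API)."""

    def __init__(self, capacity_blocks: int) -> None:
        self.capacity_blocks = capacity_blocks
        # session_id -> number of cache blocks held; insertion order tracks recency.
        self._sessions: "OrderedDict[str, int]" = OrderedDict()

    def used_blocks(self) -> int:
        return sum(self._sessions.values())

    def touch(self, session_id: str, blocks_needed: int) -> List[str]:
        """Record that a session now needs `blocks_needed` blocks in total.

        Returns the ids of the sessions evicted to make room.
        """
        if blocks_needed > self.capacity_blocks:
            raise ValueError("session does not fit in the cache")
        # Remove and re-insert so this session becomes the most recently used.
        self._sessions.pop(session_id, None)
        evicted: List[str] = []
        while self.used_blocks() + blocks_needed > self.capacity_blocks:
            victim, _ = self._sessions.popitem(last=False)  # least recently used
            evicted.append(victim)
        self._sessions[session_id] = blocks_needed
        return evicted

    def end_session(self, session_id: str) -> None:
        """Release a session's blocks explicitly, e.g. when the client closes it."""
        self._sessions.pop(session_id, None)
```

In a sketch like this, a request carrying a session id would call `touch` before running, and the client (or a timeout) would call `end_session` when the conversation is over. A real design would also have to decide how sessions interact with whatever block-level eviction the engine already does, which is part of why the interface is tricky.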

What do you think, @WoosukKwon? If it aligns with the project goals, we could start an RFC for this.
