Closed
Labels: feature request
Description
🚀 The feature, motivation and pitch
#9880 adds support for sample and prompt logprobs. However, prompt logprobs currently require the server to be started with `--no-enable-prefix-caching`; otherwise, a request with `prompt_logprobs=true` fails with the message "Prefix caching with prompt logprobs not yet supported on VLLM V1."
The challenge in using prompt logprobs alongside automatic prefix caching (APC) is recovering the top-k prompt logprobs after an APC cache hit. The existing APC implementation does not cache prompt logprobs: on a cache hit, the cached blocks are treated as "computed", so no prompt logprobs are available for those blocks.
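To make the gap concrete, here is a toy sketch (not vLLM code) of a block-level prefix cache. `ToyPrefixCache`, `BLOCK_SIZE`, and `computed_positions` are hypothetical names invented for this illustration, and the model is simplified to "only positions that are actually run through prefill can yield logprobs".

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # hypothetical block size, chosen only for this example


@dataclass
class ToyPrefixCache:
    """Toy stand-in for APC: records which full prompt-prefix blocks have cached KVs."""

    cached_prefixes: set = field(default_factory=set)

    def num_cached_tokens(self, prompt: list) -> int:
        """Return the length of the longest cached prefix, in whole blocks."""
        n = 0
        while (n + BLOCK_SIZE <= len(prompt)
               and tuple(prompt[: n + BLOCK_SIZE]) in self.cached_prefixes):
            n += BLOCK_SIZE
        return n

    def insert(self, prompt: list) -> None:
        """Mark every full block of this prompt as cached."""
        for end in range(BLOCK_SIZE, len(prompt) + 1, BLOCK_SIZE):
            self.cached_prefixes.add(tuple(prompt[:end]))


def computed_positions(cache: ToyPrefixCache, prompt: list) -> list:
    """Positions that prefill actually runs, i.e. the only ones that can yield logprobs."""
    start = cache.num_cached_tokens(prompt)  # cached blocks are skipped as "computed"
    cache.insert(prompt)
    return list(range(start, len(prompt)))


cache = ToyPrefixCache()
prompt = list(range(10))
first = computed_positions(cache, prompt)   # cold cache: every position is computed
second = computed_positions(cache, prompt)  # warm cache: the two cached blocks are skipped
```

In this sketch the second request only runs the tail of the prompt, so logprobs for the cached blocks would simply be missing unless one of the approaches below is used.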
Alternatives
A few possible solutions:
- Use APC cached KVs to recompute prompt logprobs if a request with `prompt_logprobs=true` triggers an APC cache hit. This requires the model code and `model_executor` code to support re-running prefill using cached KVs.
- Cache prompt logprobs in the APC. The problem with this solution is that a request which triggers an APC cache hit may require more top-k prompt logprobs than the request which filled the cache, in which case recomputation would be necessary anyway.
- Bypass APC for requests with `prompt_logprobs=true`, so that these requests cannot exploit the APC cache. This is the simplest solution but incurs a performance penalty.
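The trade-offs between the three options can be summarized in a small decision sketch. This is only an illustration of the reasoning above, not a proposed implementation: `Policy`, `Plan`, and `plan_prompt_logprobs` are hypothetical names, and the cost is reduced to "how many prompt tokens need their logprobs computed".

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    RECOMPUTE_FROM_KV = "recompute"  # option 1: re-run prefill over cached KVs
    CACHE_LOGPROBS = "cache"         # option 2: store prompt logprobs in the APC
    BYPASS_APC = "bypass"            # option 3: ignore the cache for these requests


@dataclass(frozen=True)
class Plan:
    use_cached_kv: bool          # whether cached KV blocks are reused
    logprob_tokens_to_compute: int  # prompt tokens whose logprobs must be computed


def plan_prompt_logprobs(policy: Policy, num_prompt_tokens: int,
                         num_cached_tokens: int, requested_k: int,
                         cached_k: int = 0) -> Plan:
    """Decide what a request with prompt logprobs has to compute (toy model)."""
    if policy is Policy.BYPASS_APC:
        # No cache reuse at all: the whole prompt is prefilled from scratch.
        return Plan(use_cached_kv=False, logprob_tokens_to_compute=num_prompt_tokens)
    if policy is Policy.RECOMPUTE_FROM_KV:
        # KVs are reused, but logprobs are recomputed for every prompt token.
        return Plan(use_cached_kv=True, logprob_tokens_to_compute=num_prompt_tokens)
    # CACHE_LOGPROBS: cached logprobs only help if they cover the requested top-k.
    if cached_k >= requested_k:
        remaining = num_prompt_tokens - num_cached_tokens
        return Plan(use_cached_kv=True, logprob_tokens_to_compute=remaining)
    # The cache was filled with a smaller k, so recomputation is needed anyway.
    return Plan(use_cached_kv=True, logprob_tokens_to_compute=num_prompt_tokens)


bypass = plan_prompt_logprobs(Policy.BYPASS_APC, 10, 8, requested_k=5)
cached_ok = plan_prompt_logprobs(Policy.CACHE_LOGPROBS, 10, 8, requested_k=5, cached_k=5)
cached_small = plan_prompt_logprobs(Policy.CACHE_LOGPROBS, 10, 8, requested_k=5, cached_k=3)
```

Under these assumptions, caching logprobs only pays off when the cached top-k is at least as large as the requested one; otherwise it degenerates to the recompute path.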
Additional context
No response