Replies: 1 comment
-
Good idea. Stay tuned! In sglang, we plan to support chunk-, layer-, and TP-aware eviction strategies.
-
It looks like the current Mooncake store is not aware of layer granularity, chunk order (earlier vs. later chunks), or TP ranks. This can cause problems during cache eviction.
I think the Mooncake store should be aware of layers, chunk ordering, and TP. On eviction, it should evict a token’s KV caches for all layers and for all TP ranks together (i.e., keep KV state complete per token).
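From the chunk-order perspective, eviction policies should prefer removing KV caches for later chunks first rather than early-chunk ones. Losing an early chunk makes prefill expensive, because the chunks after it can only be reused if everything before them is still cached.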
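
To make the proposal more concrete, here is a rough sketch in Python of what I have in mind. This is not Mooncake's actual API: `KVKey`, `GroupedEvictor`, and all field names are hypothetical, and a real policy would presumably combine chunk order with recency (e.g. LRU) rather than use chunk order alone.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KVKey:
    """Hypothetical key for one stored KV object (not a Mooncake type)."""
    seq_id: str      # identifies the sequence/prefix the chunk belongs to
    chunk_idx: int   # 0 is the earliest chunk of the sequence
    layer: int
    tp_rank: int


class GroupedEvictor:
    """Toy index that evicts whole (seq_id, chunk_idx) groups at once."""

    def __init__(self) -> None:
        # (seq_id, chunk_idx) -> every per-layer, per-TP-rank key of that chunk
        self._groups = defaultdict(set)

    def put(self, key: KVKey) -> None:
        self._groups[(key.seq_id, key.chunk_idx)].add(key)

    def evict_one_group(self) -> List[KVKey]:
        """Evict one chunk, covering all of its layers and TP ranks together."""
        if not self._groups:
            return []
        # Prefer the highest chunk index so that early chunks survive longest.
        victim = max(self._groups, key=lambda group: group[1])
        keys = self._groups.pop(victim)
        return sorted(keys, key=lambda k: (k.layer, k.tp_rank))
```

With 3 chunks, 2 layers, and 2 TP ranks stored for one sequence, the first call to `evict_one_group()` returns the four keys of chunk 2 and nothing from chunks 0 or 1, so no chunk is ever left with only some of its layers or ranks.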