[BugFix] Fix shared storage connector load kv only load attention layer #21428
Conversation
Signed-off-by: David Chen <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Code Review
This pull request fixes a bug in the shared storage connector where it would attempt to load the KV cache for non-attention layers, causing an `AttributeError`. The fix correctly checks for the presence of the `kv_cache` attribute before proceeding. My review includes a suggestion to make this check even more robust by verifying that the `kv_cache` attribute is subscriptable, which would prevent potential `TypeError` exceptions with different layer implementations in the future.
```python
# Only process layers that have a kv_cache attribute
# (attention layers). Skip non-attention layers like
# FusedMoE/MLP etc.
kv_cache = getattr(layer, 'kv_cache', None)
if kv_cache is None:
    continue
```
While checking for `None` is a good step, it might not be sufficient. If another layer type in `no_compile_layers` has a `kv_cache` attribute that is not `None` but also not subscriptable (e.g., not a `dict`), this code will raise a `TypeError` on line 168. To make this more robust and future-proof against different layer implementations, it would be safer to check whether the `kv_cache` object supports the subscription operation (`[]`). Using `hasattr(kv_cache, '__getitem__')` would be a more defensive check. This also makes the `is None` check redundant, since `hasattr(None, '__getitem__')` is `False`.
Suggested change:

```python
# Only process layers that have a subscriptable kv_cache
# attribute (attention layers). Skip non-attention
# layers like FusedMoE/MLP etc.
kv_cache = getattr(layer, 'kv_cache', None)
if not hasattr(kv_cache, '__getitem__'):
    continue
```
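To illustrate the redundancy point, here is a quick standalone check of plain Python `hasattr` behavior (ordinary Python semantics, independent of vLLM's code):

```python
# hasattr(None, '__getitem__') is False, so the subscriptability check
# also filters out layers whose kv_cache attribute is missing or None.
print(hasattr(None, '__getitem__'))      # False -> skipped
print(hasattr({}, '__getitem__'))        # True  -> dict-like kv_cache passes
print(hasattr(object(), '__getitem__'))  # False -> non-subscriptable value skipped
```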
Signed-off-by: David Chen <[email protected]>
@DarkLight1337 @KuntaiDu please review, thanks
The code itself LGTM, but let me try this PR first before approving
OK, thanks again
@KuntaiDu, if you have some free time, please try this PR before approving. If you find a problem, I will fix it immediately without blocking your other tasks. Thank you.
I ran the latest release and this PR. This PR does fix the issue.
LGTM
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.

Purpose
Fixes #21359. The bug was introduced in #19717 (Use torch op for all kernels in FusedMoE forward. Add additional testing for cudagraphs), which adds FusedMoE layers to vllm_config.compilation_config.static_forward_context in all scenarios.
The corresponding fix for p2p_nccl_connector is in #21378.
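For illustration, a minimal sketch of the failure mode (the class and variable names below are hypothetical stand-ins, not vLLM's actual implementations): once a layer without a kv_cache attribute ends up in the forward context, unconditionally accessing layer.kv_cache raises AttributeError, which the guard in this PR avoids.

```python
# Hypothetical stand-ins for illustration only; vLLM's real classes differ.
class AttentionLayer:
    def __init__(self):
        self.kv_cache = {}   # attention layers carry a KV cache


class FusedMoELayer:
    pass                     # no kv_cache attribute at all


forward_context = {"attn.0": AttentionLayer(), "moe.0": FusedMoELayer()}

for name, layer in forward_context.items():
    # Without a guard, `layer.kv_cache` raises AttributeError for "moe.0".
    kv_cache = getattr(layer, "kv_cache", None)
    if kv_cache is None:
        continue
    print(f"{name}: would load KV cache from shared storage here")
```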
Test Plan
vllm/examples/offline_inference/disaggregated-prefill-v1/run.sh
Test Result
(Optional) Documentation Update