-
-
Notifications
You must be signed in to change notification settings - Fork 10.7k
[Misc] Expand SUPPORTED_HIDDEN_SIZES for DeepEP low-latency kernels #21818
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Jee Jee Li <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request expands the supported hidden sizes for DeepEP low-latency kernels by adding 6144
. While the code change is correct, it introduces new functionality without corresponding tests. I've left a high-severity comment recommending the addition of a test case to validate the new hidden size and ensure the stability of the system.
# DeepEP low-latency kernels are compiled only for certain | ||
# specific hidden sizes. | ||
SUPPORTED_HIDDEN_SIZES = [2048, 2560, 4096, 5120, 7168] | ||
SUPPORTED_HIDDEN_SIZES = [2048, 2560, 4096, 5120, 6144, 7168] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding 6144
to SUPPORTED_HIDDEN_SIZES
enables this new configuration, but the pull request lacks corresponding tests to validate its functionality. The existing tests for low-latency DeepEP kernels in tests/kernels/moe/test_deepep_moe.py
do not cover this new hidden size.
To ensure the correctness of the implementation for this new hidden size and to prevent future regressions, please add a test case for k=6144
to test_low_latency_deep_ep_moe
. This will verify that the pre-compiled kernel for this size works as expected.
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
…llm-project#21818) Signed-off-by: Jee Jee Li <[email protected]>
…llm-project#21818) Signed-off-by: Jee Jee Li <[email protected]>
…llm-project#21818) Signed-off-by: Jee Jee Li <[email protected]> Signed-off-by: x22x22 <[email protected]>
…llm-project#21818) Signed-off-by: Jee Jee Li <[email protected]>
…llm-project#21818) Signed-off-by: Jee Jee Li <[email protected]> Signed-off-by: Jinzhen Lin <[email protected]>
…llm-project#21818) Signed-off-by: Jee Jee Li <[email protected]> Signed-off-by: Noam Gat <[email protected]>
…llm-project#21818) Signed-off-by: Jee Jee Li <[email protected]> Signed-off-by: Paul Pak <[email protected]>
…llm-project#21818) Signed-off-by: Jee Jee Li <[email protected]> Signed-off-by: Diego-Castan <[email protected]>
…llm-project#21818) Signed-off-by: Jee Jee Li <[email protected]>
…llm-project#21818) Signed-off-by: Jee Jee Li <[email protected]>
Essential Elements of an Effective PR Description Checklist
supported_models.md
andexamples
for a new model.Purpose
FIX #21603
Test Plan
Test Result
(Optional) Documentation Update