[FEAT][ROCm] Add AITER grouped topk for DeepSeekV2 #18825
Conversation
Signed-off-by: vllmellm <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small, essential subset of tests runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add … 🚀
Just one question. Otherwise LGTM
```python
    )
if e_score_correction_bias is not None:
    torch.ops.vllm.rocm_aiter_biased_grouped_topk(
```
Do you need to assert that `scoring_func` is `"sigmoid"` here? I'm only asking because the previous assert was deleted.
@SageMoore `scoring_func` is not an argument of `biased_grouped_topk` in this version, so it is only used for `grouped_topk`, which supports both `sigmoid` and `softmax`.
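For context, a minimal sketch of that dispatch (the `torch.ops.vllm` op names are the ones added in this PR, but the wrapper name and argument order here are illustrative, not copied from the diff):

```python
import torch

# Illustrative paraphrase of the dispatch in this PR. The key point:
# scoring_func is only consumed on the unbiased path, since AITER's
# biased_grouped_topk applies sigmoid scoring internally.
def dispatch_grouped_topk(gating_output: torch.Tensor,
                          topk_weights: torch.Tensor,
                          topk_ids: torch.Tensor,
                          num_expert_group: int,
                          topk_group: int,
                          renormalize: bool,
                          scoring_func: str = "softmax",
                          e_score_correction_bias=None) -> None:
    if e_score_correction_bias is not None:
        # Bias provided (DeepSeek-V3-style routing): use the biased kernel,
        # which has no scoring_func argument.
        torch.ops.vllm.rocm_aiter_biased_grouped_topk(
            gating_output, e_score_correction_bias, topk_weights, topk_ids,
            num_expert_group, topk_group, renormalize)
    else:
        # No bias (DeepSeek-V2-style routing): scoring_func matters here.
        assert scoring_func in ("softmax", "sigmoid")
        torch.ops.vllm.rocm_aiter_grouped_topk(
            gating_output, topk_weights, topk_ids, num_expert_group,
            topk_group, renormalize, scoring_func)
```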
```diff
 num_expert_group: int = 0,
 topk_group: int = 0,
-scoring_func: str = "sigmoid",
+scoring_func: str = "softmax",
```
This seems like a BC-breaking change; is it intentional?
@houseroad This is intentional. It is reworked into a new function, `rocm_aiter_grouped_topk`, that follows the same argument signature as `grouped_topk` (the linked `def grouped_topk(`), and it covers both the `grouped_topk` and `biased_grouped_topk` paths.
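For reference, the `grouped_topk` signature being matched looks roughly like this (reconstructed from vLLM's fused_moe module; treat the exact parameter list as approximate). Note that `scoring_func` defaults to `"softmax"` there, which is what motivates the default change above:

```python
from typing import Optional
import torch

# Approximate signature of grouped_topk that rocm_aiter_grouped_topk
# now mirrors; body elided.
def grouped_topk(
    hidden_states: torch.Tensor,
    gating_output: torch.Tensor,
    topk: int,
    renormalize: bool,
    num_expert_group: int = 0,
    topk_group: int = 0,
    scoring_func: str = "softmax",
    e_score_correction_bias: Optional[torch.Tensor] = None,
):
    ...
```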
```python
    )
else:
    assert (scoring_func == "softmax" or scoring_func == "sigmoid")
    torch.ops.vllm.rocm_aiter_grouped_topk(
```
Where are these ops registered? In the AITER repo or the vLLM repo?
@houseroad All of the ops, `rocm_aiter_grouped_topk` and `rocm_aiter_biased_grouped_topk`, are registered in the vLLM repo, in `rocm_aiter_fused_moe.py`, via two `direct_register_custom_op(` calls.
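For readers unfamiliar with the mechanism, here is a hedged sketch of that registration pattern (keyword names follow my understanding of `vllm.utils.direct_register_custom_op`; the implementation bodies are placeholders, not the code in `rocm_aiter_fused_moe.py`):

```python
import torch
from vllm.utils import direct_register_custom_op

def rocm_aiter_grouped_topk_impl(topk_weights: torch.Tensor,
                                 topk_ids: torch.Tensor,
                                 gating_output: torch.Tensor,
                                 scoring_func: str = "softmax") -> None:
    # The real version calls the AITER kernel here, writing topk_weights
    # and topk_ids in place (omitted; requires ROCm + the aiter package).
    pass

def rocm_aiter_grouped_topk_fake(topk_weights: torch.Tensor,
                                 topk_ids: torch.Tensor,
                                 gating_output: torch.Tensor,
                                 scoring_func: str = "softmax") -> None:
    # Meta/"fake" impl for torch.compile tracing: outputs are
    # preallocated and mutated in place, so there is nothing to compute.
    pass

direct_register_custom_op(
    op_name="rocm_aiter_grouped_topk",
    op_func=rocm_aiter_grouped_topk_impl,
    mutates_args=["topk_weights", "topk_ids"],
    fake_impl=rocm_aiter_grouped_topk_fake,
)

# After registration the op is callable as
# torch.ops.vllm.rocm_aiter_grouped_topk(...).
```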
The test failures are not related to this PR
Signed-off-by: vllmellm <[email protected]> Signed-off-by: amit <[email protected]>
This PR introduces a new AITER kernel, `grouped_topk`, from the aiter package. This kernel is used for model architectures and configurations like DeepSeek-V2. The previously integrated kernel, `biased_grouped_topk`, from the aiter package (added in #17955) is used for model architectures and configurations like DeepSeek-V3.

lm_eval results on DeepSeek-V2-Lite-Chat
command:

```bash
VLLM_USE_V1=1 VLLM_ROCM_USE_AITER=1 VLLM_ROCM_USE_AITER_LINEAR=0 VLLM_ROCM_USE_AITER_RMSNORM=0 SAFETENSORS_FAST_GPU=1 \
lm_eval --model vllm --model_args pretrained=deepseek-ai/DeepSeek-V2-Lite-Chat,tensor_parallel_size=1,max_model_len=32768,block_size=1 --trust_remote_code --tasks gsm8k --num_fewshot 5 --batch_size auto
```
lm_eval results on DeepSeek-V3

command:

```bash
VLLM_USE_V1=1 VLLM_ROCM_USE_AITER=1 VLLM_ROCM_USE_AITER_LINEAR=0 VLLM_ROCM_USE_AITER_RMSNORM=0 SAFETENSORS_FAST_GPU=1 \
lm_eval --model vllm --model_args pretrained=deepseek-ai/DeepSeek-V3,tensor_parallel_size=8,max_model_len=32768,block_size=1 --trust_remote_code --tasks gsm8k --num_fewshot 5 --batch_size auto
```
Performance results on DeepSeek-V2-Lite-Chat

This only benchmarks the newly added `grouped_topk` kernel.

commands:

The observed increase in TTFT could hinder the overall user experience. Would it be better to enable/disable this kernel with a switch? Would appreciate any suggestions about this issue; one possible shape for such a switch is sketched below.
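A minimal sketch of an opt-out switch following the existing `VLLM_ROCM_USE_AITER_*` pattern (the variable name `VLLM_ROCM_USE_AITER_GROUPED_TOPK` is hypothetical, not something this PR adds):

```python
import os

# Hypothetical opt-out flag in the style of VLLM_ROCM_USE_AITER_LINEAR /
# VLLM_ROCM_USE_AITER_RMSNORM: the AITER grouped_topk kernel stays
# enabled by default whenever AITER itself is enabled, but can be
# turned off independently.
def use_aiter_grouped_topk() -> bool:
    return (os.environ.get("VLLM_ROCM_USE_AITER", "0") == "1" and
            os.environ.get("VLLM_ROCM_USE_AITER_GROUPED_TOPK", "1") == "1")
```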