[Hackathon 9th No.91] INT8*INT8 support for MoE GroupGEMM in FastDeploy #1164
- At present there is no `INT8*INT8` implementation of `MoE GroupGEMM` in the industry
# 4. Design approach and implementation plan
1. Some reference code paths
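For a quick implementation, you can refer to the existing wfp8afp8 triton operator in FD (FastDeploy), and also look at how vllm and TensorRT-LLM implement this. The implementation is not restricted to a particular approach: either CUDA or triton is acceptable. Once the operator itself is complete, you could go further and add operator fusion (for example, fusing the shared-expert layer in the GLM4.5-AIR MoE).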
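For readers unfamiliar with the operation under discussion, the following is a minimal NumPy reference sketch of what an `INT8*INT8` grouped GEMM computes. It is not FastDeploy, vllm, or TensorRT-LLM code. The function names (`quantize_int8`, `grouped_gemm_int8`), the symmetric per-token and per-output-channel scaling scheme, and the tensor layouts are all assumptions chosen for illustration; the RFC may settle on a different quantization granularity.

```python
import numpy as np


def quantize_int8(x, axis):
    """Symmetric INT8 quantization along `axis` (hypothetical helper).

    Returns the int8 tensor and the float32 scale such that x ~= q * scale.
    """
    amax = np.max(np.abs(x), axis=axis, keepdims=True)
    # Avoid division by zero for all-zero rows/columns.
    scale = np.where(amax > 0, amax / 127.0, 1.0).astype(np.float32)
    q = np.clip(np.rint(x / scale), -127, 127).astype(np.int8)
    return q, scale


def grouped_gemm_int8(a_q, a_scale, w_q, w_scale, tokens_per_expert):
    """Reference INT8*INT8 grouped GEMM (assumed layout, not a real API).

    a_q:     [M, K] int8 activations, rows already sorted by expert
    a_scale: [M, 1] float32 per-token scales
    w_q:     [E, K, N] int8 expert weights
    w_scale: [E, 1, N] float32 per-output-channel scales
    tokens_per_expert: length-E sequence of row counts summing to M
    """
    m, n = a_q.shape[0], w_q.shape[2]
    out = np.empty((m, n), dtype=np.float32)
    start = 0
    for e, count in enumerate(tokens_per_expert):
        end = start + count
        # Multiply int8 operands while accumulating in int32, which is
        # what an integer tensor-core path would do in hardware.
        acc = a_q[start:end].astype(np.int32) @ w_q[e].astype(np.int32)
        # Dequantize: one scale per token row, one per output channel.
        out[start:end] = acc.astype(np.float32) * a_scale[start:end] * w_scale[e]
        start = end
    return out
```

A CUDA or triton kernel would replace the Python loop over experts with a single launch covering all groups, but its output can be checked against a reference of this shape.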
Thank you very much!
@WanRui37 If there are no further changes, shall I go ahead and merge this?
@ckl117 Sorry, there are still more changes to come; my code is not fully finished yet. Could this be merged later instead?
RFC for INT8*INT8 support in FastDeploy's MoE GroupGEMM