[Bugfix][Model] Fix fusedmoe and make modelrunner_v1 compatible with latest vllm #867
Conversation
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
max_model_len=self.max_model_len,
max_num_batched_tokens=self.max_num_tokens,
device=self.device,
pin_memory=self.pin_memory,
pin_memory=True
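In other words, the suggestion is to hardcode pin_memory=True at this call site rather than thread self.pin_memory through. A minimal, purely illustrative sketch of what pinned host buffers mean here; the buffer below is a hypothetical stand-in, not vllm's InputBatch, and pinning requires an accelerator runtime to be available:

```python
import torch

max_num_batched_tokens = 8192  # illustrative value

# Host staging buffer allocated in pinned (page-locked) memory so that
# host-to-device copies can run asynchronously.
token_ids_cpu = torch.zeros(
    max_num_batched_tokens,
    dtype=torch.int32,
    device="cpu",
    pin_memory=True,  # hardcoded True, as suggested above
)
assert token_ids_cpu.is_pinned()
```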
cache_config.cache_dtype]

self.attn_metadata_builders: list[AscendAttentionMetadataBuilder] = []
self.attn_backends: list[type[AscendAttentionBackend]] = []
These 2 lines are useless and can be removed.
self.scheduler_config = vllm_config.scheduler_config
self.chunked_prefill_enabled = vllm_config.scheduler_config.chunked_prefill_enabled
self.device = device
self.pin_memory = True
This is useless and can be removed.
self.is_multimodal_model = self.model_config.is_multimodal_model
self.block_size = vllm_config.cache_config.block_size
self.max_model_len = self.model_config.max_model_len
This is useless; use self.model_config.max_model_len directly when constructing the InputBatch.
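A minimal sketch of the suggested refactor, using hypothetical stand-ins for ModelConfig, InputBatch, and the model runner rather than vllm's real classes: drop the cached attribute and pass model_config.max_model_len straight to the constructor.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:  # hypothetical stand-in for vllm's ModelConfig
    max_model_len: int


@dataclass
class InputBatch:  # hypothetical stand-in for vllm's v1 InputBatch
    max_model_len: int


class NPUModelRunner:  # hypothetical stand-in for the model runner
    def __init__(self, model_config: ModelConfig):
        self.model_config = model_config
        # No cached self.max_model_len; read it from model_config directly.
        self.input_batch = InputBatch(
            max_model_len=self.model_config.max_model_len)


runner = NPUModelRunner(ModelConfig(max_model_len=4096))
assert runner.input_batch.max_model_len == 4096
```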
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
LGTM. Let's merge this to unblock CI once CI passes. Thanks for the fix.
Signed-off-by: MengqingCao <[email protected]>
Thanks, I made a small change in the latest commit; please help review it.
self.local_num_experts = self.global_num_experts
self.expert_map = None

if vllm_version_is("0.8.5") or vllm_version_is("0.8.5.post1"):
I think this part of the code may not be needed; refer to the modification of this part in PR 863.
However, the most urgent thing at present is to fix CI, so this can be revisited later.
Yes, let's make CI happy first and solve the bug later.
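For context, the vllm_version_is guard in the hunk above gates the old behavior behind the released vllm versions. A minimal sketch of that pattern, assuming vllm is installed; the helper is a simplified stand-in mirroring what vllm_ascend.utils.vllm_version_is does, not the real implementation:

```python
from importlib.metadata import version


def vllm_version_is(target: str) -> bool:
    # Simplified stand-in: compare against the installed vllm package version.
    return version("vllm") == target


if vllm_version_is("0.8.5") or vllm_version_is("0.8.5.post1"):
    # Code path kept for the released 0.8.5 / 0.8.5.post1 versions.
    pass
else:
    # Code path that follows the latest vllm main branch.
    pass
```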
…latest vllm (vllm-project#867)

### What this PR does / why we need it?
This PR fixes the CI failure caused by the latest vllm.
1. Add moe_config for fused_moe.
2. Adjust for the kv cache group change from vllm. vllm-ascend doesn't support this feature yet, so this is just a quick fix for backward compatibility.

Fix: vllm-project#872

Signed-off-by: MengqingCao <[email protected]>
What this PR does / why we need it?
This PR fixes the CI failure caused by the latest vllm.
Fix: #872
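The backward-compatibility quick fix for the kv cache group change could look roughly like the following. This is a purely illustrative sketch, not the actual patch; select_single_kv_cache_group is a hypothetical helper, and the only assumption is that upstream vllm now passes a list of kv cache groups while vllm-ascend supports a single one.

```python
from typing import Any


def select_single_kv_cache_group(kv_cache_groups: list[Any]) -> Any:
    # vllm-ascend does not support multiple kv cache groups yet, so reject
    # configurations that would need them and keep the old single-group
    # behaviour by always using group 0.
    if len(kv_cache_groups) != 1:
        raise NotImplementedError(
            "vllm-ascend currently supports exactly one kv cache group")
    return kv_cache_groups[0]
```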