-
Notifications
You must be signed in to change notification settings - Fork 526
[AclGraph] Adapt aclgraph into new graph dispatcher arch #2427
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adapts aclgraph to the new graph dispatcher architecture in vLLM. It introduces ACLGraphWrapper for Ascend NPUs and integrates it into the model runner. The changes also include refactoring InputBatch to align with the new logits processor framework. My review found a critical logic error in capture_model that prevents graph capture, and an incomplete implementation in _capture_model that misses capturing graphs for decode mode.
| aclgraph_mode = self.compilation_config.cudagraph_mode | ||
| if aclgraph_mode.mixed_mode() != CUDAGraphMode.NONE: | ||
| aclgraph_runtime_mode = aclgraph_mode.mixed_mode() | ||
|
|
||
| compilation_cases = list(reversed(self.aclgraph_batch_sizes)) | ||
| self._capture_aclgraphs( | ||
| compilation_cases, | ||
| cudagraph_runtime_mode=aclgraph_runtime_mode, | ||
| uniform_decode=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _capture_model function only seems to capture ACL graphs for the mixed_mode. It's missing the logic to capture graphs for decode_mode (i.e., uniform decode batches). If a decode_mode is configured, the corresponding graphs will not be captured, which could lead to performance issues or errors at runtime. The implementation should also handle cudagraph_mode.decode_mode().
aclgraph_mode = self.compilation_config.cudagraph_mode
if aclgraph_mode.mixed_mode() != CUDAGraphMode.NONE:
aclgraph_runtime_mode = aclgraph_mode.mixed_mode()
compilation_cases = list(reversed(self.aclgraph_batch_sizes))
self._capture_aclgraphs(
compilation_cases,
cudagraph_runtime_mode=aclgraph_runtime_mode,
uniform_decode=False)
if aclgraph_mode.decode_mode() != CUDAGraphMode.NONE:
aclgraph_runtime_mode = aclgraph_mode.decode_mode()
compilation_cases = list(
reversed(self.aclgraph_dispatcher.get_decode_graph_keys()))
self._capture_aclgraphs(
compilation_cases,
cudagraph_runtime_mode=aclgraph_runtime_mode,
uniform_decode=True)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't capture decode_mode here because the full graph is not supported currently.
|
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
f38a389 to
3253457
Compare
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
Signed-off-by: wangli <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: wangli <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: wangli <[email protected]>
Signed-off-by: wangli <[email protected]>
Signed-off-by: wangli <[email protected]>
Signed-off-by: wangli <[email protected]>
Signed-off-by: wangli <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: wangli <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: weiguihua2 <[email protected]>
Signed-off-by: weiguihua2 <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
|
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
What this PR does / why we need it?
This pr adapt aclgraph into new graph dispatcher arch in vllm
breaks form:
propose_draft_token_idsnon-blocking for lower TTFT vllm#23041Does this PR introduce any user-facing change?
N/A
How was this patch tested?
CI passed with existing test.