[AclGraph] Adapt aclgraph into new graph dispatcher arch #2427

MengqingCao · 2025-08-18T13:11:14Z

What this PR does / why we need it?

This PR adapts aclgraph to the new graph dispatcher architecture in vLLM.
Breaks from:

[Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer vllm#20059
[v1] Move block_hashes from KVCacheManager to Request.block_hashes (#19728) vllm#19728
[Spec Decode] Make propose_draft_token_ids non-blocking for lower TTFT vllm#23041
[V1] Logits processors extensibility vllm#19912

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

CI passed with the existing tests.

vLLM version: v0.10.0
vLLM main: vllm-project/vllm@4d9c619

gemini-code-assist

Code Review

This pull request adapts aclgraph to the new graph dispatcher architecture in vLLM. It introduces ACLGraphWrapper for Ascend NPUs and integrates it into the model runner. The changes also include refactoring InputBatch to align with the new logits processor framework. My review found a critical logic error in capture_model that prevents graph capture, and an incomplete implementation in _capture_model that misses capturing graphs for decode mode.

vllm_ascend/worker/model_runner_v1.py

gemini-code-assist · 2025-08-18T13:13:38Z

+            aclgraph_mode = self.compilation_config.cudagraph_mode
+            if aclgraph_mode.mixed_mode() != CUDAGraphMode.NONE:
+                aclgraph_runtime_mode = aclgraph_mode.mixed_mode()
+
+                compilation_cases = list(reversed(self.aclgraph_batch_sizes))
+                self._capture_aclgraphs(
+                    compilation_cases,
+                    cudagraph_runtime_mode=aclgraph_runtime_mode,
+                    uniform_decode=False)


The _capture_model function only seems to capture ACL graphs for the mixed_mode. It's missing the logic to capture graphs for decode_mode (i.e., uniform decode batches). If a decode_mode is configured, the corresponding graphs will not be captured, which could lead to performance issues or errors at runtime. The implementation should also handle cudagraph_mode.decode_mode().

            aclgraph_mode = self.compilation_config.cudagraph_mode
            if aclgraph_mode.mixed_mode() != CUDAGraphMode.NONE:
                aclgraph_runtime_mode = aclgraph_mode.mixed_mode()

                compilation_cases = list(reversed(self.aclgraph_batch_sizes))
                self._capture_aclgraphs(
                    compilation_cases,
                    cudagraph_runtime_mode=aclgraph_runtime_mode,
                    uniform_decode=False)

            if aclgraph_mode.decode_mode() != CUDAGraphMode.NONE:
                aclgraph_runtime_mode = aclgraph_mode.decode_mode()
                compilation_cases = list(
                    reversed(self.aclgraph_dispatcher.get_decode_graph_keys()))
                self._capture_aclgraphs(
                    compilation_cases,
                    cudagraph_runtime_mode=aclgraph_runtime_mode,
                    uniform_decode=True)

MengqingCao · 2025-08-19

We don't capture decode_mode here because full graph mode is not currently supported.
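The exchange above can be illustrated with a minimal, self-contained sketch. It does not use vLLM or vllm-ascend code: `GraphMode`, `plan_captures`, and the `full_graph_supported` flag are hypothetical names invented for this example. It assumes that decode-only capture requires full graph support, and it reuses the same batch sizes for both passes, whereas the suggestion above reads the decode sizes from the dispatcher.

```python
from enum import Enum


class GraphMode(Enum):
    """Toy stand-in for a graph mode enum (hypothetical, not the vLLM class)."""
    NONE = "none"
    PIECEWISE = "piecewise"
    FULL = "full"


def plan_captures(mixed_mode, decode_mode, batch_sizes,
                  full_graph_supported=False):
    """Return the capture passes to run as (mode, uniform_decode, sizes) tuples."""
    # Largest sizes first, mirroring list(reversed(...)) in the diff.
    # Assumes batch_sizes is sorted in ascending order.
    sizes = list(reversed(batch_sizes))
    plan = []

    # Mixed prefill/decode batches: this is the pass the PR captures.
    if mixed_mode is not GraphMode.NONE:
        plan.append((mixed_mode, False, sizes))

    # Uniform decode batches: only planned when a full graph is requested
    # and the backend can actually capture one (assumption of this sketch).
    if decode_mode is GraphMode.FULL and full_graph_supported:
        plan.append((decode_mode, True, sizes))

    return plan


if __name__ == "__main__":
    for entry in plan_captures(GraphMode.PIECEWISE, GraphMode.FULL, [1, 2, 4]):
        print(entry)
```

With `full_graph_supported=False`, only the mixed-mode pass is planned, which matches the behavior the author describes. Setting the flag to `True` adds the decode pass that the review suggestion asks for.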

github-actions · 2025-08-18T13:20:47Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

github-actions · 2025-08-19T08:57:48Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

github-actions · 2025-08-20T01:03:15Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

gemini-code-assist bot reviewed Aug 18, 2025

github-actions bot added the module:core label Aug 18, 2025

MengqingCao force-pushed the aclgraph branch 3 times, most recently from f38a389 to 3253457 on August 19, 2025 04:56

github-actions bot added module:tests merge-conflicts labels Aug 19, 2025

Potabk and others added 8 commits August 19, 2025 17:00

support logitsprocessor

2fcd79a

Signed-off-by: wangli <[email protected]>

[AclGraph] Adapt aclgraph into new graph dispatcher arch

b2da616

Signed-off-by: MengqingCao <[email protected]>

fix input_batch ut

95ed739

Signed-off-by: wangli <[email protected]>

fix some ut

3e6ab9e

Signed-off-by: MengqingCao <[email protected]>

fix lint

20c44ee

Signed-off-by: wangli <[email protected]>

fix lint

647f78b

Signed-off-by: wangli <[email protected]>

sikp lint for now

ed9a04e

Signed-off-by: wangli <[email protected]>

fix workflow

904c46b

Signed-off-by: wangli <[email protected]>

Potabk force-pushed the aclgraph branch from b381ed5 to 904c46b on August 19, 2025 09:03

github-actions bot removed the merge-conflicts label Aug 19, 2025

revert git checkout@v5

687a5f9

Signed-off-by: wangli <[email protected]>

Potabk force-pushed the aclgraph branch from 302a1bc to 687a5f9 on August 19, 2025 09:11

MengqingCao and others added 9 commits August 19, 2025 09:29

fix some ut

04b5335

Signed-off-by: MengqingCao <[email protected]>

fix lint

1dc5e7d

Signed-off-by: wangli <[email protected]>

fix graph batch size

915613e

Signed-off-by: MengqingCao <[email protected]>

refact attention metadata build

127fd59

Signed-off-by: weiguihua2 <[email protected]>

refact attention metadata build

cdc742a

Signed-off-by: weiguihua2 <[email protected]>

skip lora

b1ec0d2

Signed-off-by: MengqingCao <[email protected]>

skip lora

ff6992c

Signed-off-by: MengqingCao <[email protected]>

run singlecard no rely

f96895e

Signed-off-by: MengqingCao <[email protected]>

check aclgraph in _capture_aclgraph

09c57bb

Signed-off-by: MengqingCao <[email protected]>

MengqingCao added 2 commits August 19, 2025 14:13

fix _pool

7599bf2

Signed-off-by: MengqingCao <[email protected]>

fix ci

9b66e01

Signed-off-by: MengqingCao <[email protected]>

github-actions bot added the merge-conflicts label Aug 20, 2025

wangxiyuan closed this Aug 20, 2025
