disable graph partition in custom op #26952
Conversation
Signed-off-by: Boyuan Feng <[email protected]>
Code Review
This pull request introduces a disable_graph_partition context manager to resolve a nested CUDAGraph capture issue that occurs when torch.compile is used on both a model and its custom operations. The approach is sound and correctly implemented using a context manager to temporarily modify Inductor's configuration. The fix is applied to the grouped_topk operation, which aligns with the problem description. My primary concern is the reliance on a private PyTorch API (torch._inductor.config), which poses a maintainability risk for future PyTorch upgrades. I've added a comment to highlight this.
vllm/model_executor/utils.py (Outdated)

```python
old_val = torch._inductor.config.graph_partition
try:
    torch._inductor.config.graph_partition = False
    yield
finally:
    torch._inductor.config.graph_partition = old_val
```
This implementation relies on modifying torch._inductor.config.graph_partition, which is an internal, undocumented API of PyTorch's Inductor backend. While this is a clever solution to the nested CUDAGraph problem, it makes the code brittle and susceptible to breaking with future PyTorch updates. It would be beneficial to add a comment here warning about this dependency to aid future maintenance.
Suggested change:

```python
# NOTE: This relies on an internal PyTorch Inductor API.
# This may break in future PyTorch versions.
old_val = torch._inductor.config.graph_partition
try:
    torch._inductor.config.graph_partition = False
    yield
finally:
    torch._inductor.config.graph_partition = old_val
```
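As a self-contained illustration of the try/finally pattern discussed above, the helper can be exercised without PyTorch by swapping the Inductor config for a stand-in object. The `SimpleNamespace` here is purely illustrative; the real code mutates `torch._inductor.config.graph_partition`:

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Stand-in for torch._inductor.config so this sketch runs without PyTorch;
# the real helper mutates torch._inductor.config.graph_partition.
inductor_config = SimpleNamespace(graph_partition=True)

@contextmanager
def disable_graph_partition():
    # NOTE: the real version relies on an internal Inductor config flag,
    # which may break in future PyTorch versions.
    old_val = inductor_config.graph_partition
    try:
        inductor_config.graph_partition = False
        yield
    finally:
        # Restore the previous value even if the wrapped code raises.
        inductor_config.graph_partition = old_val
```

The try/finally guarantees the flag is restored even when compilation inside the block throws, so one failing op cannot leave graph partitioning disabled globally.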
This config will remain backward compatible (BC) and is tested in the PyTorch x vLLM CI.
Could we try to make this a decorator, so that people can just add it at the same callsite as the torch.compile call?
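The decorator form requested above comes almost for free: managers built with `contextlib.contextmanager` inherit from `ContextDecorator`, so the same helper can be applied directly at the callsite. A sketch with a stand-in config object (all names here are illustrative; the real code targets `torch._inductor.config`):

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Stand-in for torch._inductor.config (illustrative only).
inductor_config = SimpleNamespace(graph_partition=True)

@contextmanager
def disable_graph_partition():
    old_val = inductor_config.graph_partition
    try:
        inductor_config.graph_partition = False
        yield
    finally:
        inductor_config.graph_partition = old_val

# @contextmanager-based managers double as decorators, so the helper
# can sit at the same callsite as the torch.compile call would:
@disable_graph_partition()
def compile_inner_op():
    # Stands in for the torch.compile call inside a custom op.
    return inductor_config.graph_partition
```

Each call to the decorated function re-enters a fresh context, so the flag is flipped off for the duration of the call and restored afterwards.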
Silly question: couldn't we just pass options={"graph_partition": False} to the nested torch.compile decorator? If that doesn't work, could you add a brief comment explaining why not?
Force-pushed; commit history (author: Boyuan Feng, signed-off by Boyuan Feng and ProExpertProg):
- 3f5cc70a (Wed Oct 15 17:22:53 2025 -0700) Update vllm/model_executor/utils.py (Co-authored-by: Luka Govedič)
- bbbaed48 (Wed Oct 15 17:22:05 2025 -0700) nit
- de6f2c62 (Wed Oct 15 17:17:45 2025 -0700) rewrite as decorator
- cced06b6 (Wed Oct 15 16:06:12 2025 -0700) disable graph partition in custom op
@ProExpertProg yeah, options should work.
Force-pushed; commit history (author: Boyuan Feng, signed-off by Boyuan Feng and ProExpertProg):
- 574cddf (Thu Oct 16 11:53:09 2025 -0700) Merge branch 'main' into bf/disable-partition-in-custom-op
- c1dfad6 (Wed Oct 15 18:05:06 2025 -0700) nit
- 6f9339a (Wed Oct 15 17:58:07 2025 -0700) use torch.compile options
- 0ab7175 (Wed Oct 15 17:49:17 2025 -0700) lint
- d5d36c3 (Wed Oct 15 17:22:53 2025 -0700) Update vllm/model_executor/utils.py (Co-authored-by: Luka Govedič)
- 04aadb3 (Wed Oct 15 17:22:05 2025 -0700) nit
- 8e08521 (Wed Oct 15 17:17:45 2025 -0700) rewrite as decorator
- 29782df (Wed Oct 15 16:06:12 2025 -0700) disable graph partition in custom op
Signed-off-by: Boyuan Feng <[email protected]> Signed-off-by: Boyuan Feng <[email protected]> Co-authored-by: Luka Govedič <[email protected]>
This PR fixes a nested CUDAGraph capture issue.
Example:
We apply torch.compile directly to some ops (e.g., grouped_topk) wrapped in custom ops. Inductor graph partition applies CUDAGraph within the custom op.
At the same time, we compile the model that uses these custom ops. Inductor graph partition also wraps each graph partition with CUDAGraph. Some partitions may include custom ops to which CUDAGraph has already been applied, leading to nested CUDAGraph capture, which is not supported.
This context manager should be wrapped around torch.compile calls within custom ops to avoid the nested CUDAGraph capture.
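The interaction described above can be sketched without PyTorch. A stand-in `fake_compile` records whether graph partitioning (and hence per-partition CUDAGraph capture) would apply at each compile site; every name here is an illustrative stand-in, not a vLLM or PyTorch API:

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Illustrative stand-ins for torch._inductor.config and torch.compile.
inductor_config = SimpleNamespace(graph_partition=True)
capture_log = []

def fake_compile(fn, name):
    # Stands in for torch.compile: record whether graph partitioning
    # (and hence per-partition CUDAGraph capture) applies at this site.
    capture_log.append((name, inductor_config.graph_partition))
    return fn

@contextmanager
def disable_graph_partition():
    old_val = inductor_config.graph_partition
    try:
        inductor_config.graph_partition = False
        yield
    finally:
        inductor_config.graph_partition = old_val

# Inner op: compiled inside the custom op with partitioning disabled,
# so it does not get its own CUDAGraph wrapper.
with disable_graph_partition():
    grouped_topk = fake_compile(lambda x: x, "grouped_topk")

# Outer model compile keeps partitioning, so its partitions (which may
# call grouped_topk) are the only level wrapped in CUDAGraph.
model = fake_compile(lambda x: grouped_topk(x), "model")
```

Only the outer compile sees partitioning enabled, so CUDAGraph capture happens at exactly one level, which is the behavior the fix aims for.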
Test:
VLLM_USE_STANDALONE_COMPILE=1 python examples/offline_inference/basic/generate.py --model deepseek-ai/DeepSeek-V2-Lite -O.use_inductor_graph_partition=True --max-model-len 1024