
Conversation

Contributor

@BoyuanFeng BoyuanFeng commented Oct 15, 2025

This PR fixes a nested CUDAGraph capture issue.

Example:

  1. We apply torch.compile directly to some ops (e.g., grouped_topk) wrapped in custom ops. Inductor graph partition applies CUDAGraph within the custom op.

  2. At the same time, we compile the model that uses these custom ops. Inductor graph partition also wraps each graph partition with CUDAGraph. Some partitions may include custom ops to which CUDAGraph has already been applied. This leads to nested CUDAGraph capture, which is not supported.

The disable_graph_partition context manager added here should be wrapped around torch.compile calls within custom ops to avoid nested CUDAGraph capture.
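The save/restore pattern behind such a context manager can be sketched as follows. This is a minimal illustration using a stand-in config object; the real code toggles the internal torch._inductor.config.graph_partition flag.

```python
from contextlib import contextmanager
from types import SimpleNamespace

# Stand-in for torch._inductor.config; only the flag name matches the real one.
inductor_config = SimpleNamespace(graph_partition=True)

@contextmanager
def disable_graph_partition():
    """Temporarily disable Inductor graph partition; restore the old value on exit."""
    old_val = inductor_config.graph_partition
    try:
        inductor_config.graph_partition = False
        yield
    finally:
        # finally ensures the flag is restored even if compilation raises,
        # so outer torch.compile calls still see their original setting.
        inductor_config.graph_partition = old_val

with disable_graph_partition():
    assert inductor_config.graph_partition is False  # partition off inside
assert inductor_config.graph_partition is True       # restored after exit
```

The finally clause is the important part: a compile error inside the custom op must not leave the model-level compilation with partitioning silently disabled.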

Test:
VLLM_USE_STANDALONE_COMPILE=1 python examples/offline_inference/basic/generate.py --model deepseek-ai/DeepSeek-V2-Lite -O.use_inductor_graph_partition=True --max-model-len 1024

@BoyuanFeng BoyuanFeng requested a review from mgoin as a code owner October 15, 2025 23:14

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a disable_graph_partition context manager to resolve a nested CUDAGraph capture issue that occurs when torch.compile is used on both a model and its custom operations. The approach is sound and correctly implemented using a context manager to temporarily modify Inductor's configuration. The fix is applied to the grouped_topk operation, which aligns with the problem description. My primary concern is the reliance on a private PyTorch API (torch._inductor.config), which poses a maintainability risk for future PyTorch upgrades. I've added a comment to highlight this.

Comment on lines 104 to 109
old_val = torch._inductor.config.graph_partition
try:
    torch._inductor.config.graph_partition = False
    yield
finally:
    torch._inductor.config.graph_partition = old_val


Severity: high

This implementation relies on modifying torch._inductor.config.graph_partition, which is an internal, undocumented API of PyTorch's Inductor backend. While this is a clever solution to the nested CUDAGraph problem, it makes the code brittle and susceptible to breaking with future PyTorch updates. It would be beneficial to add a comment here warning about this dependency to aid future maintenance.

Suggested change

Before:

old_val = torch._inductor.config.graph_partition
try:
    torch._inductor.config.graph_partition = False
    yield
finally:
    torch._inductor.config.graph_partition = old_val

After:

# NOTE: This relies on an internal PyTorch Inductor API.
# This may break in future PyTorch versions.
old_val = torch._inductor.config.graph_partition
try:
    torch._inductor.config.graph_partition = False
    yield
finally:
    torch._inductor.config.graph_partition = old_val

@BoyuanFeng (Contributor, PR author) replied:

This config will be kept backward compatible (BC) and is tested in the PyTorch x vLLM CI.

@zou3519 zou3519 added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 15, 2025
@zou3519 zou3519 requested a review from ProExpertProg October 15, 2025 23:36

@ProExpertProg ProExpertProg left a comment


Could we try to make this a decorator so that people can just add it to the same callsite as the torch.compile call?

@ProExpertProg

e.g.

@disable_inductor_partition
@torch.compile(...)
def grouped_topk(...)
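One hedged sketch of how such a decorator could work, again with a stand-in config object in place of torch._inductor.config; the name disable_inductor_partition follows the example above, and grouped_topk_stub is illustrative only.

```python
import functools
from types import SimpleNamespace

# Stand-in for torch._inductor.config; the real flag is graph_partition.
inductor_config = SimpleNamespace(graph_partition=True)

def disable_inductor_partition(fn):
    """Disable Inductor graph partition for the duration of each call to fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        old_val = inductor_config.graph_partition
        try:
            inductor_config.graph_partition = False
            return fn(*args, **kwargs)
        finally:
            inductor_config.graph_partition = old_val
    return wrapper

@disable_inductor_partition
def grouped_topk_stub():
    # Stand-in body: just report the flag as seen inside the call.
    return inductor_config.graph_partition

assert grouped_topk_stub() is False            # partition disabled inside the call
assert inductor_config.graph_partition is True  # restored afterwards
```

Toggling the flag at call time works with a lazily compiled function underneath, since torch.compile reads Inductor config when compilation is actually triggered by the first call.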

Signed-off-by: Boyuan Feng <[email protected]>

@ProExpertProg ProExpertProg left a comment


Silly question: we can't just add options={"graph_partition": False} to the nested torch.compile decorator, can we? Assuming so, can you add a brief comment explaining why not?

BoyuanFeng and others added 2 commits October 15, 2025 17:22
Signed-off-by: Boyuan Feng <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>
Signed-off-by: Boyuan Feng <[email protected]>
ProExpertProg added a commit to neuralmagic/vllm that referenced this pull request Oct 16, 2025
commit 3f5cc70a38f8b3f67eda8a054efea8247a55cc36
Author: Boyuan Feng <[email protected]>
Date:   Wed Oct 15 17:22:53 2025 -0700

    Update vllm/model_executor/utils.py

    Co-authored-by: Luka Govedič <[email protected]>
    Signed-off-by: Boyuan Feng <[email protected]>

commit bbbaed48912bdaebf3f1bc8a07400bffcd01e194
Author: Boyuan Feng <[email protected]>
Date:   Wed Oct 15 17:22:05 2025 -0700

    nit

    Signed-off-by: Boyuan Feng <[email protected]>

commit de6f2c62b5697e900dda34474e1a9857c7f4bbcf
Author: Boyuan Feng <[email protected]>
Date:   Wed Oct 15 17:17:45 2025 -0700

    rewrite as decorator

    Signed-off-by: Boyuan Feng <[email protected]>

commit cced06b6d2e7fcb5677878e9cc4c4bb766a041bc
Author: Boyuan Feng <[email protected]>
Date:   Wed Oct 15 16:06:12 2025 -0700

    disable graph partition in custom op

    Signed-off-by: Boyuan Feng <[email protected]>

Signed-off-by: ProExpertProg <[email protected]>
@BoyuanFeng
Copy link
Contributor Author

@ProExpertProg yeah options should work
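For illustration, the options route keeps the override local to one compiled function instead of mutating global Inductor state. The sketch below uses a stand-in fake_compile so it is self-contained; the real callsite would be @torch.compile(options={"graph_partition": False}), with torch.compile forwarding the options dict to the backend.

```python
# Stand-in mimicking the shape of torch.compile(options={...}).
def fake_compile(options=None):
    opts = dict(options or {})
    def decorator(fn):
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)
        # Record the option at decoration time; a real backend would
        # consume it when compiling the wrapped function.
        wrapper.graph_partition = opts.get("graph_partition", True)
        return wrapper
    return decorator

@fake_compile(options={"graph_partition": False})
def grouped_topk_stub(scores, k):
    # Illustrative body only; the real grouped_topk selects experts per group.
    return sorted(scores, reverse=True)[:k]

assert grouped_topk_stub([0.1, 0.9, 0.5], 2) == [0.9, 0.5]
assert grouped_topk_stub.graph_partition is False
```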

ProExpertProg added a commit to neuralmagic/vllm that referenced this pull request Oct 16, 2025
ProExpertProg added a commit to neuralmagic/vllm that referenced this pull request Oct 16, 2025
commit c1dfad6
Author: Boyuan Feng <[email protected]>
Date:   Wed Oct 15 18:05:06 2025 -0700

    nit

    Signed-off-by: Boyuan Feng <[email protected]>

commit 6f9339a
Author: Boyuan Feng <[email protected]>
Date:   Wed Oct 15 17:58:07 2025 -0700

    use torch.compile options

    Signed-off-by: Boyuan Feng <[email protected]>

commit 0ab7175
Author: Boyuan Feng <[email protected]>
Date:   Wed Oct 15 17:49:17 2025 -0700

    lint

    Signed-off-by: Boyuan Feng <[email protected]>

commit d5d36c3
Author: Boyuan Feng <[email protected]>
Date:   Wed Oct 15 17:22:53 2025 -0700

    Update vllm/model_executor/utils.py

    Co-authored-by: Luka Govedič <[email protected]>
    Signed-off-by: Boyuan Feng <[email protected]>

commit 04aadb3
Author: Boyuan Feng <[email protected]>
Date:   Wed Oct 15 17:22:05 2025 -0700

    nit

    Signed-off-by: Boyuan Feng <[email protected]>

commit 8e08521
Author: Boyuan Feng <[email protected]>
Date:   Wed Oct 15 17:17:45 2025 -0700

    rewrite as decorator

    Signed-off-by: Boyuan Feng <[email protected]>

commit 29782df
Author: Boyuan Feng <[email protected]>
Date:   Wed Oct 15 16:06:12 2025 -0700

    disable graph partition in custom op

    Signed-off-by: Boyuan Feng <[email protected]>

Signed-off-by: ProExpertProg <[email protected]>
ProExpertProg added a commit to neuralmagic/vllm that referenced this pull request Oct 16, 2025
commit 574cddf
Merge: c1dfad6 e6ba200
Author: Boyuan Feng <[email protected]>
Date:   Thu Oct 16 11:53:09 2025 -0700

    Merge branch 'main' into bf/disable-partition-in-custom-op

Signed-off-by: ProExpertProg <[email protected]>
@DarkLight1337 DarkLight1337 merged commit 0840560 into vllm-project:main Oct 17, 2025
50 checks passed
Zhuul pushed a commit to Zhuul/vllm that referenced this pull request Oct 17, 2025
Signed-off-by: Boyuan Feng <[email protected]>
Signed-off-by: Boyuan Feng <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
Signed-off-by: Boyuan Feng <[email protected]>
Signed-off-by: Boyuan Feng <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>
albertoperdomo2 pushed a commit to albertoperdomo2/vllm that referenced this pull request Oct 23, 2025
Signed-off-by: Boyuan Feng <[email protected]>
Signed-off-by: Boyuan Feng <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>
Signed-off-by: Alberto Perdomo <[email protected]>
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: Boyuan Feng <[email protected]>
Signed-off-by: Boyuan Feng <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
Signed-off-by: Boyuan Feng <[email protected]>
Signed-off-by: Boyuan Feng <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>
Signed-off-by: xuebwang-amd <[email protected]>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
Signed-off-by: Boyuan Feng <[email protected]>
Signed-off-by: Boyuan Feng <[email protected]>
Co-authored-by: Luka Govedič <[email protected]>
Signed-off-by: 0xrushi <[email protected]>
4 participants