
Conversation

@tqchen (Member) commented Aug 19, 2025

This PR updates the autodlpack path to automatically update the env stream to be consistent with the torch stream context.

This change helps make FFI functions compatible with stream-based execution.
Specifically, TVMFFIEnvSetStream is called to set the stream from the torch CUDA context so the callee can query it via TVMFFIEnvGetCurrentStream. The previous stream is restored after the function call ends.
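
For illustration, a minimal sketch of the set/restore pattern in Python. The `set_stream` binding and its returning of the previously set stream are assumptions for the sketch; the PR itself works through the C API:

```python
import torch

# set_stream stands in for a Python-level binding over TVMFFIEnvSetStream;
# it is assumed here (not confirmed by the PR) to return the previously
# set stream so the caller can restore it afterwards.
def call_with_torch_stream(ffi_func, set_stream, *args):
    device = torch.cuda.current_device()
    # Raw cudaStream_t of torch's current stream on this device.
    torch_stream = torch.cuda.current_stream(device).cuda_stream
    prev = set_stream(device, torch_stream)
    try:
        # The callee can now see torch's stream via TVMFFIEnvGetCurrentStream.
        return ffi_func(*args)
    finally:
        # Recover the previous env stream once the call ends.
        set_stream(device, prev)
```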

We leverage torch cpp_extension load_inline to create an efficient stream query function so it won't slow down the call. The first load may take extra time to build the JIT module; subsequent loads are fast once the torch JIT module is cached.
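
A minimal sketch of what such a JIT-built query could look like (the module name and function name below are illustrative, not the PR's actual code):

```python
import torch
from torch.utils.cpp_extension import load_inline

# C++ source compiled once by torch's JIT builder; the built module is
# cached on disk, so only the first load pays the compilation cost.
cpp_source = r"""
#include <cstdint>
#include <c10/cuda/CUDAStream.h>

// Return the raw cudaStream_t of torch's current stream as an integer.
int64_t get_current_cuda_stream(int64_t device_id) {
  return reinterpret_cast<int64_t>(
      c10::cuda::getCurrentCUDAStream(device_id).stream());
}
"""

_mod = load_inline(
    name="stream_query_sketch",  # illustrative name
    cpp_sources=cpp_source,
    functions=["get_current_cuda_stream"],
    with_cuda=True,
)

# Usage: fetch the current stream pointer for the active device.
stream_ptr = _mod.get_current_cuda_stream(torch.cuda.current_device())
```

This avoids the Python-level torch.cuda.current_stream(...) round trip, which the benchmark below shows is roughly 9x slower than the cpp-extension query.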

@tqchen (Member, Author) commented Aug 19, 2025

FFI overhead benchmark on an AMD Ryzen 9 7950X:

-----------------------------
Benchmark f(x, y, z) overhead
-----------------------------
numpy.add                                2.0837783813476562e-07 sec/call
torch.add[cpu]                           5.690574645996094e-07 sec/call
torch.add[cuda]                          2.2510528564453123e-06 sec/call
tvm.ffi.nop                              2.9222965240478516e-07 sec/call
tvm.ffi.nop+from_dlpack(torch)           3.5573482513427735e-06 sec/call
tvm.ffi.nop+from_dlpack(numpy)           1.001763343811035e-06 sec/call
tvm.ffi.nop+from_dlpack(tvm)             1.0982036590576173e-06 sec/call
tvm.ffi.nop+from_dlpack(torch.utils)     2.9434442520141603e-06 sec/call
tvm.ffi.nop.autodlpack(torch[cpu])       3.265666961669922e-06 sec/call
tvm.ffi.nop.autodlpack(torch[cuda])      3.4897327423095704e-06 sec/call
tvm.ffi.nop.autodlpack(torch[cuda][stream]) 3.4964323043823244e-06 sec/call
tvm.ffi.nop.autodlpack(numpy)            1.4113664627075195e-06 sec/call
-------------------------------
Benchmark x.__dlpack__ overhead
-------------------------------
torch.utils.dlpack.to_dlpack             3.6129951477050783e-07 sec/call
torch.__dlpack__                         8.010625839233399e-07 sec/call
numpy.__dlpack__                         6.115436553955078e-08 sec/call
tvm.__dlpack__                           9.13858413696289e-08 sec/call
---------------------------------------------------
Benchmark x.__dlpack__(max_version=(1,1)) overhead
---------------------------------------------------
torch.__dlpack__(max_version=(1,1))      Tensor.__dlpack__() got an unexpected keyword argument 'max_version'
numpy.__dlpack__(max_version=(1,1))      7.741451263427734e-08 sec/call
tvm.__dlpack__(max_version=(1,1))        1.41143798828125e-07 sec/call
---------------------------------------------------
Benchmark torch.get_cuda_stream[default stream]
---------------------------------------------------
torch.cuda.current_stream[cpp-extension] 9.298324584960938e-08 sec/call
torch.cuda.current_stream[python]        8.587837219238281e-07 sec/call
---------------------------------------------------
Benchmark torch.get_cuda_stream[non-default stream]
---------------------------------------------------
torch.cuda.current_stream[cpp-extension] 9.508132934570312e-08 sec/call
torch.cuda.current_stream[python]        8.99958610534668e-07 sec/call
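
The sec/call figures above are per-call averages over many iterations; a minimal sketch of how such a number could be measured (the helper name is illustrative):

```python
import timeit

def sec_per_call(fn, *args, number=100000):
    # Average wall-clock seconds per call, as reported above.
    return timeit.timeit(lambda: fn(*args), number=number) / number
```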

@yongwww merged commit 216e9e9 into apache:main on Aug 20, 2025 (13 checks passed).
tqchen added three commits to tqchen/tvm that referenced this pull request Sep 13, 2025; each carries the same commit message as the PR description above.