Conversation

@MasterJH5574 (Contributor)
This PR introduces the lowering passes for GPU IPC memory and all-reduce. It contains the following changes:

  1. a pass `IPCAllreduceRewrite`, which rewrites `"runtime.disco.allreduce"` to `"runtime.disco.cuda_ipc.custom_allreduce"` and changes the storage scope of the all-reduce inputs from `"global"` to `"ipc_memory"` accordingly.

  2. a memory planning enhancement that makes the planner aware of storage scopes, so that each storage scope is planned independently.

  3. a pass `LowerGPUIPCAllocStorage` that rewrites the storage allocation of IPC memory from builtin ops into calls to the function `"runtime.disco.cuda_ipc.alloc_storage"`.

  4. support for the op `relax.builtin.alloc_tensor` with a storage scope. The default storage scope is `"global"`.
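To illustrate the idea behind step 1, here is a minimal, TVM-free sketch of the rewrite: plain dicts stand in for Relax IR call nodes and their argument buffers, so the function names are the real runtime targets from this PR while the data structures are purely hypothetical.

```python
def ipc_allreduce_rewrite(calls):
    """Toy model of the IPCAllreduceRewrite pass.

    Each call is a dict with an "op" name and "args" (buffers carrying a
    storage "scope"). The real pass walks a Relax IRModule; this sketch only
    shows the two rewrites: retarget the op and flip the input scopes.
    """
    for call in calls:
        if call["op"] == "runtime.disco.allreduce":
            # Retarget the all-reduce to the CUDA IPC custom kernel.
            call["op"] = "runtime.disco.cuda_ipc.custom_allreduce"
            # The custom all-reduce reads its inputs from IPC memory.
            for arg in call["args"]:
                if arg.get("scope") == "global":
                    arg["scope"] = "ipc_memory"
    return calls
```

Unrelated calls pass through untouched, which mirrors how a rewrite pass leaves the rest of the module intact.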

We write the new passes in Python for experimentation and fast development. They are a good demonstration of how TVM's architecture enables efficient development.
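Since the passes are written in Python, the scope-aware planning idea in step 2 can be modeled in a few lines. This is a simplified sketch, not the actual planner: real memory planning also reuses buffers based on liveness, whereas here we only show that requests are pooled per storage scope so `"global"` and `"ipc_memory"` are planned independently.

```python
from collections import defaultdict

def plan_storage(allocs):
    """Toy scope-aware planner.

    `allocs` is a list of (name, scope, size_bytes) allocation requests.
    Requests are grouped by storage scope, and each scope is planned on its
    own; the return value maps each scope to its total pool size.
    """
    pools = defaultdict(list)
    for name, scope, size in allocs:
        pools[scope].append((name, size))
    # Each scope gets an independent pool; no request crosses scopes.
    return {scope: sum(size for _, size in reqs) for scope, reqs in pools.items()}
```

Planning per scope matters here because IPC memory is allocated through a different runtime path (`"runtime.disco.cuda_ipc.alloc_storage"`, per step 3) than ordinary global memory, so the two kinds of storage cannot share a pool.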

@MasterJH5574 force-pushed the `tvm-dev/2024-03-20-lowering-ipc-mem` branch from `0211a3a` to `3b8183f` on March 21, 2024 at 04:02.
@tqchen merged commit `858486f` into `apache:main` on March 21, 2024.
thaisacs pushed a commit to thaisacs/tvm that referenced this pull request Apr 3, 2024
…e#16759)
