@csullivan csullivan commented Mar 27, 2024

Currently supports only an ncclAllReduce equivalent with op=Sum, for reductions up to 2**24 bytes (16 MiB), as noted below.

  • Adds a fast and simple AllReduce kernel (sum only) using mscclpp smChannel scratch for small reductions up to 2**24 bytes.
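The constraints above can be sketched in a few lines. This is an illustrative sketch only, not code from this PR: the names `MAX_FAST_ALLREDUCE_BYTES`, `can_use_fast_allreduce`, and `reference_allreduce_sum` are hypothetical, and treating the 2**24-byte limit as inclusive is an assumption. It shows which requests would qualify for the fast path, and the result a sum AllReduce is expected to produce on every rank.

```python
# Illustrative sketch only; all names here are hypothetical, not part of the PR.
MAX_FAST_ALLREDUCE_BYTES = 2**24  # size limit stated in the PR (16 MiB)


def can_use_fast_allreduce(nbytes: int, op: str) -> bool:
    """Return True when a request fits the constraints described above."""
    # Assumption: the limit is inclusive ("up to 2**24 bytes").
    return op == "sum" and 0 < nbytes <= MAX_FAST_ALLREDUCE_BYTES


def reference_allreduce_sum(rank_buffers):
    """Reference semantics: every rank receives the elementwise sum."""
    if not rank_buffers:
        raise ValueError("need at least one rank")
    length = len(rank_buffers[0])
    if any(len(buf) != length for buf in rank_buffers):
        raise ValueError("all ranks must contribute equal-length buffers")
    # Sum position by position across ranks.
    reduced = [sum(values) for values in zip(*rank_buffers)]
    # Each rank gets its own copy of the reduced buffer.
    return [list(reduced) for _ in rank_buffers]
```

A caller would presumably check `can_use_fast_allreduce` first and fall back to another implementation for other ops or larger sizes; that fallback is not described in this PR.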

Plan to integrate with disco in a follow-up PR.

@csullivan csullivan requested a review from tqchen March 27, 2024 18:59
* Add a fast and simple AllReduce kernel (sum only) using
  mscclpp smChannel scratch for small reductions
  up to 2**24 bytes.
@csullivan csullivan force-pushed the feature/2024-03-20/mscclpp-allreduce branch 4 times, most recently from 471fb9a to b78c1c3 Compare March 28, 2024 18:36
@csullivan csullivan force-pushed the feature/2024-03-20/mscclpp-allreduce branch from b78c1c3 to e2222d6 Compare March 28, 2024 21:05
@tqchen tqchen merged commit 64db9f7 into apache:main Mar 29, 2024

tqchen commented Mar 29, 2024

Thanks @csullivan. Looking forward to the disco integration.

thaisacs pushed a commit to thaisacs/tvm that referenced this pull request Apr 3, 2024

* [Runtime] Introduce MSCCLPP with NCCL equivalent interface

* Add a fast and simple AllReduce kernel (sum only) using
  mscclpp smChannel scratch for small reductions
  up to 2**24 bytes.
Ubospica pushed a commit to Ubospica/tvm-develop that referenced this pull request Apr 18, 2024