ggeorgakoudis

Use a 2-level approach with atomics for optimizing GPU reductions:

- parallel regions reduce into shared memory using atomics
- team regions reduce into global memory using atomics
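
The two levels above can be sketched on the CPU as follows. This is a minimal illustration under stated assumptions, not the PR's implementation. `std::thread` stands in for GPU threads and teams. A team-local `std::atomic` stands in for the shared-memory accumulator, and a single `std::atomic` stands in for the global-memory result. The function name `two_level_sum`, and the choice of each thread's global index as its contribution, are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical CPU analogue of a 2-level atomic sum reduction.
// Level 1: threads of a team atomically add into a team-local accumulator
//          (stand-in for GPU shared memory).
// Level 2: once per team, the team's partial sum is atomically added to a
//          single accumulator (stand-in for GPU global memory).
std::int64_t two_level_sum(int num_teams, int threads_per_team) {
  std::atomic<std::int64_t> global_sum{0};  // "global memory" result
  std::vector<std::thread> teams;
  for (int t = 0; t < num_teams; ++t) {
    teams.emplace_back([&global_sum, t, threads_per_team] {
      std::atomic<std::int64_t> team_sum{0};  // "shared memory" partial
      std::vector<std::thread> threads;
      for (int i = 0; i < threads_per_team; ++i) {
        threads.emplace_back([&team_sum, t, i, threads_per_team] {
          // Each thread contributes its global index as its private value.
          std::int64_t value =
              static_cast<std::int64_t>(t) * threads_per_team + i;
          team_sum.fetch_add(value, std::memory_order_relaxed);  // level 1
        });
      }
      for (auto& th : threads) th.join();  // stands in for a team barrier
      // Level 2: exactly one atomic per team on the global accumulator.
      global_sum.fetch_add(team_sum.load(std::memory_order_relaxed),
                           std::memory_order_relaxed);
    });
  }
  for (auto& team : teams) team.join();
  return global_sum.load();
}
```

The point of the second level is reduced contention: the global accumulator receives one atomic update per team instead of one per thread. For example, `two_level_sum(4, 8)` sums the indices 0 through 31.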

- Use a 2-level approach with atomics
- Support DSA_REDUCTION_MUL for nested for directives
- Clean up code
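
A multiplication reduction needs an atomic multiply. `std::atomic` has no `fetch_mul`, so one common approach builds it from a compare-and-swap loop. On GPUs the same pattern is typically written with `atomicCAS`. The sketch below shows that general technique only. How the PR actually implements DSA_REDUCTION_MUL is not shown here, and the names `atomic_mul` and `parallel_product` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical atomic multiply built from a compare-and-swap loop.
// Returns the previous value, mirroring fetch_add semantics.
std::int64_t atomic_mul(std::atomic<std::int64_t>& target,
                        std::int64_t factor) {
  std::int64_t expected = target.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak reloads `expected` with the current
  // value, so the product is recomputed from fresh data on every retry.
  while (!target.compare_exchange_weak(expected, expected * factor,
                                       std::memory_order_relaxed)) {
  }
  return expected;
}

// Product reduction: each thread multiplies its value into an accumulator
// initialised to the multiplicative identity (1).
std::int64_t parallel_product(const std::vector<std::int64_t>& values) {
  std::atomic<std::int64_t> product{1};
  std::vector<std::thread> threads;
  for (std::int64_t v : values) {
    threads.emplace_back([&product, v] { atomic_mul(product, v); });
  }
  for (auto& th : threads) th.join();
  return product.load();
}
```

Initialising the accumulator to 1 rather than 0 matters here: it is the identity for multiplication, just as 0 is for a sum reduction.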
ggeorgakoudis merged commit 7f0e146 into main on Sep 30, 2025
100 of 106 checks passed
ggeorgakoudis deleted the optimize-gpu-reductions branch on September 30, 2025 at 18:38