Commit 64862d1

maleksan85 and Aleksandr Malyshev authored
[ROCM][AMD][TRITON] Halving warps number for fw_prefill to reduce spilling (#12713)
Signed-off-by: Aleksandr Malyshev <[email protected]>
Co-authored-by: Aleksandr Malyshev <[email protected]>
1 parent b3a0d01 commit 64862d1

1 file changed, +1 -1 lines changed

vllm/attention/ops/prefix_prefill.py

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@

 # Static kernels parameters
 BASE_BLOCK = 128 if current_platform.has_device_capability(80) else 64
-NUM_WARPS = 8
+NUM_WARPS = 4 if current_platform.is_rocm() else 8

 # To check compatibility
 IS_TURING = current_platform.get_device_capability() == (7, 5)
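
The changed line picks the warp count per platform at import time. A minimal sketch of that selection follows, runnable without vLLM or a GPU. `FakePlatform`, `select_num_warps`, and `threads_per_program` are hypothetical names introduced here for illustration and are not part of vLLM. The lane widths (64 on ROCm, 32 on CUDA) are assumptions about typical AMD CDNA and NVIDIA hardware, not values taken from this commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakePlatform:
    """Hypothetical stand-in for vLLM's `current_platform` (not the real class)."""

    rocm: bool

    def is_rocm(self) -> bool:
        return self.rocm


def select_num_warps(platform) -> int:
    # Mirrors the changed line: 4 warps on ROCm, 8 everywhere else.
    return 4 if platform.is_rocm() else 8


def threads_per_program(num_warps: int, lanes_per_warp: int) -> int:
    # Threads launched per kernel program instance = warps * lanes per warp.
    return num_warps * lanes_per_warp


if __name__ == "__main__":
    cases = (
        ("rocm", FakePlatform(rocm=True), 64),   # assumed 64-lane wavefront
        ("cuda", FakePlatform(rocm=False), 32),  # assumed 32-lane warp
    )
    for name, platform, lanes in cases:
        n = select_num_warps(platform)
        print(name, n, threads_per_program(n, lanes))
```

Under those assumed lane widths, both platforms end up at 256 threads per program instance, whereas 8 warps on ROCm would have been 512. Whether that is the mechanism behind the reduced spilling mentioned in the commit title is not stated in the source.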
