[AMD][ROCm] Improve support of AMD #7448
base: master
Conversation
deepspeed/inference/v2/kernels/cutlass_ops/mixed_gemm/mixed_gemm.cu (outdated review thread, resolved)
Force-pushed from 5851003 to 1dc6bb7.
@hwchen2017 Kindly asking for a review now that your comments have been addressed.
deepspeed/inference/v2/kernels/core_ops/cuda_linear/include/utils_paralleldequant.cuh (outdated review thread, resolved)
Force-pushed from 09b1953 to f2dbbb7.
The patch delivers several fixes for build issues in the CUDA part of the DeepSpeed library. The percentage of passing unit tests improved (tested on RDNA hardware, gfx110x and gfx12x).
Before: collected 5298 items / 15 skipped; 2773 failed, 862 passed, 1665 skipped, 13 errors
After: collected 5851 items / 11 skipped; 4187 failed, 1373 passed, 292 skipped, 10 errors
Signed-off-by: Artem Kuzmitckii <[email protected]>
Signed-off-by: Artem Kuzmitckii <[email protected]>
part 2
Signed-off-by: Artem Kuzmitckii <[email protected]>
Force-pushed from f2dbbb7 to 77a7e06.
Signed-off-by: Artem Kuzmitckii <[email protected]>
Force-pushed from 45a01df to 0946828.
Signed-off-by: Artem Kuzmitckii <[email protected]>
@k-artem - is this ready for final review? @hwchen2017 - any remaining review requests?
Can you share the error message you get on AMD GPUs and explain why these changes fix the issues? That would help us better understand this PR. Thanks!
max(TilingConfig::SMEM_SIZE_B_TILE + SMEM_SIZE_A1_TILE + SMEM_SIZE_A2_TILE,
    TilingConfig::SMEM_SIZE_C_TILE);
cudaFuncSetAttribute(QUANT_GEMM_Kernel<TilingConfig, OutputDataType>,
auto kernel = QUANT_GEMM_Kernel<TilingConfig, OutputDataType>;
The change here seems unnecessary because this op only runs on CUDA.
Yep, there is currently no support; it can be removed, or we can keep it for a proper cast, since both CUDA and HIP declare the first argument as const void *. Please share your opinion.
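For reference, a minimal standalone sketch of the cast in question (illustrative only; the wrapper function, the placeholder kernel body, and the reasoning about the HIP signature are assumptions, not the actual DeepSpeed code):

// sketch.cu: illustrative only, not the DeepSpeed kernel.
#include <cuda_runtime.h>

template <typename TilingConfig, typename OutputDataType>
__global__ void QUANT_GEMM_Kernel() {}  // stand-in for the real kernel signature

template <typename TilingConfig, typename OutputDataType>
void set_max_dynamic_smem(size_t smem_size)
{
    // Binding the instantiated kernel to a variable yields a plain function pointer
    // that can be cast explicitly to const void*, the first-argument type of both
    // cudaFuncSetAttribute and hipFuncSetAttribute after hipification.
    auto kernel = QUANT_GEMM_Kernel<TilingConfig, OutputDataType>;
    cudaFuncSetAttribute((const void*)kernel,
                         cudaFuncAttributeMaxDynamicSharedMemorySize,
                         (int)smem_size);
}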
Let's remove it for now.
ok
What's the purpose of changing the file extension here? Usually the .cpp extension is used for the API definition.
If bf16 support is enabled, the file should be built with nvcc/hipcc rather than the regular host compiler that cpp_extension.py selects by default for files with the .cpp extension. And actually, as far as I can see, it is essentially CUDA code.
The binding code may or may not contain CUDA code; that's why people use .cpp by default. Let's keep it consistent across the repo.
Yep, let me add more detail on why I renamed the file. With HIP and BF16_AVAILABLE, these files include CUDA-specific headers that cannot be compiled by the regular host compiler, which torch/utils/cpp_extension.py selects (via _is_cuda_file) based on the file extension. Example:
In file included from /opt/rocm/include/hip/amd_detail/amd_warp_functions.h:27,
from /opt/rocm/include/hip/amd_detail/amd_hip_bf16.h:113,
from /opt/rocm/include/hip/hip_bf16.h:29,
from /myworkspace/DeepSpeed/csrc/includes/ds_kernel_utils_hip.h:18,
from /myworkspace/DeepSpeed/csrc/includes/quantization_hip.h:10,
from csrc/quantization/pt_binding_hip.cpp:11
As a result, the __builtin_amdgcn_* builtins cannot be found.
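For illustration, a simplified sketch of how torch/utils/cpp_extension.py decides which compiler handles each source file (approximate; the real _is_cuda_file also accounts for hipified sources and other details):

# simplified sketch of the extension-based compiler dispatch in cpp_extension.py
import os

CUDA_EXTENSIONS = ('.cu', '.cuh')

def is_cuda_file(path: str) -> bool:
    # Files with these extensions go to nvcc (hipcc on ROCm); everything else,
    # including .cpp binding files, is handed to the regular host C++ compiler.
    return os.path.splitext(path)[1] in CUDA_EXTENSIONS

# A binding that includes hip_bf16.h therefore breaks if it keeps the .cpp extension,
# because the host compiler cannot handle the __builtin_amdgcn_* intrinsics it pulls in.
print(is_cuda_file("csrc/quantization/pt_binding.cpp"))  # False -> host compiler
print(is_cuda_file("csrc/quantization/pt_binding.cu"))   # True  -> nvcc/hipcc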
name = self.NAME if name is None else name
super().__init__(name=name)
if self.is_rocm_pytorch():
    self.enable_bf16 = True
Can you move the change to https://github.com/deepspeedai/DeepSpeed/blob/master/op_builder/builder.py#L613?
ok
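A rough sketch of what moving the check into the shared builder might look like (hypothetical; OpBuilder here is a local stand-in for the class in op_builder/builder.py, and the final placement at the linked line may differ):

# hypothetical sketch, not the actual op_builder/builder.py change
class OpBuilder:  # stand-in for deepspeed's op_builder.builder.OpBuilder
    def __init__(self, name=None):
        self.name = name
        self.enable_bf16 = False

    def is_rocm_pytorch(self) -> bool:
        # The real helper inspects torch.version.hip; hardcoded here for the sketch.
        return False

class CUDAOpBuilder(OpBuilder):
    def __init__(self, name=None):
        super().__init__(name=name)
        # Enable bf16 once in the shared base class instead of repeating the
        # is_rocm_pytorch() check in every individual op builder.
        if self.is_rocm_pytorch():
            self.enable_bf16 = True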
The patch delivers several fixes for build issues in the CUDA part of the DeepSpeed library.
The percentage of passing unit tests improved (tested on RDNA hardware, gfx110x and gfx12x). Before:
collected 5298 items / 15 skipped
2773 failed, 862 passed, 1665 skipped, 13 errors
After:
collected 5851 items / 11 skipped
4187 failed, 1373 passed, 292 skipped, 10 errors
Regarding testing of fp_quantizer (DS_BUILD_FP_QUANTIZER) via tests/unit/ops/fp_quantizer/test_fp_quant.py: this test depends on QPyTorch, which needs to be patched before running on AMD; please apply Tiiiger/QPyTorch#71.