[AMD][ROCm] Improve support of AMD #7448
base: master
Conversation
deepspeed/inference/v2/kernels/cutlass_ops/mixed_gemm/mixed_gemm.cu (outdated review thread, resolved)
Force-pushed from 5851003 to 1dc6bb7.
@hwchen2017 Kindly asking for a review now that your comments have been addressed.
deepspeed/inference/v2/kernels/core_ops/cuda_linear/include/utils_paralleldequant.cuh (outdated review thread, resolved)
Force-pushed from 09b1953 to f2dbbb7.
The patch delivers several fixes for build issues in the CUDA part of the DeepSpeed library. The percentage of passing unit tests improved (tested on RDNA hardware, gfx110x and gfx12x).
Before: collected 5298 items / 15 skipped; 2773 failed, 862 passed, 1665 skipped, 13 errors
After: collected 5851 items / 11 skipped; 4187 failed, 1373 passed, 292 skipped, 10 errors
Signed-off-by: Artem Kuzmitckii <[email protected]>
Signed-off-by: Artem Kuzmitckii <[email protected]>
part 2
Signed-off-by: Artem Kuzmitckii <[email protected]>
Force-pushed from f2dbbb7 to 77a7e06.
Signed-off-by: Artem Kuzmitckii <[email protected]>
Force-pushed from 45a01df to 0946828.
Signed-off-by: Artem Kuzmitckii <[email protected]>
@k-artem - is this ready for final review? @hwchen2017 - any remaining review requests?
Can you share the error message you get on AMD GPUs and explain why these changes fix the issues? That would help us better understand this PR. Thanks!
max(TilingConfig::SMEM_SIZE_B_TILE + SMEM_SIZE_A1_TILE + SMEM_SIZE_A2_TILE,
    TilingConfig::SMEM_SIZE_C_TILE);
cudaFuncSetAttribute(QUANT_GEMM_Kernel<TilingConfig, OutputDataType>,
auto kernel = QUANT_GEMM_Kernel<TilingConfig, OutputDataType>;
The change here seems unnecessary because this op only runs on CUDA.
Yep, there is currently no support; it can be removed, or we can keep it for a proper cast, since both CUDA and HIP declare the first argument as const void *. Please share your opinion.
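For reference, a minimal standalone sketch of the cast in question (illustrative only; the wrapper function, the placeholder kernel body, and the reasoning about the HIP signature are assumptions, not the actual DeepSpeed code):

// sketch.cu: illustrative only, not the DeepSpeed kernel.
#include <cuda_runtime.h>

template <typename TilingConfig, typename OutputDataType>
__global__ void QUANT_GEMM_Kernel() {}  // stand-in for the real kernel signature

template <typename TilingConfig, typename OutputDataType>
void set_max_dynamic_smem(size_t smem_size)
{
    // Binding the instantiated kernel to a variable yields a plain function pointer
    // that can be cast explicitly to const void*, the first-argument type of both
    // cudaFuncSetAttribute and hipFuncSetAttribute after hipification.
    auto kernel = QUANT_GEMM_Kernel<TilingConfig, OutputDataType>;
    cudaFuncSetAttribute((const void*)kernel,
                         cudaFuncAttributeMaxDynamicSharedMemorySize,
                         (int)smem_size);
}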
Let's remove it for now.
ok
What's the purpose of changing the file extension here? Usually the .cpp extension is used for the API definition.
If bf16 support is enabled, the file should be built with nvcc/hipcc rather than the regular host compiler that cpp_extension.py selects by default for files with the .cpp extension. And actually, as far as I can see, it is essentially CUDA code.
The binding code may or may not contain CUDA code; that's why people use .cpp by default. Let's keep it consistent across the repo.
Yep, let me add more detail on why I renamed the file. With HIP and BF16_AVAILABLE, these files include CUDA-specific headers that cannot be compiled by the regular host compiler, which torch/utils/cpp_extension.py selects (via _is_cuda_file) based on the file extension. Example:
In file included from /opt/rocm/include/hip/amd_detail/amd_warp_functions.h:27,
from /opt/rocm/include/hip/amd_detail/amd_hip_bf16.h:113,
from /opt/rocm/include/hip/hip_bf16.h:29,
from /myworkspace/DeepSpeed/csrc/includes/ds_kernel_utils_hip.h:18,
from /myworkspace/DeepSpeed/csrc/includes/quantization_hip.h:10,
from csrc/quantization/pt_binding_hip.cpp:11
As a result, the __builtin_amdgcn_* builtins cannot be found.
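For illustration, a simplified sketch of how torch/utils/cpp_extension.py decides which compiler handles each source file (approximate; the real _is_cuda_file also accounts for hipified sources and other details):

# simplified sketch of the extension-based compiler dispatch in cpp_extension.py
import os

CUDA_EXTENSIONS = ('.cu', '.cuh')

def is_cuda_file(path: str) -> bool:
    # Files with these extensions go to nvcc (hipcc on ROCm); everything else,
    # including .cpp binding files, is handed to the regular host C++ compiler.
    return os.path.splitext(path)[1] in CUDA_EXTENSIONS

# A binding that includes hip_bf16.h therefore breaks if it keeps the .cpp extension,
# because the host compiler cannot handle the __builtin_amdgcn_* intrinsics it pulls in.
print(is_cuda_file("csrc/quantization/pt_binding.cpp"))  # False -> host compiler
print(is_cuda_file("csrc/quantization/pt_binding.cu"))   # True  -> nvcc/hipcc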
name = self.NAME if name is None else name
super().__init__(name=name)
if self.is_rocm_pytorch():
    self.enable_bf16 = True
Can you move the change to https://github.com/deepspeedai/DeepSpeed/blob/master/op_builder/builder.py#L613?
ok
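A rough sketch of what moving the check into the shared builder might look like (hypothetical; OpBuilder here is a local stand-in for the class in op_builder/builder.py, and the final placement at the linked line may differ):

# hypothetical sketch, not the actual op_builder/builder.py change
class OpBuilder:  # stand-in for deepspeed's op_builder.builder.OpBuilder
    def __init__(self, name=None):
        self.name = name
        self.enable_bf16 = False

    def is_rocm_pytorch(self) -> bool:
        # The real helper inspects torch.version.hip; hardcoded here for the sketch.
        return False

class CUDAOpBuilder(OpBuilder):
    def __init__(self, name=None):
        super().__init__(name=name)
        # Enable bf16 once in the shared base class instead of repeating the
        # is_rocm_pytorch() check in every individual op builder.
        if self.is_rocm_pytorch():
            self.enable_bf16 = True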
The patch delivers several fixes for build issues in the CUDA part of the DeepSpeed library.
The percentage of passing unit tests improved (tested on RDNA hardware, gfx110x and gfx12x). Before:
collected 5298 items / 15 skipped
2773 failed, 862 passed, 1665 skipped, 13 errors
After:
collected 5851 items / 11 skipped
4187 failed, 1373 passed, 292 skipped, 10 errors
Regarding testing of fp_quantizer (DS_BUILD_FP_QUANTIZER) via tests/unit/ops/fp_quantizer/test_fp_quant.py: this test depends on QPyTorch, which needs to be patched before running on AMD; please apply Tiiiger/QPyTorch#71.