Merged
26 commits
2c6fdc0
add apply_linear_rocm
charlifu Mar 24, 2025
ae5e386
Merge branch 'main' into charlifu/amd_skinny_gemm
charlifu Mar 25, 2025
f6784a6
add skinny gemm for fp16
charlifu Mar 26, 2025
0993ea0
use wvSplitK
charlifu Mar 26, 2025
6dfdd5f
add env for skinny gemm
charlifu Mar 26, 2025
9aa2059
add bf16 support for llmm1
charlifu Mar 27, 2025
16fb48c
update skinny gemms
charlifu Mar 28, 2025
f6cfce5
Merge branch 'main' into charlifu/amd_skinny_gemm
charlifu Mar 28, 2025
e06862e
add bf16 wvsplitK
charlifu Mar 29, 2025
5c60d0b
clean up
charlifu Mar 31, 2025
ff65f9a
Merge branch 'main' into charlifu/amd_skinny_gemm
charlifu Mar 31, 2025
c017ce1
add n == 3 case
charlifu Mar 31, 2025
534eaeb
Merge branch 'main' into charlifu/amd_skinny_gemm
charlifu Apr 1, 2025
76f8172
disable fp8 gemm padding for rocm
charlifu Apr 1, 2025
91205a4
add wvsplitK fp8 and unit tests
charlifu Apr 8, 2025
0b6e71b
Merge branch 'main' into charlifu/amd_skinny_gemm
charlifu Apr 8, 2025
63efd7f
fix fp8 skinny gemm call
charlifu Apr 8, 2025
5a09506
fix engine test
charlifu Apr 8, 2025
660fefb
remove env check out of platform class
charlifu Apr 9, 2025
9674634
add torch version check for row_wise scaled_mm
charlifu Apr 9, 2025
de1520e
Merge branch 'main' into charlifu/amd_skinny_gemm
charlifu Apr 10, 2025
afc4880
Merge branch 'main' into charlifu/amd_skinny_gemm
charlifu Apr 14, 2025
c8c248b
remove cache decorator to fix V1 error
charlifu Apr 14, 2025
176b754
Merge branch 'main' into charlifu/amd_skinny_gemm
charlifu Apr 17, 2025
6535863
Update vllm/model_executor/layers/quantization/utils/w8a8_utils.py
charlifu Apr 21, 2025
26f9233
Merge branch 'main' into charlifu/amd_skinny_gemm
charlifu Apr 21, 2025
1 change: 1 addition & 0 deletions CMakeLists.txt
@@ -678,6 +678,7 @@ if(VLLM_GPU_LANG STREQUAL "HIP")
   #
   set(VLLM_ROCM_EXT_SRC
     "csrc/rocm/torch_bindings.cpp"
+    "csrc/rocm/skinny_gemms.cu"
     "csrc/rocm/attention.cu")

   define_gpu_extension_target(
9 changes: 9 additions & 0 deletions csrc/rocm/ops.h
@@ -2,6 +2,15 @@

 #include <torch/all.h>

+torch::Tensor LLMM1(at::Tensor& in_a, at::Tensor& in_b,
+                    const int64_t rows_per_block);
+
+torch::Tensor wvSplitK(at::Tensor& in_a, at::Tensor& in_b,
+                       const int64_t CuCount);
+
+void wvSplitKQ(at::Tensor& in_a, at::Tensor& in_b, at::Tensor& out_c,
+               at::Tensor& scale_a, at::Tensor& scale_b, const int64_t CuCount);
+
 void paged_attention(torch::Tensor& out, torch::Tensor& exp_sums,
                      torch::Tensor& max_logits, torch::Tensor& tmp_out,
                      torch::Tensor& query, torch::Tensor& key_cache,
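
The ops.h hunk above only declares the new entry points; the kernels themselves are not shown in this part of the diff. As a rough illustration of the idea the name wvSplitK suggests, here is a CPU-only sketch of a split-K product for a "skinny" GEMM, where the reduction dimension K is cut into chunks and the per-chunk partial sums are accumulated. This is an assumption based on the name alone, not the PR's ROCm implementation: the function split_k_gemm_ref, the row-major layouts, and the num_splits parameter are all hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, NOT the ROCm kernel added by this PR.
// a: M x K row-major (e.g. weights), b: N x K row-major (e.g. activations,
// with N small, hence "skinny"). Returns c: N x M row-major, where
// c[n][m] = sum over k of b[n][k] * a[m][k].
std::vector<float> split_k_gemm_ref(const std::vector<float>& a,
                                    const std::vector<float>& b,
                                    std::size_t M, std::size_t N,
                                    std::size_t K, std::size_t num_splits) {
  std::vector<float> c(N * M, 0.0f);
  if (num_splits == 0) num_splits = 1;
  // Size of each K chunk, rounded up so the chunks cover all of K.
  const std::size_t chunk = (K + num_splits - 1) / num_splits;
  for (std::size_t s = 0; s < num_splits; ++s) {
    const std::size_t k0 = s * chunk;
    const std::size_t k1 = std::min(K, k0 + chunk);
    for (std::size_t n = 0; n < N; ++n) {
      for (std::size_t m = 0; m < M; ++m) {
        float partial = 0.0f;  // partial dot product over this K chunk
        for (std::size_t k = k0; k < k1; ++k) {
          partial += b[n * K + k] * a[m * K + k];
        }
        c[n * M + m] += partial;  // reduce partials across chunks
      }
    }
  }
  return c;
}
```

On a GPU the chunks would typically be handled by different threads or wavefronts and combined with a parallel reduction; the serial loop over s here only stands in for that step.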