@ashishtanwer ashishtanwer commented Aug 7, 2025

ROCm port of the official vLLM commit de98252 by Woosuk Kwon

gshtras and others added 30 commits February 13, 2025 11:53
* fused_moe config for DSv3 on MI300X updated

* Add tuning script and post processing script

Signed-off-by: Randall Smith <[email protected]>

* Add modification to fp8_utils for tuning

Signed-off-by: Randall Smith <[email protected]>

* update tuning script and add the configs

Signed-off-by: Randall Smith <[email protected]>

* slightly better tunings

Signed-off-by: Randall Smith <[email protected]>

* benchmark_moe.py is updated to generate more accurate MoE configs and a specific MoE config for DSv3 is added

* Bug in sgl_moe_align_block_size() is fixed by Greg

* Generate fp8_w8a8 config for MI300XHF

* tunings that don't give garbage output

Signed-off-by: Randall Smith <[email protected]>

* More accurate tunings

Signed-off-by: Randall Smith <[email protected]>

* More accurate tunings and reject inaccurate configs

Signed-off-by: Randall Smith <[email protected]>

* add new tunings

Signed-off-by: Randall Smith <[email protected]>

* rename tuning script and add benchmark script to use for optimizing blockwise quant

Signed-off-by: Randall Smith <[email protected]>

* remove white space from file names

Signed-off-by: Randall Smith <[email protected]>

* remove white space from file names

Signed-off-by: Randall Smith <[email protected]>

* Remove some unnecessary changes

Signed-off-by: Randall Smith <[email protected]>

* don't use space in file names

Signed-off-by: Randall Smith <[email protected]>

* remove XHF tunings

Signed-off-by: Randall Smith <[email protected]>

* remove OAM from file name

Signed-off-by: Randall Smith <[email protected]>

* remove OAM from file names

Signed-off-by: Randall Smith <[email protected]>

* yapf

Signed-off-by: Randall Smith <[email protected]>

* update config name

Signed-off-by: Randall Smith <[email protected]>

* remove benchmark_moe.py changes

Signed-off-by: Randall Smith <[email protected]>

* remove is_contiguous

Signed-off-by: Randall Smith <[email protected]>

* use more recent fp8_utils.py

Signed-off-by: Randall Smith <[email protected]>

* remove is_contiguous

Signed-off-by: Randall Smith <[email protected]>

---------

Signed-off-by: Randall Smith <[email protected]>
Co-authored-by: qli88 <[email protected]>
…ed to each following path for their ownership to apply (ROCm#427)
Signed-off-by: isotr0py <[email protected]>
* Enabling ROCm CI on MI250 machines:
- correct build target
- correct queue

Signed-off-by: Alexei V. Ivanov <[email protected]>

---------

Signed-off-by: Alexei V. Ivanov <[email protected]>
* Optimization for quantized gemm skinny sizes

* lint fix

* Add support for bf16/fp16

* code cleanup

* code cleanup

* lint fix2

* cleanup

* Moved the logic into tuned gemm to preserve API compatibility

---------

Co-authored-by: Gregory Shtrasberg <[email protected]>
Co-authored-by: Gregory Shtrasberg <[email protected]>
* Removing gfx940 and gfx941 targets. These have been deprecated in favor of gfx942 for MI300X

Signed-off-by: Gregory Shtrasberg <[email protected]>

* Remove from custom kernels as well

---------

Signed-off-by: Gregory Shtrasberg <[email protected]>
* Advance torch commit to be past pytorch/pytorch#144942 to fix tunable ops

* Make sure to use the submodule commit compatible with the main aiter commit
Signed-off-by: Sage Moore <[email protected]>
Signed-off-by: Sage Moore <[email protected]>
Signed-off-by: Sage Moore <[email protected]>
* Using aiter branch that can be built into a whl with PREBUILD_KERNELS=1

* Using fail fast on aiter build to see compilation errors in the log since it fails silently

* Check for build success without installing whl
* Using proposed fix from ROCm/aiter#115

* Build fix
gshtras and others added 19 commits June 20, 2025 21:31
* Updated README.md for June 24 Docker release

* Added additional throughput results

* Fixed some throughput results
* Minor changes to command line examples

* README changes and added throughput results

Still waiting on latency

* Added latency results

* Update README.md

* Update README.md
* Update test-pipeline.yaml

Disabling the "Tensorizer Test".

The test is seen to generate exceptions while still reporting as successful. That needs to be verified before re-enabling the test in the production environment.

Signed-off-by: Alexei V. Ivanov <[email protected]>

* Fixing pre-commit complaints.

Signed-off-by: Alexei V. Ivanov <[email protected]>

* .

Signed-off-by: Alexei V. Ivanov <[email protected]>

---------

Signed-off-by: Alexei V. Ivanov <[email protected]>
…symbol exposure (vllm-project#21647)"

This reverts commit 9ba1c88.

Signed-off-by: Gregory Shtrasberg <[email protected]>
@ashishtanwer ashishtanwer changed the title Add GPT-OSS model code and config [Model] Add GPT-OSS model code and config Aug 7, 2025
@gshtras gshtras force-pushed the main branch 2 times, most recently from 1d2c43d to eb9d4de Compare September 9, 2025 16:43