
Conversation

@vllmellm vllmellm commented Oct 10, 2025

Purpose

This PR adds a fusion pass for ROCm AITER that fuses the `+rms_norm` (AITER RMSNorm) and `+quant_fp8` (vLLM FP8 quantization) custom ops into a single fused kernel.
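The pattern being fused can be illustrated with a small NumPy sketch. This is a conceptual stand-in, not the actual AITER kernels: the per-tensor dynamic scaling and the `fp8_max = 448.0` (e4m3) bound are illustrative assumptions. The point of the fused variant is that the normalized intermediate never needs to be written back to memory at full precision.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Reference RMSNorm: scale by the reciprocal root-mean-square.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def quant_fp8(x, fp8_max=448.0):
    # Per-tensor dynamic quantization into the FP8 e4m3 range.
    scale = np.abs(x).max() / fp8_max
    return np.clip(x / scale, -fp8_max, fp8_max), scale

def fused_rms_norm_quant_fp8(x, weight, eps=1e-6, fp8_max=448.0):
    # What the fusion pass replaces the two ops with: one logical pass,
    # no full-precision intermediate tensor materialized between them.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    y = x / rms * weight
    scale = np.abs(y).max() / fp8_max
    return np.clip(y / scale, -fp8_max, fp8_max), scale

x = np.random.randn(4, 64).astype(np.float32)
w = np.ones(64, dtype=np.float32)
q1, s1 = quant_fp8(rms_norm(x, w))
q2, s2 = fused_rms_norm_quant_fp8(x, w)
assert np.allclose(q1, q2) and np.isclose(s1, s2)
```

The unfused and fused paths are numerically equivalent; the benefit on hardware comes from eliminating the intermediate memory traffic and kernel-launch overhead.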

Benchmark Result

| Metric | Without Fusion Pass | With Fusion Pass |
|--------|--------------------:|-----------------:|
| Successful requests | 500 | 500 |
| Benchmark duration (s) | 173.76 | 170.31 |
| Total input tokens | 520,558 | 520,558 |
| Total generated tokens | 456,122 | 456,834 |
| Request throughput (req/s) | 2.88 | 2.94 |
| Output token throughput (tok/s) | 2,625.02 | 2,682.37 |
| Peak output token throughput (tok/s) | 7,924.00 | 7,410.00 |
| Peak concurrent requests | 500.00 | 500.00 |
| Total token throughput (tok/s) | 5,620.87 | 5,738.91 |
| Mean TTFT (ms) | 35,048.69 | 34,637.29 |
| Median TTFT (ms) | 28,413.54 | 28,841.97 |
| P99 TTFT (ms) | 91,625.66 | 90,845.96 |
| Mean TPOT (ms) | 167.64 | 170.28 |
| Median TPOT (ms) | 117.48 | 118.62 |
| P99 TPOT (ms) | 881.79 | 913.72 |
| Mean ITL (ms) | 114.17 | 113.99 |
| Median ITL (ms) | 57.83 | 61.15 |
| P99 ITL (ms) | 2,111.96 | 2,086.31 |

benchmark setting

```shell
vllm bench serve \
  --backend vllm \
  --model "RedHatAI/Qwen3-14B-FP8-dynamic" \
  --dataset-name random \
  --num-prompts 500 \
  --random-input-len 1000 \
  --random-output-len 1000 \
  --endpoint /v1/completions \
  --random-range-ratio 0.9
```

IMPORTANT NOTE

Pass the following flag to enable the fusion pass:

```shell
--compilation-config '{"pass_config": {"enable_fusion": true, "enable_noop": true, "enable_attn_fusion": false}, "custom_ops": ["+rms_norm", "+quant_fp8"]}'
```

Test Plan

  • A unit test has been added in `vllm/tests/compile/test_rocm_aiter_fusion.py` that verifies accuracy and the replacement of the ops in the CUDA graph.
  • End-to-end test using the RedHatAI/Qwen3-14B-FP8-dynamic model.
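The "Matched count: 2" in the unit-test log below reports how many rms_norm → quant_fp8 pairs the pass rewrote. A minimal pure-Python sketch of that matching logic (the op names and the fused-op name here are illustrative placeholders, not vLLM's actual identifiers):

```python
def fuse(ops):
    """Toy pattern matcher: replace each adjacent (rms_norm, quant_fp8)
    pair with one fused op, mirroring what the compile pass does."""
    out, matched, i = [], 0, 0
    while i < len(ops):
        if ops[i] == "rms_norm" and i + 1 < len(ops) and ops[i + 1] == "quant_fp8":
            out.append("rms_norm_quant_fp8_fused")  # hypothetical fused-op name
            matched += 1
            i += 2
        else:
            out.append(ops[i])
            i += 1
    return out, matched

# Two fusible pairs, matching the "Matched count: 2" in the unit-test log.
ops = ["embed", "rms_norm", "quant_fp8", "gemm", "rms_norm", "quant_fp8", "lm_head"]
fused, matched = fuse(ops)
# matched == 2; the fused graph has 5 ops instead of 7
```

The real pass operates on the compiled FX graph rather than a flat op list, but the invariant the test checks is the same: every fusible pair is replaced and the match count is as expected.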

environment setting
Step 1: run vllm serve

```shell
VLLM_ROCM_USE_AITER=1 \
VLLM_USE_V1=1 \
vllm serve RedHatAI/Qwen3-14B-FP8-dynamic \
  --compilation-config '{"pass_config": {"enable_fusion": true, "enable_noop": true, "enable_attn_fusion": false}, "custom_ops": ["+rms_norm", "+quant_fp8"], "cudagraph_capture_sizes": [1,2,4,8,16,24,32,256]}' \
  --port 9090 \
  --trust-remote-code --swap-space 16 --distributed-executor-backend mp
```
Step 2: run lm_eval

```shell
lm_eval --model local-completions --tasks gsm8k \
  --model_args model=RedHatAI/Qwen3-14B-FP8-dynamic,base_url=http://localhost:9090/v1/completions \
  --trust_remote_code \
  --num_fewshot 5 \
  --batch_size 128
```

Test Results

RedHatAI/Qwen3-14B-FP8-dynamic fusion pass

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|---------|--------|--------|--------|-------|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.7612 | ± 0.0117 |
|       |   | strict-match     | 5 | exact_match | 0.8741 | ± 0.0091 |

RedHatAI/Qwen3-14B-FP8-dynamic without fusion pass

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|---------|--------|--------|--------|-------|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.7718 | ± 0.0116 |
|       |   | strict-match     | 5 | exact_match | 0.8741 | ± 0.0091 |

Unit test result

```
INFO 10-10 08:39:08 [init.py:224] Automatically detected platform rocm.
============================= test session starts ==============================
platform linux -- Python 3.10.12, pytest-8.4.2, pluggy-1.6.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /app/norm/vllm
configfile: pyproject.toml
plugins: anyio-4.10.0, asyncio-1.2.0
asyncio: mode=strict, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collecting ... WARNING 10-10 08:39:11 [interface.py:518] Current platform cuda does not have 'test' attribute.
WARNING 10-10 08:39:11 [interface.py:518] Current platform cuda does not have 'bases' attribute.
WARNING 10-10 08:39:11 [interface.py:518] Current platform cuda does not have 'test' attribute.
collected 2 items

compile/test_rocm_aiter_fusion.py::test_fusion_rmsnorm_quant[1e-05-257-64-dtype0] Matched count: 2
PASSED
compile/test_rocm_aiter_fusion.py::test_fusion_rmsnorm_quant[1e-06-257-64-dtype0] Matched count: 2
PASSED
======================== 2 passed, 2 warnings in 25.65s ========================
```


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the rocm Related to AMD ROCm label Oct 10, 2025
@vllmellm vllmellm marked this pull request as ready for review October 10, 2025 18:05

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


@mergify

mergify bot commented Oct 23, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @vllmellm.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 23, 2025
