
Conversation

tjtanaa
Contributor

@tjtanaa tjtanaa commented Mar 17, 2025

This PR integrates the RMS Norm layer functionality from AITER (AI Tensor Engine for ROCm) into vLLM.
It introduces the AITER RMS Norm layer kernel so that any upcoming optimization in the AITER kernel can be used and evaluated directly within the vLLM framework.

RMS Norm Layer Implementation

The rmsnorm2d_fwd_with_add kernel from AITER has been integrated for the ROCm RMS norm forward pass in /vllm/model_executor/layers/layernorm.py (a sketch of the gating follows the list below). This feature:

  • Is enabled by default when the environment variable VLLM_ROCM_USE_AITER=1 is set
  • Can be specifically enabled or disabled using the dedicated environment variable VLLM_ROCM_USE_AITER_RMSNORM
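
For orientation, here is a minimal sketch of what such an env-flag-gated dispatch could look like. Only the environment variable names, the rmsnorm2d_fwd_with_add kernel name, and the add-then-normalize semantics come from this PR; the helper names, the flag defaults, and the assumption that the kernel can be imported directly from the aiter package are illustrative, not the actual layernorm.py change.

import os
from typing import Callable, Tuple

import torch


def fused_add_rms_norm_reference(
        x: torch.Tensor, residual: torch.Tensor, weight: torch.Tensor,
        eps: float) -> Tuple[torch.Tensor, torch.Tensor]:
    # Plain-PyTorch reference of the default path: add the residual,
    # then apply RMS norm to the sum.
    residual_out = x + residual
    variance = residual_out.pow(2).mean(dim=-1, keepdim=True)
    out = residual_out * torch.rsqrt(variance + eps) * weight
    return out, residual_out


def _rocm_aiter_rmsnorm_enabled() -> bool:
    # Both switches must allow the AITER path; the default values used
    # here are assumptions made for this sketch.
    return (os.getenv("VLLM_ROCM_USE_AITER", "0") == "1"
            and os.getenv("VLLM_ROCM_USE_AITER_RMSNORM", "1") == "1")


def dispatch_fused_add_rms_norm(
) -> Callable[..., Tuple[torch.Tensor, torch.Tensor]]:
    if _rocm_aiter_rmsnorm_enabled():
        # Imported lazily so non-ROCm setups never touch the package;
        # assumes the kernel is exposed at the top level of aiter.
        from aiter import rmsnorm2d_fwd_with_add
        return rmsnorm2d_fwd_with_add
    return fused_add_rms_norm_reference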

Performance Improvements over Not Using the AITER Kernel

Llama-3.1-8B-Instruct (with FP8 per-tensor dynamic quantization)

  • RMS norm only: -1.1~0.8% performance change

Llama-3.1-8B-Instruct-BF16

  • RMS norm only: 0.5~3.9% performance improvement

Llama-3.1-70B-Instruct (with FP8 per-tensor dynamic quantization)

  • RMS norm only: -0.02~2% performance change

Llama-3.1-70B-Instruct-BF16

  • RMS norm only: -0.12~1.2% performance change

Testing

The integration has been verified through:

  • High-level integration tests with various models
  • Kernel function dispatch testing to ensure correct operation selection
  • Quantization compatibility testing

This PR is part of a larger effort to integrate AITER kernels into vLLM for improved performance on ROCm platforms.

Unit Test Status

  • tests/model_executor/test_enabled_custom_ops.py [Passed]
  • tests/models/decoder_only/language/test_models.py [Passed*]
      • All passed except bigscience/bloom-560m, which has also been failing on the main branch. (This branch's unit test statuses match those of vllm-project/vllm main.)

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs will not trigger a full CI run by default. Instead, only the fastcheck CI will run, covering a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of these by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the ci/build label Mar 17, 2025
Comment on lines 82 to 83
if os.getenv("SKIP_ROCM_ATIER_MODEL_TEST_CASES") == "true":
pytest.skip("Skipping test suite for ROCM AITER")
Member

Where is this environment variable being used?

Contributor Author

@tjtanaa tjtanaa Mar 17, 2025

It is used in .buildkite/run-amd-test.sh to skip the unit tests in the CI environment. I have added the changes to .buildkite/run-amd-test.sh.

Member

@DarkLight1337 DarkLight1337 Mar 17, 2025

Instead of using environment variables, can you use pytest custom markers to select/exclude tests?
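
For context, here is a minimal sketch of the marker-based approach being suggested; the marker name rocm_aiter and the conftest wiring are illustrative assumptions, not part of this PR.

# conftest.py (illustrative)
import pytest


def pytest_configure(config):
    # Register the custom marker so pytest does not warn about unknown marks.
    config.addinivalue_line(
        "markers", "rocm_aiter: tests that exercise ROCm AITER kernels")


# In a test module (illustrative):
@pytest.mark.rocm_aiter
def test_rms_norm_matches_reference():
    ...


# Tests can then be selected or excluded on the command line instead of
# via an environment variable:
#   pytest -m rocm_aiter          # run only AITER-marked tests
#   pytest -m "not rocm_aiter"    # skip AITER-marked tests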

Contributor

Why do we want to skip the test in CI? I generally agree, though. A pytest marker would be nicer than an environment variable.

Contributor Author

@SageMoore Previously we did not want to break the AMD CI, so we had it temporarily disabled. Now that we have pinned AITER to a specific commit in Dockerfile.rocm_base and ensured all of its unit tests pass, we will enable the test in CI as well.

@SageMoore @DarkLight1337 About the pytest marker:
Since we enable the AITER kernel tests by default, we don't need a way to disable AITER in these tests. This also removes the need for a pytest marker or any other form of decorator.

So, is it ok to keep it as follows?

...
@pytest.mark.parametrize(
    "use_rocm_aiter", [True, False] if current_platform.is_rocm() else [False])
def test_models(hf_runner, vllm_runner, example_prompts, model: str,
                dtype: str, max_tokens: int, num_logprobs: int,
                use_rocm_aiter: bool, monkeypatch) -> None:

    if model in REQUIRES_V0 or current_platform.is_rocm():
        monkeypatch.setenv("VLLM_USE_V1", "0")

    if use_rocm_aiter:
        monkeypatch.setenv("VLLM_ROCM_USE_AITER", "1")

    with hf_runner(model, dtype=dtype) as hf_model:
        if model.startswith("THUDM/chatglm3"):
            hf_model.model.get_output_embeddings = lambda: \
                hf_model.model.transformer.output_layer

        hf_outputs = hf_model.generate_greedy_logprobs_limit(
            example_prompts, max_tokens, num_logprobs)

    with vllm_runner(model, dtype=dtype) as vllm_model:
        vllm_outputs = vllm_model.generate_greedy_logprobs(
            example_prompts, max_tokens, num_logprobs)

    check_logprobs_close(
        outputs_0_lst=hf_outputs,
        outputs_1_lst=vllm_outputs,
        name_0="hf",
        name_1="vllm",
    )

Similarly in #14967 (comment)

Contributor Author

@SageMoore @DarkLight1337 I have also taken into account your comments from the other AITER PRs:

  • We can enable AITER in the AMD CI since AITER is pinned to a specific commit. (This avoids the need for a pytest marker or decorator.)
  • The tests are enabled only for models that actually use AITER kernels, to avoid redundant models.

This is the final state of test_models.py:

# SPDX-License-Identifier: Apache-2.0
"""Compare the outputs of HF and vLLM when using greedy sampling.

Run `pytest tests/models/test_models.py`.
"""

import pytest
+ import torch

from vllm.platforms import current_platform

from ...utils import check_logprobs_close

# These have unsupported head_dim for FA. We do not
# have a clean way to fall back, so we fail with
# a clear msg when it happens.
# https://github.com/vllm-project/vllm/issues/14524
REQUIRES_V0 = ["microsoft/phi-2", "stabilityai/stablelm-3b-4e1t"]

+ # This list contains the models that use AITER kernels.
+ # Models not in this list are skipped for the AITER test runs.
+ # Once more AITER kernels are added, this list will no longer be
+ # needed, as all models will call AITER kernels in parts of
+ # their operators.
+ AITER_MODEL_LIST = [
+    "meta-llama/Llama-3.2-1B-Instruct",
+    "openbmb/MiniCPM3-4B",
+    "Qwen/Qwen-7B",
+    "Qwen/Qwen2.5-0.5B-Instruct",
+    "ehristoforu/Falcon3-MoE-2x7B-Insruct",
+ ]


# @maybe_test_rocm_aiter
@pytest.mark.parametrize(
    "model",
    [
        pytest.param(
            "bigscience/bloom-560m",  # bloom - testing alibi slopes
            marks=[pytest.mark.core_model, pytest.mark.cpu_model],
        ),
        pytest.param(
            "openai-community/gpt2",  # gpt2
            marks=[pytest.mark.core_model, pytest.mark.cpu_model],
        ),
        pytest.param("Milos/slovak-gpt-j-405M"),  # gptj
        pytest.param("bigcode/tiny_starcoder_py"),  # gpt_bigcode
        pytest.param("EleutherAI/pythia-70m"),  # gpt_neox
        pytest.param(
            "google/gemma-1.1-2b-it",  # gemma
            marks=[pytest.mark.core_model, pytest.mark.cpu_model],
        ),
        pytest.param(
            "THUDM/chatglm3-6b",  # chatglm (text-only)
        ),
        pytest.param(
            "meta-llama/Llama-3.2-1B-Instruct",  # llama
            marks=[pytest.mark.core_model, pytest.mark.cpu_model],
        ),
        pytest.param(
            "openbmb/MiniCPM3-4B",
            # fused_moe not supported on CPU
            marks=[pytest.mark.core_model],
        ),
        pytest.param(
            "facebook/opt-125m",  # opt
            marks=[pytest.mark.core_model, pytest.mark.cpu_model],
        ),
        pytest.param(
            "microsoft/phi-2",  # phi
            marks=[pytest.mark.core_model],
        ),
        pytest.param(
            "Qwen/Qwen-7B",  # qwen (text-only)
        ),
        pytest.param(
            "Qwen/Qwen2.5-0.5B-Instruct",  # qwen2
            marks=[pytest.mark.core_model],
        ),
        pytest.param("stabilityai/stablelm-3b-4e1t"),  # stablelm
        pytest.param("bigcode/starcoder2-3b"),  # starcoder2
        pytest.param(
            "ehristoforu/Falcon3-MoE-2x7B-Insruct",  # mixtral
            marks=[pytest.mark.cpu_model],
        )
    ])
@pytest.mark.parametrize("dtype", ["half"])
@pytest.mark.parametrize("max_tokens", [32])
@pytest.mark.parametrize("num_logprobs", [5])
+ @pytest.mark.parametrize(
+    "use_rocm_aiter", [True, False] if current_platform.is_rocm() else [False])
def test_models(hf_runner, vllm_runner, example_prompts, model: str,
                dtype: str, max_tokens: int, num_logprobs: int,
+                use_rocm_aiter: bool, monkeypatch) -> None:

    if model in REQUIRES_V0:
        monkeypatch.setenv("VLLM_USE_V1", "0")

+    if use_rocm_aiter and (model in AITER_MODEL_LIST):
+        monkeypatch.setenv("VLLM_ROCM_USE_AITER", "1")
+    elif use_rocm_aiter and model not in AITER_MODEL_LIST:
+        # Skip models that do not use AITER kernels.
+        # Once more AITER kernels are added, this list will no longer be
+        # needed, as all models will call AITER kernels in parts of
+        # their operators.
+        pytest.skip(f"Skipping '{model}' model test with AITER kernel.")

    with hf_runner(model, dtype=dtype) as hf_model:
        if model.startswith("THUDM/chatglm3"):
            hf_model.model.get_output_embeddings = lambda: \
                hf_model.model.transformer.output_layer

        hf_outputs = hf_model.generate_greedy_logprobs_limit(
            example_prompts, max_tokens, num_logprobs)

    with vllm_runner(model, dtype=dtype) as vllm_model:
        vllm_outputs = vllm_model.generate_greedy_logprobs(
            example_prompts, max_tokens, num_logprobs)

    check_logprobs_close(
        outputs_0_lst=hf_outputs,
        outputs_1_lst=vllm_outputs,
        name_0="hf",
        name_1="vllm",
    )
+    if use_rocm_aiter:
+        # this is to ensure that vllm engine
+        # has deallocated the memory before running the next
+        # unit tests. On ROCm, the memory might not be
+        # deallocated completely before running the
+        # next test case
+        torch.cuda.synchronize()

Contributor

@SageMoore SageMoore left a comment

I requested a few changes, but otherwise looks reasonable. Thanks for breaking it out of the mono-PR!

def rocm_aiter_rmsnorm2d_fwd_with_add(
        *, x: torch.Tensor, residual: torch.Tensor, weight: torch.Tensor,
        variance_epsilon: float) -> Tuple[torch.Tensor, torch.Tensor]:
    import aiter as rocm_aiter
Contributor

Given that AITER isn't published on PyPI yet, meaning users will either have to use the Docker container or build from source, I'd like to have a nicer error message when users try to enable AITER without it being installed. There are a number of ways we can do this. I like the following but am open to other solutions.

def dispatch_cuda_rmsnorm_func(
    add_residual: bool
) -> Callable[..., Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]]:
    if not add_residual:
        return rms_norm
    if current_platform.is_rocm_aiter_rmsnorm_enabled():
        try:
            import aiter as rocm_aiter
            return rocm_aiter_rmsnorm2d_fwd_with_add
        except ImportError:
            logger.warn_once("AITER RMS Norm kernel is enabled, but AITER is not installed. Falling back to the default RMS Norm kernel")
            return fused_add_rms_norm
    return fused_add_rms_norm

Contributor Author

@SageMoore

  1. import aiter conflicts with a built-in function in Python, and wrapping import aiter in a try/except does not show whether AITER is properly installed, unless we try to import a kernel function from aiter; if that kernel is not prebuilt, it would start building the kernel JIT.
  2. Having a fallback makes it difficult to debug and to pin down performance differences. In addition, a warning alone might be missed, and users could then complain about performance, since they expect AITER kernels to be used when the AITER flag is set.

So, we will avoid having a fallback here.
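
In other words, a minimal sketch of the resulting behavior, adapting the dispatch snippet quoted above under the assumption that the try/except is simply dropped so a missing AITER install surfaces as an error rather than a silent fallback:

def dispatch_cuda_rmsnorm_func(
    add_residual: bool
) -> Callable[..., Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]]]:
    if not add_residual:
        return rms_norm
    if current_platform.is_rocm_aiter_rmsnorm_enabled():
        # No try/except here: if AITER is missing, the wrapper's own
        # `import aiter` raises when the kernel is first used, instead of
        # being masked by a warning plus a fallback to the default kernel.
        return rocm_aiter_rmsnorm2d_fwd_with_add
    return fused_add_rms_norm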

@tjtanaa tjtanaa requested a review from SageMoore March 20, 2025 17:26
Contributor

@SageMoore SageMoore left a comment

I think we're pretty close here. All of my comments are nits.

name_0="hf",
name_1="vllm",
)
if use_rocm_aiter:
Contributor

Is this something we should generally be doing for ROCm or just when AITER is enabled?

Contributor Author

Is this something we should generally be doing for ROCm or just when AITER is enabled?

Currently, it seems this situation could occur only when AITER is enabled.

Contributor Author

@SageMoore
We have made the description clearer:

        # this is to ensure that vllm engine
        # has deallocated the memory before running the next
+        # unit tests. On ROCm, when using AITER
+        # the memory might not be deallocated completely
+        # before running the next test case
        torch.cuda.synchronize()

Contributor

Good to know. Thanks!

tjtanaa added 2 commits March 21, 2025 01:10
Signed-off-by: tjtanaa <[email protected]>
Signed-off-by: tjtanaa <[email protected]>
Contributor

@SageMoore SageMoore left a comment

Thanks for addressing all of my comments. This looks reasonable to me.

name_0="hf",
name_1="vllm",
)
if use_rocm_aiter:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know. Thanks!

Copy link
Member

@DarkLight1337 DarkLight1337 left a comment

Stamp

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) March 21, 2025 14:54
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 21, 2025
@vllm-bot vllm-bot merged commit ec870fb into vllm-project:main Mar 22, 2025
38 of 42 checks passed
@tjtanaa tjtanaa deleted the aiter-rmsnorm branch March 22, 2025 10:46
erictang000 pushed a commit to erictang000/vllm that referenced this pull request Mar 25, 2025
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025

Labels

ci/build ready ONLY add when PR is ready to merge/full CI is needed

4 participants