dynamic dispatch of fp8 kernels #14245
Conversation
This mostly affects ROCm, where the hardware may support one or the other FP8 type. All fp8 kernels are now templated on fp8_type instead of assuming a single fp8 type via `using FP8_TYPE = `. CUDA is largely unaffected: its fp8 kernels only instantiate the OCP type, so there is no binary bloat. For ROCm, two kernel templates are instantiated, one for each fp8 type.

- FP8_TYPE is removed.
- All fp8 kernels take an additional fp8_type template parameter.
- FP8_E4M3_MAX is replaced with the templated class FP8_E4M3_ADJUSTED_MAX.
- is_fp8_ocp(): new C++ function for host runtime query of the preferred fp8 type.
- fp8 kernel launches for ROCm get both OCP and FNUZ templated variants.
- Add Python APIs current_platform.supports_fp8() and is_fp8_fnuz().

Signed-off-by: Jeff Daily <[email protected]>
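To make the dispatch pattern concrete, here is a minimal sketch of what the description above amounts to. It is illustrative only: the kernel name, launch helper, and scaling details are simplified stand-ins, and only `is_fp8_ocp()` and the two c10 fp8 types are taken from the PR description itself.

```cpp
#include <cuda_runtime.h>
#include <c10/util/Float8_e4m3fn.h>    // OCP e4m3
#include <c10/util/Float8_e4m3fnuz.h>  // FNUZ e4m3 (native on some ROCm GPUs)

// Kernels are templated on the fp8 storage type instead of relying on a
// single global `using FP8_TYPE = ...` alias. (Scaling/clamping simplified.)
template <typename scalar_t, typename fp8_type>
__global__ void scaled_fp8_quant_kernel(fp8_type* out, const scalar_t* in,
                                        const float* scale, int64_t n) {
  int64_t i = blockIdx.x * (int64_t)blockDim.x + threadIdx.x;
  if (i < n) {
    out[i] = static_cast<fp8_type>(static_cast<float>(in[i]) / *scale);
  }
}

bool is_fp8_ocp();  // host-side runtime query of the preferred fp8 type (added by this PR)

// Host-side dispatch: ROCm builds instantiate both variants and pick one at
// runtime; CUDA builds only ever instantiate the OCP variant.
template <typename scalar_t>
void launch_scaled_fp8_quant(void* out, const scalar_t* in, const float* scale,
                             int64_t n, cudaStream_t stream) {
  dim3 block(256);
  dim3 grid(static_cast<unsigned int>((n + block.x - 1) / block.x));
#ifdef USE_ROCM
  if (is_fp8_ocp()) {
    scaled_fp8_quant_kernel<scalar_t, c10::Float8_e4m3fn><<<grid, block, 0, stream>>>(
        static_cast<c10::Float8_e4m3fn*>(out), in, scale, n);
  } else {
    scaled_fp8_quant_kernel<scalar_t, c10::Float8_e4m3fnuz><<<grid, block, 0, stream>>>(
        static_cast<c10::Float8_e4m3fnuz*>(out), in, scale, n);
  }
#else
  scaled_fp8_quant_kernel<scalar_t, c10::Float8_e4m3fn><<<grid, block, 0, stream>>>(
      static_cast<c10::Float8_e4m3fn*>(out), in, scale, n);
#endif
}
```

The new Python APIs `current_platform.supports_fp8()` and `is_fp8_fnuz()` play the analogous role on the Python side.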
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Jeff Daily <[email protected]>
Signed-off-by: Jeff Daily <[email protected]>
ProExpertProg
left a comment
Nice PR! Thanks for removing TODOs and cleaning up a variety of nested if statements etc.
I left a few comments for further improving the code quality - this is a pretty messy part of our codebase and we should really keep it from getting more convoluted - thanks for doing your part already!
Signed-off-by: Jeff Daily <[email protected]>
- FP8_E4M3_ADJUSTED_MAX -> fp8_e4m3_adjusted_max_v
- struct fp8_e4m3_adjusted_max
- ScaledQuant: collapse duplicate fp8 definition

Signed-off-by: Jeff Daily <[email protected]>
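As a rough illustration of the renaming in this commit (the numeric values are the standard finite maxima for each e4m3 flavor; the exact member names in the PR may differ):

```cpp
#include <c10/util/Float8_e4m3fn.h>
#include <c10/util/Float8_e4m3fnuz.h>

// Primary template intentionally left undefined so that unsupported types
// fail to compile rather than silently picking a wrong maximum.
template <typename T>
struct fp8_e4m3_adjusted_max;

template <>
struct fp8_e4m3_adjusted_max<c10::Float8_e4m3fn> {
  static constexpr float value = 448.0f;  // finite max of OCP e4m3
};

template <>
struct fp8_e4m3_adjusted_max<c10::Float8_e4m3fnuz> {
  static constexpr float value = 240.0f;  // finite max of FNUZ e4m3
};

// Variable-template shorthand replacing the old FP8_E4M3_MAX constant,
// following the standard library's *_v naming convention.
template <typename T>
constexpr float fp8_e4m3_adjusted_max_v = fp8_e4m3_adjusted_max<T>::value;
```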
- added template-specialized utility function fp8::cvt_c10
- reverted update-by-reference back to returning a value

Signed-off-by: Jeff Daily <[email protected]>
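And a sketch of the cvt_c10 helper named above, under the assumption that it simply converts a float to the requested c10 fp8 type and returns it by value (per the second bullet):

```cpp
#include <c10/util/Float8_e4m3fn.h>
#include <c10/util/Float8_e4m3fnuz.h>

namespace fp8 {

// Template-specialized conversion utility; returns the converted value
// instead of writing through a reference parameter.
template <typename fp8_type>
__device__ __forceinline__ fp8_type cvt_c10(float x);

template <>
__device__ __forceinline__ c10::Float8_e4m3fn cvt_c10<c10::Float8_e4m3fn>(float x) {
  return static_cast<c10::Float8_e4m3fn>(x);
}

template <>
__device__ __forceinline__ c10::Float8_e4m3fnuz cvt_c10<c10::Float8_e4m3fnuz>(float x) {
  return static_cast<c10::Float8_e4m3fnuz>(x);
}

}  // namespace fp8
```

Inside a kernel this reads as `out[i] = fp8::cvt_c10<fp8_type>(scaled);`, keeping the fp8 flavor a pure template parameter.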
Signed-off-by: Jeff Daily <[email protected]>
@ProExpertProg I really appreciate your thoughtful review. It greatly improved the quality.
ProExpertProg
left a comment
Thanks for addressing the comments! It looks great now.
Added a couple more nits, and I think it would be nice to add a comment on the C++ side somewhere that explains there are two FP8 types on ROCm (not sure where the best place for that is). Otherwise LGTM!
Signed-off-by: Jeff Daily <[email protected]>
@ProExpertProg Getting some CUDA build failures: looks like fp8_e4m3_adjusted_max_v is the problem, at least for nvcc. Any suggestions?
Maybe add
try to fix: fp8_e4m3_adjusted_max_v is undefined in device code on CUDA
Signed-off-by: Jeff Daily <[email protected]>
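For context on this failure mode: nvcc does not always make a namespace-scope constexpr variable template visible to device code, which matches the "undefined in device code" error named in the commit above. One common workaround, sketched below, is to route device-side uses through a host+device constexpr function; this is an assumption about the shape of the fix, not necessarily what the PR ultimately landed, and the helper name fp8_e4m3_adjusted_max_f is made up for the example.

```cpp
#include <c10/macros/Macros.h>       // C10_HOST_DEVICE -> __host__ __device__ under nvcc/hipcc
#include <c10/util/Float8_e4m3fn.h>

// Reuse the struct-style definition from the earlier commit sketch.
template <typename T>
struct fp8_e4m3_adjusted_max;

template <>
struct fp8_e4m3_adjusted_max<c10::Float8_e4m3fn> {
  static constexpr float value = 448.0f;
};

// Hypothetical helper: a C10_HOST_DEVICE constexpr function is callable from
// both host and device code, unlike the bare constexpr variable template.
template <typename T>
C10_HOST_DEVICE constexpr float fp8_e4m3_adjusted_max_f() {
  return fp8_e4m3_adjusted_max<T>::value;
}
```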
@jeffdaily it looks like you missed an import in Could you get that fixed, and we can enable the full CI?
Signed-off-by: Jeff Daily <[email protected]>
ProExpertProg
left a comment
I think this should be good to go!
robertgshaw2-redhat
left a comment
This is an excellent PR
@jeffdaily I am planning to look into it more, but do you know if this is a ROCm version issue? This PR is breaking the build on
Looks like a ROCm version issue.
Could you help with the fix? I assume there's an alternative for ROCm 6.2? It's not the CI, it's a bug in a local build.
Working on a fix now. Can we forward-fix with a new PR or do you need to revert this one?
I think forward-fixing is fine; the revert wouldn't be immediate either anyway. Unless you think the fix will be complicated?
Fix shouldn't be complicated. Will need to isolate the use of
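The symbol that has to be isolated is elided above, but the general shape of such a forward fix is a ROCm-version guard around toolchain-dependent fp8 definitions. The sketch below is an assumption for illustration: the macro name VLLM_HIP_HAS_OCP_FP8 and the 6.3 threshold are made up, and the guarded symbol may differ from what the actual fix touched.

```cpp
#include <hip/hip_version.h>  // HIP_VERSION_MAJOR / HIP_VERSION_MINOR

// Assumed guard: only enable definitions that require a newer ROCm toolchain,
// so that older installs (e.g. ROCm 6.2, as reported above) keep building.
#if HIP_VERSION_MAJOR > 6 || (HIP_VERSION_MAJOR == 6 && HIP_VERSION_MINOR >= 3)
  #define VLLM_HIP_HAS_OCP_FP8 1
#else
  #define VLLM_HIP_HAS_OCP_FP8 0
#endif

#if VLLM_HIP_HAS_OCP_FP8
  // Newer ROCm: instantiate both the OCP and FNUZ kernel variants.
#else
  // Older ROCm: fall back to the FNUZ-only path.
#endif
```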
Okay, ping me on vLLM Slack once done so we can merge ASAP.
@ProExpertProg I'm not on Slack yet. Here's the PR fix.