
Conversation

Contributor

@jeffdaily jeffdaily commented Mar 5, 2025

This mostly affects ROCm with hardware that can support one or the other FP8 type. All fp8 kernels are now templated on fp8_type instead of assuming a single fp8 type via `using FP8_TYPE = `. CUDA is largely unaffected: for CUDA, fp8 kernels only instantiate the OCP type, so there is no binary bloat. For ROCm, two kernel templates are instantiated, one for each fp8 type.

  • FP8_TYPE is removed.
  • All fp8 kernels take an additional fp8_type template parameter.
  • FP8_E4M3_MAX is replaced with the templated struct helper fp8_e4m3_adjusted_max_v.
  • New is_fp8_ocp() C++ function for host-side runtime query of the preferred fp8 type.
  • fp8 kernel launches on ROCm get both OCP and FNUZ templated variants.
  • New Python APIs: current_platform.supports_fp8(), is_fp8_fnuz(), and fp8_dtype().
  • New VLLM_DISPATCH_FP8_TYPES macro that can nest with other dispatch macros.
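The templated max helper can be sketched in plain C++ roughly as follows. The type names stand in for c10::Float8_e4m3fn (OCP) and c10::Float8_e4m3fnuz (FNUZ), and the numeric values are assumptions for illustration: 448.0 is the OCP e4m3 finite maximum, while 224.0 reflects the adjusted (halved) value vLLM has historically used on ROCm FNUZ hardware rather than the type's true finite max of 240.0. This is not the PR's exact code.

```cpp
#include <cassert>

// Hypothetical stand-ins for c10::Float8_e4m3fn (OCP) and
// c10::Float8_e4m3fnuz (FNUZ); only the type tags matter for this sketch.
struct Float8_e4m3fn {};
struct Float8_e4m3fnuz {};

// Primary template is left undefined so that unsupported types fail to
// compile instead of silently picking up a wrong maximum.
template <typename T>
struct fp8_e4m3_adjusted_max;

template <>
struct fp8_e4m3_adjusted_max<Float8_e4m3fn> {
  // OCP e4m3 finite maximum.
  static constexpr float val = 448.0f;
};

template <>
struct fp8_e4m3_adjusted_max<Float8_e4m3fnuz> {
  // Assumed adjusted value: half the OCP max, as vLLM has used on ROCm
  // FNUZ hardware for dynamic-quantization accuracy; the FNUZ type's
  // actual finite maximum is 240.0.
  static constexpr float val = 224.0f;
};

// Variable-template shorthand, usable anywhere a compile-time constant fits.
template <typename T>
constexpr float fp8_e4m3_adjusted_max_v = fp8_e4m3_adjusted_max<T>::val;
```

With this shape, a kernel templated on fp8_type simply reads `fp8_e4m3_adjusted_max_v<fp8_type>` and gets the right clamp value for whichever instantiation it is.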

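A nestable dispatch macro in the spirit of VLLM_DISPATCH_FP8_TYPES can be sketched like this. The names, enum, and structure here are assumptions modeled on PyTorch's AT_DISPATCH pattern, not the PR's exact code: each case defines fp8_t and then runs the caller's lambda inside that scope, so the lambda body is generic over the concrete fp8 type.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Stand-ins for the runtime dtype tag and the two fp8 storage types.
enum class ScalarType { Float8_e4m3fn, Float8_e4m3fnuz };
struct Float8_e4m3fn {};
struct Float8_e4m3fnuz {};

// Each case defines fp8_t before pasting in the caller's lambda, so the
// lambda body can use fp8_t as the element type. Both macros are variadic
// so commas inside the lambda (e.g. in template argument lists) survive.
#define DISPATCH_FP8_CASE(cpp_type, ...) \
  case ScalarType::cpp_type: {           \
    using fp8_t = cpp_type;              \
    __VA_ARGS__();                       \
    break;                               \
  }

#define DISPATCH_FP8_TYPES(dtype, ...)              \
  switch (dtype) {                                  \
    DISPATCH_FP8_CASE(Float8_e4m3fn, __VA_ARGS__)   \
    DISPATCH_FP8_CASE(Float8_e4m3fnuz, __VA_ARGS__) \
  }

// Example use: pick a kernel instantiation from a runtime dtype tag.
std::string dispatched_name(ScalarType t) {
  std::string name;
  DISPATCH_FP8_TYPES(t, [&] {
    name = std::is_same_v<fp8_t, Float8_e4m3fnuz> ? "fnuz" : "ocp";
  });
  return name;
}
```

Because each branch is an ordinary switch case, this macro composes with other AT_DISPATCH-style macros nested inside the lambda, which is the nesting property the PR description calls out.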

Signed-off-by: Jeff Daily <[email protected]>

github-actions bot commented Mar 5, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small but essential subset of tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀


mergify bot commented Mar 5, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @jeffdaily.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Mar 5, 2025
Signed-off-by: Jeff Daily <[email protected]>
Collaborator

@ProExpertProg ProExpertProg left a comment


Nice PR! Thanks for removing TODOs and cleaning up a variety of nested if statements etc.

I left a few comments for further improving the code quality. This is a pretty messy part of our codebase and we should really keep it from getting more convoluted - thanks for doing your part already!

- FP8_E4M3_ADJUSTED_MAX -> fp8_e4m3_adjusted_max_v
- struct fp8_e4m3_adjusted_max
- ScaledQuant collapse duplicate fp8 definition

Signed-off-by: Jeff Daily <[email protected]>
- added template-specialized utility function fp8::cvt_c10
- revert update by reference back to returning value

Signed-off-by: Jeff Daily <[email protected]>
Signed-off-by: Jeff Daily <[email protected]>
@jeffdaily
Contributor Author

@ProExpertProg I really appreciate your thoughtful review. It greatly improved the quality.

@jeffdaily jeffdaily requested a review from ProExpertProg March 7, 2025 01:10
Collaborator

@ProExpertProg ProExpertProg left a comment


Thanks for addressing the comments! It looks great now.

Added a couple more nits, and I think it would be nice to add a comment somewhere on the C++ side explaining that there are two FP8 types on ROCm (not sure where the best place for that is). Otherwise LGTM!

@jeffdaily
Contributor Author

@ProExpertProg Getting some cuda build failures:

error: identifier "fp8_e4m3_adjusted_max_v< ::c10::Float8_e4m3fn> " is undefined in device code

Looks like fp8_e4m3_adjusted_max_v is the problem, at least for nvcc. Any suggestions?

@ProExpertProg
Collaborator

Maybe add C10_HOST_DEVICE back in, like the original constant?
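The suggestion, roughly, is to expose the constant through an accessor annotated for both host and device, the way the original constant was. A hedged, host-compilable sketch follows; the local C10_HOST_DEVICE definition mirrors c10's (which expands to `__host__ __device__` under nvcc/hipcc and to nothing on a plain host compiler), and the function-style spelling is one possible shape of the fix, not necessarily the one merged.

```cpp
#include <cassert>

// Local stand-in for c10's macro: under nvcc/hipcc it would expand to
// __host__ __device__; on a plain host compiler it expands to nothing,
// so this sketch compiles anywhere.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define C10_HOST_DEVICE __host__ __device__
#else
#define C10_HOST_DEVICE
#endif

struct Float8_e4m3fn {};  // stand-in for c10::Float8_e4m3fn

template <typename T>
struct fp8_e4m3_adjusted_max;

template <>
struct fp8_e4m3_adjusted_max<Float8_e4m3fn> {
  // A constexpr function annotated host+device is callable from kernels,
  // whereas a namespace-scope constexpr variable template can be rejected
  // by nvcc as "undefined in device code" when used from a kernel.
  C10_HOST_DEVICE static constexpr float val() { return 448.0f; }
};

template <typename T>
C10_HOST_DEVICE constexpr float fp8_e4m3_adjusted_max_v() {
  return fp8_e4m3_adjusted_max<T>::val();
}
```

The trade-off is that call sites become `fp8_e4m3_adjusted_max_v<T>()` rather than a bare variable reference, but the annotation travels with the definition so both host and device code see it.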

@hongxiayang hongxiayang added the rocm Related to AMD ROCm label Mar 7, 2025
try to fix fp8_e4m3_adjusted_max_v is undefined in device code on cuda

Signed-off-by: Jeff Daily <[email protected]>
@ProExpertProg
Collaborator

@jeffdaily it looks like you missed an import in vllm.model_executor.layers.quantization.utils.fp8_utils:

ImportError: cannot import name 'current_platform_fp8_dtype' from 'vllm.model_executor.layers.quantization.utils.fp8_utils' (/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/utils/fp8_utils.py)

Could you get that fixed, and we can enable the full CI?

@robertgshaw2-redhat robertgshaw2-redhat added ready ONLY add when PR is ready to merge/full CI is needed and removed ready ONLY add when PR is ready to merge/full CI is needed labels Mar 10, 2025
@mergify mergify bot removed the needs-rebase label Mar 10, 2025
Collaborator

@ProExpertProg ProExpertProg left a comment


I think this should be good to go!

@robertgshaw2-redhat robertgshaw2-redhat added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 11, 2025
Collaborator

@robertgshaw2-redhat robertgshaw2-redhat left a comment


This is an excellent PR

@robertgshaw2-redhat robertgshaw2-redhat merged commit a1c8f37 into vllm-project:main Mar 11, 2025
69 checks passed
@ProExpertProg
Collaborator

ProExpertProg commented Mar 12, 2025

@jeffdaily I am planning to look into it more but do you know if this is a ROCm version issue? This PR is breaking the build on main

/mnt/nvme3n1p1/sage/git/nm-vllm/csrc/quantization/fp8/amd/quant_utils.cuh:25:33: error: use of undeclared identifier '__hip_fp8_e4m3'
   25 |       __hip_cvt_float_to_fp8(r, __hip_fp8_e4m3::__default_saturation,
      |                                 ^
/mnt/nvme3n1p1/sage/git/nm-vllm/csrc/quantization/fp8/amd/quant_utils.cuh:26:30: error: use of undeclared identifier '__hip_fp8_e4m3'
   26 |                              __hip_fp8_e4m3::__default_interpret),
      |                              ^

@jeffdaily
Contributor Author

jeffdaily commented Mar 12, 2025

Looks like a ROCm version issue. __hip_fp8_e4m3 and related types were added in ROCm 6.3. Not present in ROCm 6.2. I was testing in a ROCm 6.3 environment and assumed vllm CI was also at the latest ROCm version.

@ProExpertProg
Collaborator

ProExpertProg commented Mar 12, 2025

Could you help with the fix? I assume there's an alternative for ROCm 6.2? It's not the CI; it's a bug in a local build.

@jeffdaily
Contributor Author

Working on a fix now. Can we forward-fix with a new PR or do you need to revert this one?

@ProExpertProg
Collaborator

I think forward fixing is fine, the revert wouldn't be immediate either anyway. Unless you think the fix will be complicated?

@jeffdaily
Contributor Author

Fix shouldn't be complicated. Will need to isolate the use of __hip_fp8_e4m3 behind a ROCM_VERSION macro limiting it to ROCm 6.3 or newer.
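The version guard described above can be sketched as follows. The ROCM_VERSION encoding (major*10000 + minor*100 + patch, supplied by the build) is an assumption, as is the manual clamp standing in for whatever pre-6.3 conversion path the real fix falls back to.

```cpp
#include <algorithm>
#include <cassert>

// Assumed build-supplied version macro, encoded as major*10000 + minor*100
// + patch. Defaulted here so the sketch compiles outside a HIP build.
#ifndef ROCM_VERSION
#define ROCM_VERSION 60200  // pretend we are on ROCm 6.2
#endif

float cvt_float_to_fp8_e4m3_saturated(float x) {
#if ROCM_VERSION >= 60300
  // ROCm >= 6.3: the real code can call __hip_cvt_float_to_fp8 with the
  // __hip_fp8_e4m3 saturation/interpretation enums introduced in 6.3.
  // (omitted so this sketch stays host-compilable)
  return x;
#else
  // Pre-6.3 fallback: clamp to the OCP e4m3 finite range by hand before
  // whatever bit-level conversion the older path performs.
  constexpr float kFp8E4m3Max = 448.0f;
  return std::min(std::max(x, -kFp8E4m3Max), kFp8E4m3Max);
#endif
}
```

Since the guard is resolved by the preprocessor, a ROCm 6.2 build never sees the 6.3-only identifiers at all, which is exactly what the undeclared-identifier errors on main require.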

@ProExpertProg
Collaborator

Okay, ping me on vLLM Slack once done so we can merge ASAP.

@jeffdaily
Contributor Author

@ProExpertProg I'm not on slack yet. Here's the PR fix.

#14709


Labels

ready ONLY add when PR is ready to merge/full CI is needed rocm Related to AMD ROCm v1
