[Bug] Enforce contiguous input for `dynamic_scaled_fp8_quant` and `static_scaled_fp8_quant` #21773
Conversation
Signed-off-by: yewentao256 <[email protected]>
Code Review
This pull request addresses a RuntimeError that occurs when non-contiguous tensors are passed to the `dynamic_scaled_fp8_quant` and `static_scaled_fp8_quant` custom operators. The fix correctly enforces input contiguity by calling `.contiguous()` on the input tensor before it is passed to the underlying C++ kernels.
The change is a direct and effective solution to the bug described, and it aligns with the existing pattern for other custom operator calls within the same function. The pull request description clearly communicates that this is an intentional, temporary measure to ensure stability, with a more performant, long-term solution tracked in a separate issue. The code is clean and localized, and I found no issues of high or critical severity. The pull request is ready to be merged.
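For context, here is a minimal sketch of the guard pattern the review describes. The wrapper name and the `torch.ops._C` signatures below follow vLLM's conventions but are illustrative assumptions, not the exact project code.

```python
import torch

def scaled_fp8_quant(
    input: torch.Tensor,
    scale: torch.Tensor | None = None,
) -> tuple[torch.Tensor, torch.Tensor]:
    """Quantize to FP8, enforcing contiguity before the C++ kernels (sketch)."""
    # The kernels index with flat contiguous strides, so a strided view
    # (a transpose, a column slice, ...) must be materialized first.
    # This is the temporary guard; it is a no-op if input is already contiguous.
    input = input.contiguous()
    output = torch.empty_like(input, dtype=torch.float8_e4m3fn)
    if scale is None:
        # Dynamic path: the kernel computes the scale on the fly.
        scale = torch.zeros(1, device=input.device, dtype=torch.float32)
        torch.ops._C.dynamic_scaled_fp8_quant(output, input, scale)
    else:
        # Static path: a precomputed per-tensor scale is supplied.
        torch.ops._C.static_scaled_fp8_quant(output, input, scale)
    return output, scale
```

The copy made by `.contiguous()` is the cost this PR deliberately accepts for stability; the longer-term work is to let the kernels read strided input directly.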
LGTM. I'm curious what the strides actually are. I think the kernels should be able to support non-contiguous inputs as-is if input.stride(-1) == 1.
Yeah, I think I will look at #19630 later to see if we can support this case using a scalar fallback.
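To make the stride distinction concrete, here is a small plain-PyTorch illustration (nothing vLLM-specific): a column slice is non-contiguous yet keeps `stride(-1) == 1`, whereas a transpose does not.

```python
import torch

x = torch.randn(4, 8)            # contiguous, strides (8, 1)

sliced = x[:, :4]                # non-contiguous, but stride(-1) is still 1
transposed = x.t()               # non-contiguous, stride(-1) is 8

print(sliced.is_contiguous(), sliced.stride())          # False (8, 1)
print(transposed.is_contiguous(), transposed.stride())  # False (1, 8)
```

A kernel that only assumes unit innermost stride could handle the sliced case without a copy; the transposed case would need the scalar fallback mentioned above.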
Purpose
Similar to #19452, we temporarily fix this by enforcing that the input is contiguous. Will work on #19630 later to see whether we can support some non-contiguous input cases.
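As an illustration of the failure mode (not taken from the PR itself), the sketch below passes a strided view into the quant path; the `vllm._custom_ops` import and the `scaled_fp8_quant` signature are assumptions based on the project's public wrappers and may differ by version.

```python
# Hypothetical repro of the non-contiguous input case this PR guards against.
# Assumes a CUDA vLLM build; import path and signature are assumptions.
import torch
from vllm import _custom_ops as ops

hidden = torch.randn(128, 4096, device="cuda", dtype=torch.bfloat16)
non_contig = hidden.t()  # strided view: stride(-1) != 1, not contiguous

# Before this fix the underlying kernel could raise a RuntimeError on such
# input; with the .contiguous() guard it succeeds at the cost of one copy.
out, scale = ops.scaled_fp8_quant(non_contig)
print(out.shape, scale)
```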
Test
lm_eval --model vllm --model_args "pretrained=nm-testing/DeepSeek-Coder-V2-Lite-Instruct-FP8,max_model_len=32768,enable_expert_parallel=True,enforce_eager=True" --trust_remote_code --tasks gsm8k --num_fewshot 5 --batch_size auto