
Conversation

@prashanth058 (Contributor) commented Nov 19, 2025

Issue:
The LoRA-wrapped RowParallelLinear added the bias as a separate bfloat16 operation instead of fusing it into the GEMM kernel the way the unwrapped layer does. This caused precision loss: the fused kernel can accumulate in higher precision (FP32) before converting to bfloat16, while a separate bfloat16 addition incurs an extra rounding step. The discrepancy appeared even with zero LoRA weights when comparing LoRA-wrapped vs. merged-weight results.
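
A minimal PyTorch sketch of the rounding effect (illustrative only; the shapes and values are arbitrary and this is not the vLLM kernel): the fused path forms the sum in FP32 and rounds once at the final bfloat16 cast, while the separate path rounds twice.

```python
import torch

torch.manual_seed(0)
x = torch.randn(64, 64)  # fp32 activations
w = torch.randn(64, 64)  # fp32 weight
b = torch.randn(64)      # fp32 bias

# Fused path: matmul and bias add both happen in fp32,
# with a single rounding step at the final bf16 cast.
fused = (x @ w + b).to(torch.bfloat16)

# Separate path: round the matmul output to bf16 first, then add a
# bf16 bias, incurring a second rounding step.
separate = (x @ w).to(torch.bfloat16) + b.to(torch.bfloat16)

print((fused.float() - separate.float()).abs().max())  # typically nonzero
```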

Fix:
Pass the bias to apply() only on TP rank 0 (and only when skip_bias_add=False), allowing the quantization method to fuse the bias addition with the matrix multiplication in the GEMM kernel. This matches the unwrapped layer's behavior and eliminates the precision discrepancy.
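
A hedged sketch of the corrected control flow, modeled on the unwrapped RowParallelLinear forward; `apply_gemm` is a stand-in for `quant_method.apply`, and the names here are illustrative rather than the PR's actual diff:

```python
from typing import Callable, Optional

import torch

def row_parallel_forward(
    apply_gemm: Callable,  # stand-in for quant_method.apply (can fuse bias)
    x: torch.Tensor,
    weight: torch.Tensor,
    bias: Optional[torch.Tensor],
    tp_rank: int,
    skip_bias_add: bool,
):
    # Pass the bias into the GEMM only on TP rank 0; other ranks pass None
    # so the bias is not added once per rank before the all-reduce.
    bias_ = bias if (tp_rank == 0 and not skip_bias_add) else None
    output = apply_gemm(x, weight, bias_)
    # With skip_bias_add=True, return the bias for the caller to add later.
    output_bias = bias if skip_bias_add else None
    return output, output_bias
```

Because the bias travels into `apply_gemm`, the kernel can fuse the addition and keep the accumulation in FP32, matching the unwrapped layer's numerics.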

@gemini-code-assist (bot) left a comment


Code Review

This pull request addresses a precision loss issue in LoRA-wrapped RowParallelLinear by fusing the bias addition into the GEMM kernel, aligning its behavior with the equivalent non-LoRA layer. The change correctly passes the bias to the apply method only on rank 0, preventing redundant additions in tensor-parallel setups, and the refactored bias-handling logic improves code clarity. The fix appears correct and well implemented; I have no major concerns with this change.

@jeejeelee (Collaborator)

Overall LGTM, could you please address the CI failure first?

@prashanth058 force-pushed the fix/lora-bias-precision branch from b58afac to 58e30d4 on November 19, 2025 at 15:40
@jeejeelee enabled auto-merge (squash) on November 20, 2025 at 01:30
@github-actions bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on Nov 20, 2025
@jeejeelee merged commit 0cca9b4 into vllm-project:main on Nov 20, 2025
48 checks passed
LuminolT pushed a commit to LuminolT/vllm that referenced this pull request Nov 21, 2025