
Conversation

@prashanth058 (Contributor) commented Nov 19, 2025

Issue:
The LoRA-wrapped RowParallelLinear added the bias as a separate bfloat16 operation instead of fusing it into the GEMM kernel the way the unwrapped layer does. This caused precision loss: the fused kernel can accumulate in higher precision (FP32) before converting to bfloat16, while a separate bfloat16 addition incurs an extra rounding step. The discrepancy appeared even with zero LoRA weights when comparing LoRA-wrapped vs. merged-weight results.
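
A minimal PyTorch sketch of the rounding effect (illustrative only; the shapes and values are arbitrary and this is not the vLLM kernel): the fused path forms the sum in FP32 and rounds once at the final bfloat16 cast, while the separate path rounds twice.

```python
import torch

torch.manual_seed(0)
x = torch.randn(64, 64)  # fp32 activations
w = torch.randn(64, 64)  # fp32 weight
b = torch.randn(64)      # fp32 bias

# Fused path: matmul and bias add both happen in fp32,
# with a single rounding step at the final bf16 cast.
fused = (x @ w + b).to(torch.bfloat16)

# Separate path: round the matmul output to bf16 first, then add a
# bf16 bias, incurring a second rounding step.
separate = (x @ w).to(torch.bfloat16) + b.to(torch.bfloat16)

print((fused.float() - separate.float()).abs().max())  # typically nonzero
```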

Fix:
Pass the bias to apply() only on TP rank 0 (and only when skip_bias_add=False), allowing the quantization method to fuse the bias addition with the matrix multiplication in the GEMM kernel. This matches the unwrapped layer's behavior and eliminates the precision discrepancy.
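
A hedged sketch of the corrected control flow, modeled on the unwrapped RowParallelLinear forward; `apply_gemm` is a stand-in for `quant_method.apply`, and the names here are illustrative rather than the PR's actual diff:

```python
from typing import Callable, Optional

import torch

def row_parallel_forward(
    apply_gemm: Callable,  # stand-in for quant_method.apply (can fuse bias)
    x: torch.Tensor,
    weight: torch.Tensor,
    bias: Optional[torch.Tensor],
    tp_rank: int,
    skip_bias_add: bool,
):
    # Pass the bias into the GEMM only on TP rank 0; other ranks pass None
    # so the bias is not added once per rank before the all-reduce.
    bias_ = bias if (tp_rank == 0 and not skip_bias_add) else None
    output = apply_gemm(x, weight, bias_)
    # With skip_bias_add=True, return the bias for the caller to add later.
    output_bias = bias if skip_bias_add else None
    return output, output_bias
```

Because the bias travels into `apply_gemm`, the kernel can fuse the addition and keep the accumulation in FP32, matching the unwrapped layer's numerics.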

@gemini-code-assist (bot) left a comment


Code Review

This pull request addresses a precision loss issue in LoRA-wrapped RowParallelLinear by fusing the bias addition into the GEMM kernel, aligning its behavior with the equivalent non-LoRA layer. The change correctly passes the bias to the apply method only on rank 0, preventing redundant additions in tensor-parallel setups, and the refactored bias-handling logic improves code clarity. The fix appears correct and well implemented; I have no major concerns with this change.

@jeejeelee (Collaborator)

Overall LGTM, could you please address the CI failure first?

@prashanth058 force-pushed the fix/lora-bias-precision branch from b58afac to 58e30d4 on November 19, 2025 at 15:40
@jeejeelee enabled auto-merge (squash) on November 20, 2025 at 01:30
@github-actions bot added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) on Nov 20, 2025
@jeejeelee merged commit 0cca9b4 into vllm-project:main on Nov 20, 2025
48 checks passed
LuminolT pushed a commit to LuminolT/vllm that referenced this pull request Nov 21, 2025