[Bugfix] LoRA V0 - Fix case where max_num_seqs is between cudagraph capture sizes
#15308
Conversation
Signed-off-by: Varun Sundar Rabindranath <[email protected]>
Please take a look @jeejeelee. Thanks! 🙌
After the test modifications, overall LGTM
… capture sizes (vllm-project#15308) Signed-off-by: Varun Sundar Rabindranath <[email protected]> Co-authored-by: Varun Sundar Rabindranath <[email protected]>
FIX #15269
Repro command:
Bug:
PR that introduced the bug: #14685
Affects: V0 engine. The V1 engine is fine.
Fix / why:
In V0, during cudagraph capture, the cudagraph capture size can be greater than the `max_num_seqs` setting. In #14685, we assumed that `max_num_seqs` would always be respected. This assumption is true for V1, but not for V0.

The line where the error occurs deals with the LogitsProcessor: `_sampler_indices` at `vllm/lora/punica_wrapper/punica_base.py` (line 136 in cfbb8c9) … `token_lora_mapping`. Before #14685 we seem to have handled this issue by just allocating a buffer as big as `max_num_batched_tokens`. We use the same fix here.
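The failure mode can be sketched in a few lines. This is a minimal illustration with hypothetical names and capture sizes (the real logic lives in vLLM's CUDA graph runner and punica wrapper, not in these helpers): when `max_num_seqs` falls between two capture sizes, the V0 engine pads the batch up to the next captured size, which overflows a buffer sized by `max_num_seqs` alone; sizing the buffer by `max_num_batched_tokens` avoids the overflow.

```python
# Hypothetical sketch of the bug fixed in #15308. Names and values here are
# illustrative, not vLLM's actual implementation.

CAPTURE_SIZES = [1, 2, 4, 8, 16]  # example cudagraph capture batch sizes


def padded_batch_size(num_seqs: int) -> int:
    """Round a batch up to the next captured graph size (V0 behavior)."""
    for size in CAPTURE_SIZES:
        if size >= num_seqs:
            return size
    return CAPTURE_SIZES[-1]


max_num_seqs = 3  # falls strictly between capture sizes 2 and 4
padded = padded_batch_size(max_num_seqs)

# Pre-fix: the sampler-indices buffer was sized assuming max_num_seqs is
# always respected, so the padded batch indexes past the end of it.
buffer_len_old = max_num_seqs
assert padded > buffer_len_old  # 4 > 3: out-of-bounds in V0

# Fix: allocate the buffer as big as max_num_batched_tokens, which is always
# at least as large as any padded batch size.
max_num_batched_tokens = 8192  # illustrative value
buffer_len_new = max_num_batched_tokens
assert padded <= buffer_len_new  # padded batch always fits
```

V1 is unaffected because it never pads a batch beyond `max_num_seqs`, so the original assumption holds there.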