[Bugfix] handle alignment of encoder_seq_lens in mllama.py #14784
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a reduced set of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Force-pushed from d289902 to 8e6903c
Force-pushed from 8e6903c to fb1d347
Review comment on vllm/model_executor/models/mllama.py (outdated):
input_processor_for_mllama has been removed, and I don't see our use of the Transformers processor doing this same trick of only processing the last image group. I'm working to understand this better, but I think we should either re-implement the "cheat" or remove the extra complexity (i.e. drop this check comparing to num tiles and remove kv_range_for_decode).
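For context, here is a minimal sketch of how per-sequence KV ranges for cross-attention decode could be derived from per-sequence encoder lengths. The function and variable names are illustrative only, not the actual mllama.py implementation:

```python
from itertools import accumulate


def kv_ranges_from_encoder_lens(
        encoder_seq_lens: list[int]) -> list[tuple[int, int]]:
    """Compute (start, end) offsets into a packed cross-attention KV cache
    for each sequence, given per-sequence encoder lengths.

    A sequence with encoder length 0 (text-only) gets an empty range.
    """
    ends = list(accumulate(encoder_seq_lens))
    starts = [end - length for end, length in zip(ends, encoder_seq_lens)]
    return list(zip(starts, ends))


# Example: two sequences with images and one text-only sequence in between.
print(kv_ranges_from_encoder_lens([6404, 0, 6404]))
# [(0, 6404), (6404, 6404), (6404, 12808)]
```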
I'm checking with the author of PR #11427, which removed the previous input processor. See the discussion here: https://vllm-dev.slack.com/archives/C07QCGVDNUF/p1743001000770969
This cheat is being implemented in #15564.
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from fb1d347 to 582552e
Force-pushed from 582552e to 5b257d9
Force-pushed from 5b257d9 to aa6d40d
@heheda12345 I rebased and updated the branch. Ready for another look!
LGTM in general. Only some nits.
@heheda12345 Thanks for the review! I pushed the suggested changes (the linter didn't like
Thanks for the bug fix!
Fix for the crash repro reported in this comment. The bug and fix are pretty similar to #12347: the problem arises in a batch of mixed text and image requests, which leads to the attn metadata having some lists with an element for each sequence and others with an element only for each sequence-with-images.
Another fix that came out of #10648.
FIX #10648
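For illustration, here is a minimal sketch of the kind of re-alignment involved; the names are hypothetical, not the actual attn metadata fields. Lists that only carry an entry per sequence-with-images have to be expanded so that every per-sequence list has one entry per sequence in the batch, e.g. by padding zeros for text-only sequences:

```python
def align_encoder_seq_lens(seq_has_image: list[bool],
                           encoder_lens_for_image_seqs: list[int]) -> list[int]:
    """Expand a per-image-sequence list into a per-sequence list.

    Sequences without images get an encoder length of 0 so that every
    per-sequence list in the attn metadata ends up with the same length.
    """
    it = iter(encoder_lens_for_image_seqs)
    return [next(it) if has_image else 0 for has_image in seq_has_image]


# Mixed batch: seq 0 and seq 2 have images, seq 1 is text-only.
print(align_encoder_seq_lens([True, False, True], [6404, 6404]))
# [6404, 0, 6404]
```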