[Core] [Bugfix] Add Input Embeddings #15428
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Force-pushed from bbd8529 to fa037a6
Our project needs this feature. Would love to see this one merged!
Can you also update this PR with the unit tests in #6869 to see whether this solution works correctly?
Force-pushed from 5024f08 to 01cbc0b
Oops, I accidentally closed the PR; reopened it now.
@DarkLight1337 I also hit this issue. When my input is (T, V), I get one error; when my input is (B, T, V), I get another.
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Andrew Sansom <[email protected]>
Signed-off-by: Andrew Sansom <[email protected]>
Signed-off-by: Andrew Sansom <[email protected]>
Signed-off-by: Andrew Sansom <[email protected]>
Signed-off-by: Andrew Sansom <[email protected]>
Signed-off-by: Andrew Sansom <[email protected]>
Signed-off-by: Andrew Sansom <[email protected]>
Signed-off-by: Andrew Sansom <[email protected]>
… models Signed-off-by: Andrew Sansom <[email protected]>
… a sample outputs Signed-off-by: Andrew Sansom <[email protected]>
There are three tests still failing in CI. Async Engine Inputs: How would you like to proceed with this? Both the speculative decoding and v1 tests are failing on main locally for me. 🤷 I see the same v1 test was failing on main an hour ago: https://buildkite.com/vllm/ci/builds/19179#01968e82-8601-44f0-8825-816c8f31a236. The speculative decoding tests don't seem to have run in that build, so I can't say for certain whether they're failing on main in CI. For the life of me I cannot figure out how those speculative decoding tests could have been affected by any of the changes in this PR.
I can help force-merge if the test failures are unrelated.
Signed-off-by: DarkLight1337 <[email protected]>
Super happy to see this in. Thank you everyone for all the hard work!
Thanks @DarkLight1337 for the patient review and iteration process!
Signed-off-by: Andrew Sansom <[email protected]> Signed-off-by: DarkLight1337 <[email protected]> Co-authored-by: 临景 <[email protected]> Co-authored-by: Bryce1010 <[email protected]> Co-authored-by: Nan2018 <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Co-authored-by: DarkLight1337 <[email protected]>
I'm not sure we want to double the graph compilation time during startup in all cases, even when inputs_embeds aren't being used. This is the reason for at least one of the test failures and likely others too.
My bad for merging this. I have opened #17607 to fix this problem by disabling input embeddings by default. |
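To illustrate the concern, here is a minimal, hypothetical sketch (not vLLM's actual code; capture_graph and enable_prompt_embeds are illustrative names) of why capturing graphs for both token-ID and embedding inputs doubles startup capture work unless it is gated behind an opt-in flag, in the spirit of #17607's default-off behavior:

def capture_decode_graphs(runner, batch_sizes, enable_prompt_embeds=False):
    # One graph per batch size for ordinary token-ID inputs.
    for bs in batch_sizes:
        runner.capture_graph(batch_size=bs, use_inputs_embeds=False)
        # Capturing an embedding-input variant as well doubles the number of
        # graphs captured at startup, so only do it when the feature is
        # actually requested.
        if enable_prompt_embeds:
            runner.capture_graph(batch_size=bs, use_inputs_embeds=True)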
if not prompt_ids:
    if prompt_type == "encoder" and model_config.is_multimodal_model:
        pass  # Mllama may have empty encoder inputs for text-only data
    if prompt_inputs["type"] == "embeds":
Should this be elif? The way it is now, it'll raise on Mllama with empty encoder input.
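For reference, a sketch of the suggested structure (the final else/raise branch is assumed from the comment's context and may not match the actual file):

if not prompt_ids:
    if prompt_type == "encoder" and model_config.is_multimodal_model:
        pass  # Mllama may have empty encoder inputs for text-only data
    elif prompt_inputs["type"] == "embeds":
        pass  # Embedding prompts carry no token IDs, so empty IDs are expected
    else:
        raise ValueError(f"The {prompt_type} prompt cannot be empty")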
Signed-off-by: Andrew Sansom <[email protected]> Signed-off-by: DarkLight1337 <[email protected]> Co-authored-by: 临景 <[email protected]> Co-authored-by: Bryce1010 <[email protected]> Co-authored-by: Nan2018 <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Co-authored-by: DarkLight1337 <[email protected]> Signed-off-by: Mu Huai <[email protected]>
Signed-off-by: Andrew Sansom <[email protected]> Signed-off-by: DarkLight1337 <[email protected]> Co-authored-by: 临景 <[email protected]> Co-authored-by: Bryce1010 <[email protected]> Co-authored-by: Nan2018 <[email protected]> Co-authored-by: Cyrus Leung <[email protected]> Co-authored-by: DarkLight1337 <[email protected]> Signed-off-by: Yuqi Zhang <[email protected]>
Note
This PR is just #11684, rebased onto main and with pre-commit errors fixed, since it has been some time since @Bryce1010 last updated that PR.
Adds support for passing prompt_embeds to LLM.generate, either for a single prompt or for a batch of prompts (see the sketch below). This enables use cases where only the embedding layer is finetuned, letting the same model backend support multiple custom-tuned embedding layers.
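A minimal usage sketch based on the description above; the model name and file paths are placeholders, and per-prompt embeddings are assumed to have shape (seq_len, hidden_size), produced by the model's own embedding layer:

import torch
from vllm import LLM

llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct")

# A single prompt passed as precomputed embeddings.
prompt_embeds = torch.load("prompt_embeds.pt")  # (seq_len, hidden_size)
outputs = llm.generate({"prompt_embeds": prompt_embeds})

# A batch of prompts, one embedding tensor per prompt.
embeds_list = torch.load("prompt_embeds_list.pt")
outputs = llm.generate([{"prompt_embeds": e} for e in embeds_list])

for o in outputs:
    print(o.outputs[0].text)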
FIX #416
FIX #8323
FIX #14621