
Conversation

@qthequartermasterman (Contributor) commented Mar 25, 2025

Note

This PR is just #11684, but rebased onto main and with pre-commit errors fixed, since it has been some time since @Bryce1010 last updated that PR.

Adds support for passing prompt_embeds to LLM.generate as

llm.generate({"prompt_embeds": input_embeds}, sampling_params)

or

llm.generate(
    [{"prompt_embeds": input_embeds} for input_embeds in inputs_embeds], sampling_params
)

This enables use cases where only the embedding layer is fine-tuned, so the same model backend can serve multiple custom-tuned embedding layers.
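For context, a minimal end-to-end sketch of how this could be used. The model name, the use of a Hugging Face embedding layer to produce the embeddings, and the per-prompt (num_tokens, hidden_size) shape are illustrative assumptions, not part of this PR's description:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "meta-llama/Llama-3.2-1B"  # hypothetical model choice

# Build prompt embeddings from a (possibly fine-tuned) embedding layer.
tokenizer = AutoTokenizer.from_pretrained(model_name)
hf_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
token_ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids
with torch.no_grad():
    # Assumed per-prompt shape: (num_tokens, hidden_size), hence the squeeze.
    input_embeds = hf_model.get_input_embeddings()(token_ids).squeeze(0)

# Note: newer vLLM versions may require enabling this explicitly
# (e.g. an enable_prompt_embeds engine flag); treat that as an assumption.
llm = LLM(model=model_name)
sampling_params = SamplingParams(temperature=0.0, max_tokens=32)
outputs = llm.generate({"prompt_embeds": input_embeds}, sampling_params)
print(outputs[0].outputs[0].text)
```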

FIX #416
FIX #8323
FIX #14621


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@liangwythu

Our project needs this feature. Would love to see this one merged!

@DarkLight1337 (Member)

Can you also update this PR with the unit tests in #6869 to see whether this solution works correctly?

@qthequartermasterman force-pushed the feature/vllm/add-input-embedding branch from 5024f08 to 01cbc0b on March 25, 2025 22:16
@DarkLight1337 reopened this Mar 26, 2025
@DarkLight1337 (Member)

Oops I accidentally closed the PR, reopened it now

@yukang2017 commented Mar 27, 2025

Hi,

I set inputs_embeds to shape (num_tokens, embed_dim) and get the following issue. Is there any advice? Thanks.

[screenshot of the error omitted]

@lzl-mt commented Mar 27, 2025

> Hi, I set inputs_embeds to shape (num_tokens, embed_dim) and get the following issue. Is there any advice? Thanks.

@DarkLight1337 I also hit this issue. When my input is (T, V), I get:
IndexError: index 14 is out of bounds for axis 0 with size 14

When my input is (B, T, V), I get:
RuntimeError: query, key, and positions must have the same number of tokens.
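As an aside, a hedged sketch of one way to normalize shapes before calling generate, assuming the per-prompt convention is a 2D (num_tokens, hidden_size) tensor; the errors above suggest this convention was still in flux at this point in the thread:

```python
import torch


def to_prompt_embeds_inputs(embeds: torch.Tensor) -> list[dict]:
    """Split a (T, H) or (B, T, H) tensor into one 2D prompt_embeds dict per prompt."""
    if embeds.dim() == 2:   # single prompt: (num_tokens, hidden_size)
        return [{"prompt_embeds": embeds}]
    if embeds.dim() == 3:   # batch: one 2D tensor per prompt
        return [{"prompt_embeds": e} for e in embeds]
    raise ValueError(f"Unexpected prompt_embeds shape: {tuple(embeds.shape)}")
```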

mergify bot commented Apr 1, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @qthequartermasterman.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Apr 1, 2025
临景 and others added 5 commits April 2, 2025 14:37
@mergify mergify bot removed the needs-rebase label May 1, 2025
@qthequartermasterman (Contributor, Author) commented May 2, 2025

@DarkLight1337

There are three tests still failing in CI.

Async Engine Inputs: test_mp_crash_detection has a subtle bug. It is supposed to test how long error recovery takes after LLMEngine fails to initialize, but since LLMEngine is initialized in a different process, the mock that raises an exception is never applied, so what actually gets timed is the real initialization. Because there are now more cudagraphs to compile, that takes longer than the 60 seconds allowed by the test.

The "easy" answer is to bump up the threshold in the test; another "easy" option is to enforce eager mode to avoid compilation. But the "correct" answer is to refactor the test to cause a failure within the LLMEngine so it actually tests what it claims to test. Using mocks directly is difficult because LLMEngine is initialized in another process. To be honest, I'm not 100% certain how to do that cleanly without refactoring MQLLMEngine and/or build_async_engine_client_from_engine_args so that a function that raises an error (maybe a fake executor_class?) can be passed in.
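A self-contained sketch (not vLLM code) of why a mock applied in the parent process never reaches code running in a spawned child process, which is the heart of the subtle bug described above:

```python
import multiprocessing as mp
from unittest import mock


def expensive_init() -> str:
    # Stand-in for LLMEngine initialization; in the real test this is what
    # ends up being timed instead of the intended failure path.
    return "real init ran"


def child(q) -> None:
    # Runs in a separate process, so it resolves expensive_init from a fresh
    # import of this module and never sees the parent's patched version.
    q.put(expensive_init())


if __name__ == "__main__":
    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    with mock.patch(f"{__name__}.expensive_init", side_effect=RuntimeError("boom")):
        p = ctx.Process(target=child, args=(q,))
        p.start()
        p.join()
    print(q.get())  # prints "real init ran": the mock never applied in the child
```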

How would you like to proceed with this?

Both the speculative decoding and v1 tests are failing on main locally for me. 🤷 I see the same v1 test failing on main as of an hour ago: https://buildkite.com/vllm/ci/builds/19179#01968e82-8601-44f0-8825-816c8f31a236. The speculative decoding tests do not seem to have run in that build, so I don't know for certain whether they're failing on main in CI. For the life of me, I cannot figure out how those speculative decoding tests could have been affected by any of the changes in this PR.

@DarkLight1337 (Member)

I can help force merge if the test failures are unrelated

@vllm-bot merged commit cc2a77d into vllm-project:main on May 2, 2025 (3 of 6 checks passed)
@KyleMylonakisProtopia

Super happy to see this in. Thank you everyone for all the hard work!

@qthequartermasterman (Contributor, Author)

Thanks @DarkLight1337 for the patient review and iteration process!

radeksm pushed a commit to radeksm/vllm that referenced this pull request May 2, 2025
@njhill (Member) commented May 3, 2025

I'm not sure we want to double the graph compilation time during startup in all cases, even when inputs_embeds aren't being used. This is the reason for at least one of the test failures and likely others too.

@DarkLight1337 (Member) commented May 3, 2025

My bad for merging this. I have opened #17607 to fix this problem by disabling input embeddings by default.

A Collaborator left a review comment on these lines:

if not prompt_ids:
    if prompt_type == "encoder" and model_config.is_multimodal_model:
        pass  # Mllama may have empty encoder inputs for text-only data
    if prompt_inputs["type"] == "embeds":

Should be elif? The way it is now, it will raise on Mllama with empty encoder input.
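A hedged sketch of the control flow the reviewer appears to be suggesting; the names come from the snippet above, but the branch bodies and the final error branch are assumptions about the surrounding validation code:

```python
if not prompt_ids:
    if prompt_type == "encoder" and model_config.is_multimodal_model:
        pass  # Mllama may have empty encoder inputs for text-only data
    elif prompt_inputs["type"] == "embeds":
        pass  # prompt_embeds inputs legitimately carry no token IDs
    else:
        raise ValueError(f"The {prompt_type} prompt cannot be empty")
```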

RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
Signed-off-by: Mu Huai <[email protected]>
mawong-amd pushed a commit to ROCm/vllm that referenced this pull request May 14, 2025
zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025
@roadr mentioned this pull request Jun 19, 2025

Labels

frontend, ready, speculative-decoding, v0
