Skip to content

Conversation

WoosukKwon
Copy link
Collaborator

This PR adds the single_query_cached_kv_attention kernel.

Supported data types:

  • half
  • float

Tested models:

  • OPT-125M
  • OPT-350M
  • OPT-1.3B
  • OPT-2.7B
  • OPT-6.7B
  • OPT-13B

Tested GPUs:

  • A100

@WoosukKwon WoosukKwon merged commit 0deacbc into main Mar 1, 2023
@WoosukKwon WoosukKwon deleted the attention-kernel branch March 1, 2023 23:11
v1nc3nt27 pushed a commit to v1nc3nt27/vllm that referenced this pull request Sep 12, 2023
xiangyuT pushed a commit to xiangyuT/vllm that referenced this pull request Oct 18, 2023
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
luo-cheng2021 pushed a commit to luo-cheng2021/vllm that referenced this pull request Mar 12, 2024
Passing alibi_slopes and sliding_window to PagedAttention extension
luo-cheng2021 pushed a commit to luo-cheng2021/vllm that referenced this pull request Mar 20, 2024
daniel-geon-park added a commit to gmlwns2000/vllm-timber that referenced this pull request Apr 15, 2024
mzusman added a commit to mzusman/vllm that referenced this pull request Apr 16, 2024
* Remove assertion

* adapting jamba vllm to changes after hf release, working on weight loading in modeling file

* splitting the JambaDecoderLayer to JambaMambaDecoderLayer and JambaAttentionDecoderLayer

* weight loading from hf checkpoint supposedly works, might be a mixup in the MoE between the gated and non-gated weights

* Add mamba from jamba modeling file

* Remove slow forward

* Modifications to mamba_mixer

* Save changes, WIP

* Fix cache placement

* Debugging

* Additions and logging

* Jamba with mamba cache handling

* Clean up

* Another cleanup

* Use vllm's RMSNorm instead of JambaRMSNorm, Thier implementation is with
fused kernel

* Clean up and orginization of the objects to handle the mamba cache

* Shorten the code for kv cache mem

* Move cache handling inside the Mixer

* Add mamba to the wheel requirements

* Add mamba to the requirements script

* Add mamba_metadata

* Add to __init__ __all__

* Revert 2 commits

ad1a3db 'Add mamba to the requirements script'
75ed2c8 'Add mamba to the wheel requirements'

* Clean up

* Naming

* Apply whitespace suggestions from code review

* pass tie_word_embeddings to PretrainedConfig init

* Replace repeat with expand as expand doesn't require more mem

* Allocate really small cache if needed , don't use meta

* Fix for expanded

---------

Co-authored-by: Mor Zusman <[email protected]>
Co-authored-by: Erez Schwartz <[email protected]>
Co-authored-by: tomeras91 <[email protected]>
richardsliu pushed a commit to richardsliu/vllm that referenced this pull request May 6, 2025
* [Update] LMcache connector v1 implementation

Signed-off-by: ApostaC <[email protected]>

* [Add] examples for disaggregated prefill

Signed-off-by: ApostaC <[email protected]>

* [add] extra information about evns

Signed-off-by: ApostaC <[email protected]>

* Initial stubs for P/D scheduling changes

Signed-off-by: Tyler Michael Smith <[email protected]>

* Updates

Signed-off-by: Tyler Michael Smith <[email protected]>

* Rs branch (vllm-project#3)

* updated

Signed-off-by: [email protected] <[email protected]>

* Rs branch (vllm-project#5)

Signed-off-by: [email protected] <[email protected]>

* Remove Unneeded Arguments (vllm-project#7)

* updated

Signed-off-by: [email protected] <[email protected]>

* stash

Signed-off-by: [email protected] <[email protected]>

* cleanup

Signed-off-by: [email protected] <[email protected]>

---------

Signed-off-by: [email protected] <[email protected]>

* Improve disagg-example.sh (vllm-project#8)

- fix spelling
- CUDA_VISIBLE_DEVICES should be set externally

Signed-off-by: Tyler Michael Smith <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* added connector

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* update

Signed-off-by: [email protected] <[email protected]>

* remove

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* seems to load properly

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* Revert "updated"

This reverts commit 97316d9.

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* stash

Signed-off-by: [email protected] <[email protected]>

* added

Signed-off-by: [email protected] <[email protected]>

* diffs for local dev on macos

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* update

Signed-off-by: Robert Shaw <[email protected]>

* updaed

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* Checkpoint.

Signed-off-by: Tyler Michael Smith <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* Cleanup

Signed-off-by: Tyler Michael Smith <[email protected]>

* WIP

Signed-off-by: Tyler Michael Smith <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated on scheduler side

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* Hacking away

Signed-off-by: Tyler Michael Smith <[email protected]>

* cleanup

Signed-off-by: Robert Shaw <[email protected]>

* ensure request removed from running list

Signed-off-by: Robert Shaw <[email protected]>

* Runs E2E. Garbage output. Crashes on 2nd request

Signed-off-by: Tyler Michael Smith <[email protected]>

* update

Signed-off-by: Tyler Michael Smith <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* rename files

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* update

Signed-off-by: Robert Shaw <[email protected]>

* Second request no longer crashes

Signed-off-by: Tyler Michael Smith <[email protected]>

* Remove gpu_model_runner hacks

Signed-off-by: Tyler Michael Smith <[email protected]>

* Clean up Justfile

Signed-off-by: Tyler Michael Smith <[email protected]>

* [Bugfix] Stale finished requests in EMPTY_MODEL_RUNNER_OUTPUT

Signed-off-by: Tyler Michael Smith <[email protected]>

* update

Signed-off-by: Tyler Michael Smith <[email protected]>

* justfile edits

Signed-off-by: Tyler Michael Smith <[email protected]>

* Update

Signed-off-by: Tyler Michael Smith <[email protected]>

* Fixes - lm_eval gsm8k has correctness

Signed-off-by: Tyler Michael Smith <[email protected]>

* "just delete the assert"

Signed-off-by: Tyler Michael Smith <[email protected]>

* fixup precommit issues

Signed-off-by: Tyler Michael Smith <[email protected]>

* Fixes

Signed-off-by: Tyler Michael Smith <[email protected]>

* updated (vllm-project#12)

Signed-off-by: [email protected] <[email protected]>

* Add Accuracy Test (vllm-project#13)

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

---------

Signed-off-by: [email protected] <[email protected]>

* Preemption Bugfixes (vllm-project#15)

* stash fixed double free issue

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* fixed issue

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

---------

Signed-off-by: [email protected] <[email protected]>

* updated (vllm-project#16)

Signed-off-by: [email protected] <[email protected]>

* Fix Bad Merge | Fix Memory Leak in Upstream (vllm-project#18)

* updated

Signed-off-by: [email protected] <[email protected]>

* fix merge

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

---------

Signed-off-by: [email protected] <[email protected]>

* clean up justfile, examples

Signed-off-by: Tyler Michael Smith <[email protected]>

* more cleanup

Signed-off-by: Tyler Michael Smith <[email protected]>

* more cleanup

Signed-off-by: Tyler Michael Smith <[email protected]>

* more cleanup

Signed-off-by: Tyler Michael Smith <[email protected]>

* more cleanup

Signed-off-by: Tyler Michael Smith <[email protected]>

* More cleanup

Signed-off-by: Tyler Michael Smith <[email protected]>

* more cleanup

Signed-off-by: Tyler Michael Smith <[email protected]>

* more cleanup, precommit fixes

Signed-off-by: Tyler Michael Smith <[email protected]>

* More cleanup

Signed-off-by: Tyler Michael Smith <[email protected]>

* run_accuracy_test.sh UX

Signed-off-by: Tyler Michael Smith <[email protected]>

* squash warnings

Signed-off-by: Tyler Michael Smith <[email protected]>

* pre-commit

Signed-off-by: Tyler Michael Smith <[email protected]>

* update

Signed-off-by: Tyler Michael Smith <[email protected]>

* Add get_finished to base kv connector

Signed-off-by: mgoin <[email protected]>

* revert test.txt

Signed-off-by: Tyler Michael Smith <[email protected]>

* cleanup

Signed-off-by: Tyler Michael Smith <[email protected]>

* Cleanup

Signed-off-by: Tyler Michael Smith <[email protected]>

* review comments

Signed-off-by: Tyler Michael Smith <[email protected]>

---------

Signed-off-by: ApostaC <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: [email protected] <[email protected]>
Signed-off-by: Robert Shaw <[email protected]>
Signed-off-by: mgoin <[email protected]>
Co-authored-by: ApostaC <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Co-authored-by: mgoin <[email protected]>
Co-authored-by: mgoin <[email protected]>
richardsliu pushed a commit to richardsliu/vllm that referenced this pull request May 6, 2025
* [Update] LMcache connector v1 implementation

Signed-off-by: ApostaC <[email protected]>

* [Add] examples for disaggregated prefill

Signed-off-by: ApostaC <[email protected]>

* [add] extra information about evns

Signed-off-by: ApostaC <[email protected]>

* Initial stubs for P/D scheduling changes

Signed-off-by: Tyler Michael Smith <[email protected]>

* Updates

Signed-off-by: Tyler Michael Smith <[email protected]>

* Rs branch (vllm-project#3)

* updated

Signed-off-by: [email protected] <[email protected]>

* Rs branch (vllm-project#5)

Signed-off-by: [email protected] <[email protected]>

* Remove Unneeded Arguments (vllm-project#7)

* updated

Signed-off-by: [email protected] <[email protected]>

* stash

Signed-off-by: [email protected] <[email protected]>

* cleanup

Signed-off-by: [email protected] <[email protected]>

---------

Signed-off-by: [email protected] <[email protected]>

* Improve disagg-example.sh (vllm-project#8)

- fix spelling
- CUDA_VISIBLE_DEVICES should be set externally

Signed-off-by: Tyler Michael Smith <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* added connector

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* update

Signed-off-by: [email protected] <[email protected]>

* remove

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* seems to load properly

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* Revert "updated"

This reverts commit 97316d9.

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* stash

Signed-off-by: [email protected] <[email protected]>

* added

Signed-off-by: [email protected] <[email protected]>

* diffs for local dev on macos

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* update

Signed-off-by: Robert Shaw <[email protected]>

* updaed

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* Checkpoint.

Signed-off-by: Tyler Michael Smith <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* Cleanup

Signed-off-by: Tyler Michael Smith <[email protected]>

* WIP

Signed-off-by: Tyler Michael Smith <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated on scheduler side

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* Hacking away

Signed-off-by: Tyler Michael Smith <[email protected]>

* cleanup

Signed-off-by: Robert Shaw <[email protected]>

* ensure request removed from running list

Signed-off-by: Robert Shaw <[email protected]>

* Runs E2E. Garbage output. Crashes on 2nd request

Signed-off-by: Tyler Michael Smith <[email protected]>

* update

Signed-off-by: Tyler Michael Smith <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* rename files

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* update

Signed-off-by: Robert Shaw <[email protected]>

* Second request no longer crashes

Signed-off-by: Tyler Michael Smith <[email protected]>

* Remove gpu_model_runner hacks

Signed-off-by: Tyler Michael Smith <[email protected]>

* Clean up Justfile

Signed-off-by: Tyler Michael Smith <[email protected]>

* [Bugfix] Stale finished requests in EMPTY_MODEL_RUNNER_OUTPUT

Signed-off-by: Tyler Michael Smith <[email protected]>

* update

Signed-off-by: Tyler Michael Smith <[email protected]>

* justfile edits

Signed-off-by: Tyler Michael Smith <[email protected]>

* Update

Signed-off-by: Tyler Michael Smith <[email protected]>

* Fixes - lm_eval gsm8k has correctness

Signed-off-by: Tyler Michael Smith <[email protected]>

* "just delete the assert"

Signed-off-by: Tyler Michael Smith <[email protected]>

* fixup precommit issues

Signed-off-by: Tyler Michael Smith <[email protected]>

* Fixes

Signed-off-by: Tyler Michael Smith <[email protected]>

* updated (vllm-project#12)

Signed-off-by: [email protected] <[email protected]>

* Add Accuracy Test (vllm-project#13)

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

---------

Signed-off-by: [email protected] <[email protected]>

* Preemption Bugfixes (vllm-project#15)

* stash fixed double free issue

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* fixed issue

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

---------

Signed-off-by: [email protected] <[email protected]>

* updated (vllm-project#16)

Signed-off-by: [email protected] <[email protected]>

* Fix Bad Merge | Fix Memory Leak in Upstream (vllm-project#18)

* updated

Signed-off-by: [email protected] <[email protected]>

* fix merge

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

---------

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* cleanup code

Signed-off-by: [email protected] <[email protected]>

* cleanup code

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* stash

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updatted

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* revert

Signed-off-by: [email protected] <[email protected]>

* more spurious changes

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* Update vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

Co-authored-by: Tyler Michael Smith <[email protected]>

* Update vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

Co-authored-by: Tyler Michael Smith <[email protected]>

---------

Signed-off-by: ApostaC <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: [email protected] <[email protected]>
Signed-off-by: Robert Shaw <[email protected]>
Co-authored-by: ApostaC <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
richardsliu pushed a commit to richardsliu/vllm that referenced this pull request May 6, 2025
* [Update] LMcache connector v1 implementation

Signed-off-by: ApostaC <[email protected]>

* [Add] examples for disaggregated prefill

Signed-off-by: ApostaC <[email protected]>

* [add] extra information about evns

Signed-off-by: ApostaC <[email protected]>

* Initial stubs for P/D scheduling changes

Signed-off-by: Tyler Michael Smith <[email protected]>

* Updates

Signed-off-by: Tyler Michael Smith <[email protected]>

* Rs branch (vllm-project#3)

* updated

Signed-off-by: [email protected] <[email protected]>

* Rs branch (vllm-project#5)

Signed-off-by: [email protected] <[email protected]>

* Remove Unneeded Arguments (vllm-project#7)

* updated

Signed-off-by: [email protected] <[email protected]>

* stash

Signed-off-by: [email protected] <[email protected]>

* cleanup

Signed-off-by: [email protected] <[email protected]>

---------

Signed-off-by: [email protected] <[email protected]>

* Improve disagg-example.sh (vllm-project#8)

- fix spelling
- CUDA_VISIBLE_DEVICES should be set externally

Signed-off-by: Tyler Michael Smith <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* added connector

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* update

Signed-off-by: [email protected] <[email protected]>

* remove

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* seems to load properly

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* Revert "updated"

This reverts commit 97316d9.

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* stash

Signed-off-by: [email protected] <[email protected]>

* added

Signed-off-by: [email protected] <[email protected]>

* diffs for local dev on macos

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* update

Signed-off-by: Robert Shaw <[email protected]>

* updaed

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* Checkpoint.

Signed-off-by: Tyler Michael Smith <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* Cleanup

Signed-off-by: Tyler Michael Smith <[email protected]>

* WIP

Signed-off-by: Tyler Michael Smith <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated on scheduler side

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* Hacking away

Signed-off-by: Tyler Michael Smith <[email protected]>

* cleanup

Signed-off-by: Robert Shaw <[email protected]>

* ensure request removed from running list

Signed-off-by: Robert Shaw <[email protected]>

* Runs E2E. Garbage output. Crashes on 2nd request

Signed-off-by: Tyler Michael Smith <[email protected]>

* update

Signed-off-by: Tyler Michael Smith <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* rename files

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* updated

Signed-off-by: Robert Shaw <[email protected]>

* update

Signed-off-by: Robert Shaw <[email protected]>

* Second request no longer crashes

Signed-off-by: Tyler Michael Smith <[email protected]>

* Remove gpu_model_runner hacks

Signed-off-by: Tyler Michael Smith <[email protected]>

* Clean up Justfile

Signed-off-by: Tyler Michael Smith <[email protected]>

* [Bugfix] Stale finished requests in EMPTY_MODEL_RUNNER_OUTPUT

Signed-off-by: Tyler Michael Smith <[email protected]>

* update

Signed-off-by: Tyler Michael Smith <[email protected]>

* justfile edits

Signed-off-by: Tyler Michael Smith <[email protected]>

* Update

Signed-off-by: Tyler Michael Smith <[email protected]>

* Fixes - lm_eval gsm8k has correctness

Signed-off-by: Tyler Michael Smith <[email protected]>

* "just delete the assert"

Signed-off-by: Tyler Michael Smith <[email protected]>

* fixup precommit issues

Signed-off-by: Tyler Michael Smith <[email protected]>

* Fixes

Signed-off-by: Tyler Michael Smith <[email protected]>

* updated (vllm-project#12)

Signed-off-by: [email protected] <[email protected]>

* Add Accuracy Test (vllm-project#13)

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

---------

Signed-off-by: [email protected] <[email protected]>

* Preemption Bugfixes (vllm-project#15)

* stash fixed double free issue

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* fixed issue

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

* updatrd

Signed-off-by: [email protected] <[email protected]>

---------

Signed-off-by: [email protected] <[email protected]>

* updated (vllm-project#16)

Signed-off-by: [email protected] <[email protected]>

* Fix Bad Merge | Fix Memory Leak in Upstream (vllm-project#18)

* updated

Signed-off-by: [email protected] <[email protected]>

* fix merge

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

---------

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* cleanup code

Signed-off-by: [email protected] <[email protected]>

* cleanup code

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* stash

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updatted

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* revert

Signed-off-by: [email protected] <[email protected]>

* more spurious changes

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* updated

Signed-off-by: [email protected] <[email protected]>

* Support MLA in NIXL connector

Signed-off-by: Tyler Michael Smith <[email protected]>

* WIP adding tests

Signed-off-by: Tyler Michael Smith <[email protected]>

* wip

Signed-off-by: Tyler Michael Smith <[email protected]>

* Fixes

Signed-off-by: Tyler Michael Smith <[email protected]>

---------

Signed-off-by: ApostaC <[email protected]>
Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: [email protected] <[email protected]>
Signed-off-by: Robert Shaw <[email protected]>
Co-authored-by: ApostaC <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: Robert Shaw <[email protected]>
AllenHaoHuang added a commit to AllenHaoHuang/vllm that referenced this pull request May 7, 2025
* Update swissai.py

Replaced LlamaConfig with SwissAIConfig
Changed up_proj from RowParallelLinear to ColumnParallelLinear

* `ColumnParallelLinear` import

* `LLaMa` -> `SwissAI`

---------

Co-authored-by: EduardDurech <[email protected]>
bbartels pushed a commit to bbartels/vllm that referenced this pull request Aug 14, 2025
Bounty-hunter pushed a commit to Bounty-hunter/vllm that referenced this pull request Sep 23, 2025
yuz207 referenced this pull request in IluvatarLabs/vllm Sep 30, 2025
ROOT CAUSE: draft_q_soft_temp=0.50 was SHARPENING the distribution
instead of softening it (dividing by tau<1.0 doubles logit magnitudes).
This caused nucleus to collapse to 1-2 survivors → q≈1.0 → acceptance
stuck at ~0.7038 (average p_target).

FIXES:

1. Config defaults (config.py, arg_utils.py):
   - draft_q_temp_offset: 0.15 → 0.25 (better dynamic range)
   - draft_q_soft_temp: 0.50 → 2.0 (SOFTENS instead of sharpens)

   At draft_temp=0.05:
   - Before: tau_q = max(0.05+0.15, 0.50) = 0.50 (2x sharper!)
   - After:  tau_q = max(0.05+0.25, 2.0)  = 2.0  (2x softer)

2. Force min_keep=2 in nucleus (eagle.py line 271):
   - Added keep_sorted[..., :2] = True
   - Prevents survivors=1 by construction (defensive programming)

3. Fix smoothing to uniform over kept set (eagle.py lines 275-287):
   - Before: Mixed with untempered baseline (wrong approach)
   - After:  Uniform distribution over survivors only (correct)
   - Prevents q from reaching exactly 1.0 in corner cases

4. Remove dead code (eagle.py line 322):
   - Deleted unused self._current_sampling_metadata assignment
   - No longer needed with draft-anchored approach (bug #2 fix)

Expected results:
- tau_q ≥ 2.0 at ultracold temps → softer distribution
- NUC_DEBUG: survivors = hundreds/thousands (not 1-2)
- Q_DEBUG: q ∈ [0.5, 0.8] (not 0.98-1.0)
- Accept rate: dynamic range restored across temp sweep
zhangsicheng5 pushed a commit to zhangsicheng5/vllm that referenced this pull request Oct 15, 2025
BloodAxe pushed a commit to BloodAxe/vllm that referenced this pull request Oct 17, 2025
Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Oct 20, 2025
Enhanced documentation for plugin patches:

1. Patch vllm-project#1 (Usage Tracking Helper):
   - Clarified as OPTIONAL (has fallback in harmony streaming patch)
   - Changed from "REQUIRED" to "OPTIONAL"
   - Explained fallback mechanism in patched_stream_method.py
   - Marked as upstreamable (minor utility addition)

2. Patch vllm-project#3 (Harmony Token-by-Token Streaming):
   - Added detailed speculative decoding context
   - Explained Eagle draft model generates 5-10 tokens per step
   - Documented specific failures with batch processing:
     * Tool calling broken
     * Multi-channel content lost
     * Token truncation during channel transitions
   - Added before/after code examples
   - Linked to PR vllm-project#26291 (Eagle3 Multi-Channel Streaming Fix)
   - Documented upstream status and removal plan

Key insight: This patch exists because Eagle speculative decoding
returns multiple tokens per step, and upstream's batch processing
can't handle per-token channel switching.

Signed-off-by: Pradyun Ramadorai <[email protected]>
IwakuraRein pushed a commit to IwakuraRein/vllm that referenced this pull request Oct 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant