[Core] Reduce TTFT with concurrent partial prefills #10235
Merged
+701
−108
71 commits
f97eacf
:bug: fix multi-chunked-prefill sampler bug
joerunde b50a6b8
🚧 add num_prefill_slots arg
prashantgupta24 7f23c04
:sparkles: start to write prefill slot logic
joerunde d271cc9
🎨 format
prashantgupta24 b2cb96f
:sparkles: update num tokens for prefill slots
joerunde c349ac0
♻️ add schedule_chunked_prefill logic
prashantgupta24 e20518d
♻️ change function name
prashantgupta24 6ba0e34
:sparkles: reserve incoming prefill slots
joerunde a7491cc
🎨 fix some typos
prashantgupta24 1ee6fea
:zap: finish awesome scheduler
joerunde 517915a
:bug: fix the deadlocks
joerunde ed298c3
:memo: Add more docstrings
joerunde 90e0c07
:bug: fix deadlock
joerunde 1c92ac2
:construction: WIP scheduler tests
joerunde de95f62
:bug: fix prefix caching
joerunde 41e20ca
:test_tube: add prefix caching test
joerunde 4dc7310
✅ add second test iteration
prashantgupta24 8e3118e
✅ add llm engine test
prashantgupta24 b6ebec8
♻️ quicker budget check
prashantgupta24 7e93668
🎨 rename to max_num_partial_prefills
prashantgupta24 557bfe3
🎨 more renaming to max_num_partial_prefills + docstring updates
prashantgupta24 d3e94df
🎨 rename big to long
prashantgupta24 849baf6
♻️ add cli args for partial_prefill configs
prashantgupta24 beaf086
🎨 fix request word typo
prashantgupta24 672a50c
🎨 more docstring changes
prashantgupta24 a2751ff
🎨 forgot to add the new args to config
prashantgupta24 dff757d
🐛 fix range bug on partial_prefill_budget_lookup_list
prashantgupta24 86ffa04
🎨 add docstring to test function
prashantgupta24 3d39942
:construction: WIP move metadata to dataclass
joerunde dbb9ae8
🎨 wrap up PartialPrefillMetadata
prashantgupta24 4bac8ed
♻️ add some utility functions within partial_prefill_metadata
prashantgupta24 c44ca1f
🎨 change to long_prefill_token_threshold
prashantgupta24 38bad7a
🔥 remove commented code
prashantgupta24 0f3efa1
🐛 fix the big bug! (Thanks Joe)
prashantgupta24 3daf35f
:memo: docstings galore
joerunde 241853a
🎨 fix typo
prashantgupta24 07b6d72
⏪ revert logging change
prashantgupta24 c4bdf37
✅ remove value error from test
prashantgupta24 7c8b400
✅ remove value error from test
prashantgupta24 21796fc
🎨 fix typo
prashantgupta24 d993861
✅ make test comprehensive
prashantgupta24 946d297
🎨 fix unused vars in test
prashantgupta24 5535515
🎨 some more comments
prashantgupta24 ba91ddf
🎨 fix merge conflict
prashantgupta24 bccf86f
🎨 fmt
prashantgupta24 75848c9
♻️ merge with main
prashantgupta24 1c80379
Merge branch 'main' into prefill-slots
prashantgupta24 4f1c322
🎨 fix fmt
prashantgupta24 cb8fc93
⏪ revert quick budget check
prashantgupta24 8a8a07f
🎨 fmt
prashantgupta24 90a53ab
♻️ merge with main
prashantgupta24 29a7ccd
Merge remote-tracking branch 'upstream/main' into prefill-slots
prashantgupta24 752ce1b
🎨 fmt
prashantgupta24 edc204e
Merge remote-tracking branch 'upstream/main' into prefill-slots
joerunde 0206173
Merge remote-tracking branch 'upstream/main' into prefill-slots
joerunde 80b72ef
Merge remote-tracking branch 'upstream/main' into prefill-slots
joerunde 03525f2
:bug: fix index out of range
joerunde d5f5eb6
:recycle: naming updates
joerunde cb5361a
:bug: fix long prefill threshold init
joerunde 8a44c89
Merge remote-tracking branch 'upstream/main' into prefill-slots
joerunde 6de9b56
Merge remote-tracking branch 'upstream/main' into prefill-slots
joerunde c1ef186
Merge remote-tracking branch 'upstream/main' into prefill-slots
joerunde 5977716
Merge remote-tracking branch 'upstream/main' into prefill-slots
joerunde 5fb2196
:art: fmt
joerunde 5840994
Merge remote-tracking branch 'upstream/main' into prefill-slots
joerunde ef25a0d
:wrench: update config and tests
joerunde 1d890af
:rewind: revert style change
joerunde cf33c63
:recycle: cannot -> can
joerunde edb461e
Update vllm/core/scheduler.py
joerunde 686f035
:art: fmt
joerunde dad07a8
:bug: fixup renamed fn ref
joerunde
cc @rickyyx: here's how we tried to test that the sampler doesn't trip any assertions. We put multiple prompts into an engine and manually step it forward while all of them are partially prefilled.
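For readers skimming the thread, the shape of that test can be illustrated with a toy model. This is only a sketch under stated assumptions, not the test added in this PR and not vLLM's scheduler: `ToyEngine`, `ToyRequest`, `token_budget`, and the even budget split are all hypothetical stand-ins, and only the name `max_num_partial_prefills` is borrowed from the commit list. It adds several prompts, steps manually, and checks that every prompt is mid-prefill after the first step.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyRequest:
    # Hypothetical request record; not a vLLM class.
    request_id: str
    prompt_len: int
    computed: int = 0  # prompt tokens prefilled so far

    @property
    def remaining(self) -> int:
        return self.prompt_len - self.computed


class ToyEngine:
    """Toy stand-in for an engine with chunked prefill (assumption: the
    per-step token budget is split evenly across concurrent prefills)."""

    def __init__(self, token_budget: int, max_num_partial_prefills: int):
        self.token_budget = token_budget
        self.max_num_partial_prefills = max_num_partial_prefills
        self.requests: List[ToyRequest] = []

    def add_request(self, request_id: str, prompt_len: int) -> None:
        self.requests.append(ToyRequest(request_id, prompt_len))

    def step(self) -> Dict[str, int]:
        """Advance one iteration; return tokens prefilled per request."""
        active = [r for r in self.requests if r.remaining > 0]
        active = active[: self.max_num_partial_prefills]
        if not active:
            return {}
        share = self.token_budget // len(active)
        scheduled = {}
        for req in active:
            chunk = min(share, req.remaining)
            req.computed += chunk
            scheduled[req.request_id] = chunk
        return scheduled


# Put multiple prompts into the engine, then step it by hand.
engine = ToyEngine(token_budget=12, max_num_partial_prefills=3)
for i, prompt_len in enumerate([30, 30, 30]):
    engine.add_request(f"req-{i}", prompt_len)

first = engine.step()

# After one step, every prompt has started prefill and none has finished.
assert all(0 < r.computed < r.prompt_len for r in engine.requests)
```

In a real engine test, the interesting part is what happens on the steps where several sequences are still mid-prefill at once, since that is the state in which the sampler assertions mentioned above would fire.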