Skip to content

Conversation

@noooop
Copy link
Collaborator

@noooop noooop commented Oct 11, 2025

Improve all pooling task

These PRs are mostly conflicting with each other, so combining them into a series would better inform reviewers about what happened. And what else needs to be done after that?

Purpose

Improve enable chunked_prefill & prefix_caching logic.

Address:

Test Plan

pytest -s -vvv tests/test_config.py::test_is_chunked_prefill_supported
pytest -s -vvv tests/test_config.py::test_is_prefix_caching_supported
pytest -s -vvv tests/models/language/pooling/test_auto_prefix_cache_support.py

Test Result

pass


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@noooop noooop changed the title [Model][0/N] Improve all pooling task | Improve enable chunked_prefill & prefix_caching logic. [Model][4/N] Improve all pooling task | Improve enable chunked_prefill & prefix_caching logic. Oct 11, 2025
@noooop noooop changed the title [Model][4/N] Improve all pooling task | Improve enable chunked_prefill & prefix_caching logic. [Model][5/N] Improve all pooling task | Improve enable chunked_prefill & prefix_caching logic. Oct 14, 2025
@noooop noooop changed the title [Model][5/N] Improve all pooling task | Improve enable chunked_prefill & prefix_caching logic. [Model][6/N] Improve all pooling task | Improve enable chunked_prefill & prefix_caching logic. Oct 17, 2025
@noooop noooop changed the title [Model][6/N] Improve all pooling task | Improve enable chunked_prefill & prefix_caching logic. [Model][7/N] Improve all pooling task | Improve enable chunked_prefill & prefix_caching logic. Nov 11, 2025
Signed-off-by: wang.yuqi <[email protected]>
@noooop noooop force-pushed the chunked_prefill_logic branch from c8e46f4 to aa55dab Compare November 18, 2025 03:37
@noooop noooop changed the title [Model][7/N] Improve all pooling task | Improve enable chunked_prefill & prefix_caching logic. [Model] Improve enable chunked_prefill & prefix_caching logic. Nov 20, 2025
Signed-off-by: wang.yuqi <[email protected]>
@noooop noooop marked this pull request as ready for review November 20, 2025 10:41
@noooop noooop requested a review from WoosukKwon as a code owner November 20, 2025 10:41
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Copy link
Member

@zhuohan123 zhuohan123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need a new class for bool? I think these kind of classes add mental overhead for no good reason. Will leave more comments tmrw.

@noooop
Copy link
Collaborator Author

noooop commented Nov 21, 2025

Why do we need a new class for bool? I think these kind of classes add mental overhead for no good reason. Will leave more comments tmrw.

BoolWithReason is a great method that combines the reasons why a feature can't be enabled, making the code cleaner, improving readability and testability. Perhaps the name BoolWithReason isn't perfect, but we need similar functionality.

(This part of the code is a complete nightmare, and it keeps biting us. Let’s refactor it so we can all have a life again!)

Copy link
Member

@hmellor hmellor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, all the reasons should just be warnings

@mergify mergify bot added the v1 label Nov 22, 2025
Comment on lines 253 to 260
default = (
default.default
if default.default_factory is None
# FIXME: A default vllm_config will be constructed here.
# Info logs will be output, which is confusing.
else default.default_factory()
)
elif field.default_factory is not MISSING:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DarkLight1337 @hmellor

A default vllm_config will be constructed here, including the creation of a default SchedulerConfig, and the following info log will be output, which is confusing.

For example, BAAI/bge-base-en does not support chunked prefill, so chunked prefill is not used by default.

vllm serve BAAI/bge-base-en

INFO 11-22 15:35:41 [config/scheduler.py:207] Chunked prefill is enabled with max_num_batched_tokens=2048.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

new-model Requests to new models v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants