Conversation

@DarkLight1337 (Member) commented Nov 13, 2025

Purpose

  • Move default values of SchedulerConfig from utils to SchedulerConfig itself.
  • In normal usage of vLLM, max_num_batched_tokens and max_num_seqs are set from the UsageContext when constructing EngineArgs, so there is no point in making the default values of SchedulerConfig dynamic (the pooling- and multimodal-specific defaults were being overwritten anyway). This simplifies the initialization code considerably.
  • Split up the logic of EngineArgs._set_default_args to be more modular.
  • Make SchedulerConfig.chunked_prefill_enabled a property-based alias of SchedulerConfig.enable_chunked_prefill to avoid having to set both when overriding the config in each platform (see the sketch below).
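
As a rough illustration of the direction described above, here is a minimal sketch of static class-level defaults plus the property-based alias. It uses a plain dataclass and the constant names mentioned later in this thread (DEFAULT_MAX_NUM_BATCHED_TOKENS, DEFAULT_MAX_NUM_SEQS); the exact definitions in vLLM may differ.

from dataclasses import dataclass
from typing import ClassVar


@dataclass
class SchedulerConfig:
    # Static class-level defaults, replacing values previously computed in
    # __post_init__ / utils. The numbers mirror those quoted in this thread.
    DEFAULT_MAX_NUM_BATCHED_TOKENS: ClassVar[int] = 2048
    DEFAULT_MAX_NUM_SEQS: ClassVar[int] = 128

    max_num_batched_tokens: int = DEFAULT_MAX_NUM_BATCHED_TOKENS
    max_num_seqs: int = DEFAULT_MAX_NUM_SEQS
    enable_chunked_prefill: bool = True

    # Backward-compatible alias: reads and writes both go through
    # enable_chunked_prefill, so the two flags can never disagree.
    @property
    def chunked_prefill_enabled(self) -> bool:
        return self.enable_chunked_prefill

    @chunked_prefill_enabled.setter
    def chunked_prefill_enabled(self, value: bool) -> None:
        self.enable_chunked_prefill = value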

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@DarkLight1337 DarkLight1337 added the ready label (ONLY add when PR is ready to merge/full CI is needed) Nov 13, 2025
@mergify mergify bot added the tpu label (Related to Google TPUs) Nov 13, 2025
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request provides a solid cleanup of the SchedulerConfig initialization. Moving the default value logic from SchedulerConfig.__post_init__ to EngineArgs and refactoring it into new helper methods significantly improves modularity and readability. The use of class variables for default values in SchedulerConfig is also a good practice. I've identified one minor logging issue that should be addressed.

Signed-off-by: DarkLight1337 <[email protected]>
@DarkLight1337 (Member, Author) commented

/gemini review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request is a good refactoring that cleans up SchedulerConfig initialization by centralizing default values and improving the modularity of EngineArgs. The use of a property for chunked_prefill_enabled is a clean way to handle aliasing and maintain backward compatibility.

However, I've identified a critical issue: the refactoring appears to have removed the specific default max_num_batched_tokens logic for pooling and multimodal models. This could lead to a performance regression for these model types. I've added a detailed comment regarding this. Once this is addressed, the PR will be a strong improvement.

Comment on lines 196 to 223
 def __post_init__(self, is_encoder_decoder: bool) -> None:
-    if self.max_model_len is None:
-        self.max_model_len = 8192
-
-    if self.max_num_seqs is None:
-        self.max_num_seqs = 128
-
     if is_encoder_decoder:
         # Chunked prefill should be disabled for encoder-decoder models.
         self.disable_chunked_mm_input = True
-        self.chunked_prefill_enabled = False
         self.enable_chunked_prefill = False
         self.long_prefill_token_threshold = 0
         logger.info(
             "Encoder-decoder models do not support chunked prefill nor"
             " prefix caching; disabling both."
         )

-    if self.max_num_batched_tokens is None:
-        if self.enable_chunked_prefill:
-            self.max_num_batched_tokens = DEFAULT_MAX_NUM_BATCHED_TOKENS
-        else:
-            # If max_model_len is too short, use
-            # DEFAULT_MAX_NUM_BATCHED_TOKENS as the default value
-            # for higher throughput.
-            self.max_num_batched_tokens = max(
-                self.max_model_len, DEFAULT_MAX_NUM_BATCHED_TOKENS
-            )
-
-        if self.runner_type == "pooling":
-            # Choose specific value for higher throughput
-            self.max_num_batched_tokens = max(
-                self.max_num_batched_tokens,
-                POOLING_MODEL_MAX_NUM_BATCHED_TOKENS,
-            )
-        if self.is_multimodal_model:
-            # The value needs to be at least the number of multimodal tokens
-            self.max_num_batched_tokens = max(
-                self.max_num_batched_tokens,
-                MULTIMODAL_MODEL_MAX_NUM_BATCHED_TOKENS,
-            )
-
-        # When using default settings,
-        # Ensure max_num_batched_tokens does not exceed model limit.
-        # Some models (e.g., Whisper) have embeddings tied to max length.
-        self.max_num_batched_tokens = min(
-            self.max_num_seqs * self.max_model_len, self.max_num_batched_tokens
-        )
+    if not self.enable_chunked_prefill:
+        # If max_model_len is too short, use the default for higher throughput.
+        self.max_num_batched_tokens = max(
+            self.max_model_len,
+            self.max_num_batched_tokens,
+        )
+
+    # Ensure max_num_batched_tokens does not exceed model limit.
+    # Some models (e.g., Whisper) have embeddings tied to max length.
+    self.max_num_batched_tokens = min(
+        self.max_num_seqs * self.max_model_len,
+        self.max_num_batched_tokens,
+    )

     self.max_num_encoder_input_tokens = self.max_num_batched_tokens
     self.encoder_cache_size = self.max_num_batched_tokens

@gemini-code-assist bot (Contributor) commented

critical

This refactoring simplifies the initialization, but it seems to have removed the special default logic for max_num_batched_tokens for pooling and multimodal models.

Previously, if max_num_batched_tokens was not set by the user or a UsageContext-specific default, there was fallback logic to increase it for pooling models (to 32768) and multimodal models (to 5120) for better throughput. This logic was triggered if max_num_batched_tokens was None when __post_init__ was called.

This logic has now been removed. The justification in the PR description suggests this was dead code, but it appears it would have been triggered if no UsageContext default was found. The new implementation in EngineArgs ensures max_num_batched_tokens is always set, but the specific, higher defaults for pooling/multimodal models are no longer applied anywhere.

Removing this could lead to a significant performance regression for these model types. Could you please confirm if this change is intended? If it's a mistake, this logic should be restored, perhaps within EngineArgs.get_batch_defaults.
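
For reference, a hedged sketch of what restoring that fallback could look like if it were wanted. get_batch_defaults is only the name suggested above, and this helper is hypothetical rather than part of the PR; the thresholds are the values quoted in this comment.

# Thresholds quoted in the review comment above.
POOLING_MODEL_MAX_NUM_BATCHED_TOKENS = 32768
MULTIMODAL_MODEL_MAX_NUM_BATCHED_TOKENS = 5120


def get_batch_defaults(
    max_num_batched_tokens: int,
    runner_type: str,
    is_multimodal_model: bool,
) -> int:
    # Hypothetical helper: raise the batch-size default for pooling and
    # multimodal models, mirroring the removed __post_init__ logic.
    if runner_type == "pooling":
        max_num_batched_tokens = max(
            max_num_batched_tokens, POOLING_MODEL_MAX_NUM_BATCHED_TOKENS
        )
    if is_multimodal_model:
        max_num_batched_tokens = max(
            max_num_batched_tokens, MULTIMODAL_MODEL_MAX_NUM_BATCHED_TOKENS
        )
    return max_num_batched_tokens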

@DarkLight1337 (Member, Author) commented

The case of no UsageContext is not normal usage of vLLM.

@DarkLight1337 (Member, Author) commented Nov 13, 2025

@njhill @WoosukKwon @robertgshaw2-redhat correct me if I'm wrong about this

Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
@hmellor (Member) left a comment

This is a really nice change. I've left a few comments for clarification.

Comment on lines +51 to +56
max_model_len: int = Field(default=8192, ge=1)
"""Maximum length of a sequence (including prompt and generated text).
The default value here is mainly for convenience when testing.
In real usage, this should duplicate `ModelConfig.max_model_len` via
`EngineArgs`."""
@hmellor (Member) commented

Could we remove this entirely?

@DarkLight1337 (Member, Author) commented

It is used in some other places like vllm.v1.core.sched.Scheduler. We can try to refactor this in another PR.

@hmellor (Member) commented

Yeah, this would need a small refactor. A follow-up PR sounds good.

Comment on lines +235 to +242
@property
def chunked_prefill_enabled(self) -> bool:
    return self.enable_chunked_prefill

@chunked_prefill_enabled.setter
def chunked_prefill_enabled(self, value: bool):
    self.enable_chunked_prefill = value

@hmellor (Member) commented

Can we just remove this? It used to be init=False, so it's not part of the normal API of SchedulerConfig.

@DarkLight1337 (Member, Author) commented

Same as above

 gpu_memory_utilization: float = CacheConfig.gpu_memory_utilization
 kv_cache_memory_bytes: int | None = CacheConfig.kv_cache_memory_bytes
-max_num_batched_tokens: int | None = SchedulerConfig.max_num_batched_tokens
+max_num_batched_tokens: int | None = None
@hmellor (Member) commented

Could you group these fields which don't copy the defaults from their respective config and add a comment saying why?

@DarkLight1337 (Member, Author) commented

We can do that in the next cleanup

@DarkLight1337 DarkLight1337 merged commit 511a6b6 into vllm-project:main Nov 14, 2025
50 checks passed
@DarkLight1337 DarkLight1337 deleted the clean-sched-defaults branch November 14, 2025 14:41
@ZJY0516 ZJY0516 mentioned this pull request Nov 15, 2025
geodavic pushed a commit to geodavic/vllm that referenced this pull request Nov 16, 2025
@WoosukKwon (Collaborator) commented

@DarkLight1337 I think this PR changes the default values for max_num_batched_tokens and max_num_seqs unexpectedly.

@WoosukKwon (Collaborator) commented

IIUC, the two if statements here are not executed because self.max_num_batched_tokens and self.max_num_seqs are already set to the default values in SchedulerConfig:

vllm/vllm/engine/arg_utils.py

Lines 1988 to 1998 in 3380ed5

if self.max_num_batched_tokens is None:
    self.max_num_batched_tokens = default_max_num_batched_tokens.get(
        usage_context,
        SchedulerConfig.DEFAULT_MAX_NUM_BATCHED_TOKENS,
    )
if self.max_num_seqs is None:
    self.max_num_seqs = default_max_num_seqs.get(
        usage_context,
        SchedulerConfig.DEFAULT_MAX_NUM_SEQS,
    )

@hmellor (Member) commented Nov 17, 2025

If the behaviour has changed, I don't think it's because of the changed defaults in SchedulerConfig. The linked if blocks will be triggered if the EngineArgs values are None, which they still are in the default case:

max_num_batched_tokens: int | None = None

max_num_seqs: int | None = None

@DarkLight1337 (Member, Author) commented

Yeah, tests/v1/engine/test_engine_args.py::test_defaults_with_usage_context ensures that the defaults stay the same.

@WoosukKwon (Collaborator) commented

@hmellor @DarkLight1337 On B200, the correct default values (1024 seqs, 8K tokens) are not used. vllm serve uses 128 seqs and 2K tokens instead, which limits performance a lot.

@DarkLight1337 (Member, Author) commented

Can you run tests/v1/engine/test_engine_args.py::test_defaults_with_usage_context on the B200 and see if the test passes?

@WoosukKwon (Collaborator) commented

@DarkLight1337 It passes the test, but that's probably because the usage context is taken into account in the test (while it isn't in vllm serve). And I've confirmed that this PR caused the issue.

@DarkLight1337 (Member, Author) commented

Hmm, shouldn't UsageContext.OPENAI_API_SERVER be used in vllm serve? Or did I misunderstand how UsageContext works?

@WoosukKwon (Collaborator) commented Nov 17, 2025

@DarkLight1337 vllm serve is supposed to use usage_context too. However, this PR introduces a bug where the usage context is not applied.

 if self.max_num_batched_tokens is None: 
     self.max_num_batched_tokens = default_max_num_batched_tokens.get( 
         usage_context, 
         SchedulerConfig.DEFAULT_MAX_NUM_BATCHED_TOKENS, 
     ) 

Here, self.max_num_batched_tokens is never None after this PR for some reason. It is somehow already set to 2048 (the default value in SchedulerConfig).

@DarkLight1337 (Member, Author) commented

OK, I figured out the issue: the CLI defaults are still using the ones from SchedulerConfig.
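
A self-contained illustration of that failure mode; argparse stands in for vLLM's CLI layer, and the names and the 8K usage-context default are simplified from this thread rather than taken from the actual code.

import argparse

DEFAULT_MAX_NUM_BATCHED_TOKENS = 2048  # SchedulerConfig's static default
USAGE_CONTEXT_DEFAULTS = {"openai_api_server": 8192}

parser = argparse.ArgumentParser()
# Bug: seeding the CLI default from SchedulerConfig means the parsed value is
# already 2048 even when the user passes nothing on the command line...
parser.add_argument(
    "--max-num-batched-tokens", type=int, default=DEFAULT_MAX_NUM_BATCHED_TOKENS
)
args = parser.parse_args([])

# ...so the usage-context override never fires.
if args.max_num_batched_tokens is None:
    args.max_num_batched_tokens = USAGE_CONTEXT_DEFAULTS.get(
        "openai_api_server", DEFAULT_MAX_NUM_BATCHED_TOKENS
    )

print(args.max_num_batched_tokens)  # 2048, not the 8192 the usage context intends
# Keeping the CLI default as None lets the usage-context lookup run instead.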


Labels

  • ready: ONLY add when PR is ready to merge/full CI is needed
  • tpu: Related to Google TPUs
  • v1
