
Conversation

Member

@DarkLight1337 DarkLight1337 commented Nov 17, 2025

Purpose

Fix an issue caused by #28665: the CLI still uses the static defaults from SchedulerConfig, so the dynamic defaults are never applied. Sorry for breaking this!

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 17, 2025
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) November 17, 2025 17:44
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request correctly fixes a bug where CLI arguments for dynamic SchedulerConfig fields (max_num_batched_tokens, max_num_seqs, enable_chunked_prefill) were using static defaults from SchedulerConfig, preventing the intended dynamic default logic from executing. By overriding the default to None for these CLI arguments, the engine can now correctly apply dynamic defaults based on the usage context and hardware. The changes are clear and address the issue effectively. I've also pointed out a similar issue with async_scheduling that could be addressed for consistency.
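
For readers less familiar with the pattern being applied here, the sketch below shows the general None-as-sentinel idea in isolation: the CLI default is None so that "flag not given" can be told apart from "flag set to the static default", and a later resolution step picks the context-dependent value. The argument name is borrowed from the PR, but the resolver and the 8192/2048 numbers are invented for illustration and are not vLLM's actual logic or defaults.

    import argparse

    # Minimal illustration of the None-as-sentinel pattern; the resolver and its
    # numbers are made up for this sketch and are not vLLM's real defaults.
    parser = argparse.ArgumentParser()
    # default=None distinguishes "user did not pass the flag" from an explicit value.
    parser.add_argument("--max-num-batched-tokens", type=int, default=None)

    def resolve_max_num_batched_tokens(value: int | None, usage_context: str) -> int:
        if value is not None:
            return value  # explicit user choice always wins
        # Hypothetical dynamic default keyed by usage context / hardware.
        return 8192 if usage_context == "serve" else 2048

    args = parser.parse_args([])
    print(resolve_max_num_batched_tokens(args.max_num_batched_tokens, "serve"))  # 8192
    args = parser.parse_args(["--max-num-batched-tokens", "4096"])
    print(resolve_max_num_batched_tokens(args.max_num_batched_tokens, "serve"))  # 4096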

Comment on lines +1049 to +1060

          "--max-num-batched-tokens",
          **{
              **scheduler_kwargs["max_num_batched_tokens"],
              "default": None,
          },
      )
      scheduler_group.add_argument(
-         "--max-num-seqs", **scheduler_kwargs["max_num_seqs"]
+         "--max-num-seqs",
+         **{
+             **scheduler_kwargs["max_num_seqs"],
+             "default": None,
+         },

high

While you're fixing the CLI defaults for dynamic SchedulerConfig fields, it seems like async_scheduling might be another case that needs a similar change.

Currently, EngineArgs.async_scheduling defaults to False (from SchedulerConfig.async_scheduling), and the CLI argument also defaults to False. This means the logic in VllmConfig.__post_init__ to dynamically enable async_scheduling (where self.scheduler_config.async_scheduling is None) will never be triggered.

To enable the dynamic default behavior for async_scheduling, you might need to:

  1. Change the default value of async_scheduling in EngineArgs to None.
  2. Update its add_argument call in add_cli_args to set default=None, similar to the other fields in this PR.

This would make its behavior consistent with the other dynamically configured scheduler arguments.
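
A minimal sketch of the tri-state flag the bot is describing: None means "decide automatically", while True/False record an explicit user choice. The field name mirrors the discussion, but the dataclass and the resolution rule below are illustrative only and do not reproduce the actual EngineArgs/VllmConfig code.

    from dataclasses import dataclass

    # Illustrative only; the resolution rule is invented for this example and is
    # not vLLM's actual async_scheduling logic.
    @dataclass
    class EngineArgsSketch:
        # None = "not specified"; True/False = explicit user choice.
        async_scheduling: bool | None = None

    def resolve_async_scheduling(args: EngineArgsSketch, platform_supports_it: bool) -> bool:
        if args.async_scheduling is not None:
            return args.async_scheduling  # respect the explicit choice
        # Hypothetical dynamic default: enable only when the platform supports it.
        return platform_supports_it

    print(resolve_async_scheduling(EngineArgsSketch(), platform_supports_it=True))   # True
    print(resolve_async_scheduling(EngineArgsSketch(async_scheduling=False), True))  # False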

@mgoin mgoin added the bug Something isn't working label Nov 17, 2025
Member

ywang96 commented Nov 17, 2025

Looking at the failing test - it seems that max_num_batched_tokens is still using the same value as max_model_len?

[2025-11-17T18:40:57Z] INFO 11-17 10:40:57 [model.py:1745] Using max model len 448
...
[2025-11-17T18:41:25Z] (EngineCore_DP0 pid=5910) ValueError: Chunked MM input disabled but max_tokens_per_mm_item (1500) is larger than max_num_batched_tokens (448). Please increase max_num_batched_tokens.

Signed-off-by: DarkLight1337 <[email protected]>
Member Author

DarkLight1337 commented Nov 18, 2025

It is caused by this code, which seems intentional. cc @NickLucche @russellb

            # When using default settings,
            # Ensure max_num_batched_tokens does not exceed model limit.
            # Some models (e.g., Whisper) have embeddings tied to max length.
            self.max_num_batched_tokens = min(
                self.max_num_seqs * model_config.max_model_len,
                self.max_num_batched_tokens,
            )

For now I have worked around it by increasing max_num_seqs.
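
To make the arithmetic behind the workaround concrete, the snippet below plugs the numbers from the failing-test log (max_model_len 448, a 1500-token multimodal item) into the min() above. The incoming max_num_batched_tokens value and the max_num_seqs choices are assumptions for illustration; they are not necessarily the values the test uses.

    # 448 and 1500 come from the log above; 8192 and the max_num_seqs values are
    # assumed for illustration only.
    max_model_len = 448
    max_tokens_per_mm_item = 1500
    incoming_max_num_batched_tokens = 8192

    for max_num_seqs in (1, 4):
        capped = min(max_num_seqs * max_model_len, incoming_max_num_batched_tokens)
        print(max_num_seqs, capped, capped >= max_tokens_per_mm_item)
    # max_num_seqs=1 -> 448  (too small for a 1500-token MM item, hence the ValueError)
    # max_num_seqs=4 -> 1792 (clears the 1500-token requirement)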

@DarkLight1337
Member Author

The previously failing test now passes; merging, since the other entrypoint test is also failing on main.

@vllm-bot vllm-bot merged commit bf9e1e8 into vllm-project:main Nov 18, 2025
39 of 47 checks passed
@DarkLight1337 DarkLight1337 deleted the fix-serve-defaults branch November 18, 2025 04:30
@DarkLight1337 DarkLight1337 added this to the v0.11.1 milestone Nov 18, 2025
Victor49152 pushed a commit to Victor49152/vllm that referenced this pull request Nov 20, 2025
