
Conversation

@eicherseiji (Contributor) commented Jul 14, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

Problem: the current implementation forces users to choose between the default loggers and custom loggers, but not both. It is possible to recreate the default logger list manually and append a custom logger, but that workaround has to be kept in sync by hand whenever the default loggers change (see the sketch below the reference snippet).

Based on discussion in #14661, I see this was deprioritized to start, but thought it worth raising again as a minimally invasive ergonomics change. Would love to get thoughts on this. Thanks!

Implements feature request #17702.

Ref:

# vllm/vllm/v1/metrics/loggers.py
def setup_default_loggers(
    vllm_config: VllmConfig,
    log_stats: bool,
    engine_num: int,
    custom_stat_loggers: Optional[list[StatLoggerFactory]] = None,
) -> list[list[StatLoggerBase]]:
    """Setup logging and prometheus metrics."""
    if not log_stats:
        return []

    factories: list[StatLoggerFactory]
    if custom_stat_loggers is not None:
        factories = custom_stat_loggers
    else:
        factories = [PrometheusStatLogger]
        if logger.isEnabledFor(logging.INFO):
            factories.append(LoggingStatLogger)
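    # ... (remainder elided: the function builds one logger list per
    # engine from `factories` and returns it)

To illustrate the problem, here is a minimal sketch of the workaround described above, assuming the names from the snippet; MyStatLogger is a hypothetical custom logger invented for this example. To keep the defaults today, the caller has to rebuild the default factory list by hand:

import logging

from vllm.v1.metrics.loggers import (LoggingStatLogger, PrometheusStatLogger,
                                     StatLoggerBase)

logger = logging.getLogger("vllm")


class MyStatLogger(StatLoggerBase):
    """Hypothetical custom logger, for illustration only."""

    def record(self, scheduler_stats, iteration_stats):
        pass  # e.g. forward stats to an external sink

    def log(self):
        pass


# Manual recreation of the defaults: this silently drifts out of date
# if the defaults in setup_default_loggers() ever change.
stat_loggers = [PrometheusStatLogger]
if logger.isEnabledFor(logging.INFO):
    stat_loggers.append(LoggingStatLogger)
stat_loggers.append(MyStatLogger)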

Test Plan

pytest -vs v1/metrics/test_engine_logger_apis.py

Test Result

(base) ray@ip-10-0-251-217:~/default/work/vllm/tests$ pytest -vs v1/metrics/test_engine_logger_apis.py
INFO 07-15 17:16:59 [__init__.py:253] Automatically detected platform cuda.
=================================================================================================== test session starts ===================================================================================================
platform linux -- Python 3.11.11, pytest-8.4.1, pluggy-1.5.0 -- /home/ray/anaconda3/bin/python
cachedir: .pytest_cache
rootdir: /home/ray/default/work/vllm
configfile: pyproject.toml
plugins: asyncio-1.0.0, anyio-3.7.1
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 1 item                                                                                                                                                                                                          

v1/metrics/test_engine_logger_apis.py::test_async_llm_add_logger INFO 07-15 17:17:07 [config.py:3485] Downcasting torch.float32 to torch.float16.
INFO 07-15 17:17:07 [config.py:1561] Using max model len 1024
INFO 07-15 17:17:08 [config.py:2380] Chunked prefill is enabled with max_num_batched_tokens=2048.
WARNING 07-15 17:17:08 [cuda.py:103] To see benefits of async output processing, enable CUDA graph. Since, enforce-eager is enabled, async output processor cannot be used
INFO 07-15 17:17:08 [core.py:526] Waiting for init message from front-end.
INFO 07-15 17:17:08 [core.py:69] Initializing a V1 LLM engine (v0.1.dev7731+g480beba) with config: model='distilbert/distilgpt2', speculative_config=None, tokenizer='distilbert/distilgpt2', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=1024, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=distilbert/distilgpt2, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=False, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":[],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":0,"local_cache_dir":null}
INFO 07-15 17:17:11 [parallel_state.py:1090] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
WARNING 07-15 17:17:11 [topk_topp_sampler.py:59] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
INFO 07-15 17:17:11 [gpu_model_runner.py:1742] Starting to load model distilbert/distilgpt2...
INFO 07-15 17:17:11 [gpu_model_runner.py:1747] Loading model from scratch...
INFO 07-15 17:17:11 [cuda.py:290] Using Flash Attention backend on V1 engine.
INFO 07-15 17:17:11 [weight_utils.py:296] Using model weights format ['*.safetensors']
INFO 07-15 17:17:11 [weight_utils.py:349] No model.safetensors.index.json found in remote.
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 12.30it/s]

INFO 07-15 17:17:11 [default_loader.py:272] Loading weights took 0.09 seconds
INFO 07-15 17:17:12 [gpu_model_runner.py:1773] Model loading took 0.1547 GiB and 0.619774 seconds
INFO 07-15 17:17:13 [gpu_worker.py:244] Available KV cache memory: 19.34 GiB
INFO 07-15 17:17:13 [kv_cache_utils.py:728] GPU KV cache size: 1,126,912 tokens
INFO 07-15 17:17:13 [kv_cache_utils.py:732] Maximum concurrency for 1,024 tokens per request: 1100.50x
INFO 07-15 17:17:13 [core.py:172] init engine (profile, create kv cache, warmup model) took 1.24 seconds
PASSED

=================================================================================================== 1 passed in 12.93s ====================================================================================================

(Optional) Documentation Update

@github-actions (bot)

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which covers a small and essential subset of tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist (bot) left a comment

Summary of Changes

Hello @eicherseiji, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors how statistical loggers are integrated into the AsyncLLM engine by introducing a dedicated asynchronous API for adding them. This change provides more flexibility in managing loggers and includes built-in validation to prevent common misconfigurations.

Highlights

  • New API for Logger Management: Introduced a new asynchronous method, add_logger, to the AsyncLLM class. This method allows statistical loggers to be added to the engine dynamically after initialization (a usage sketch follows this list).
  • Refactored Logger Initialization: The way statistical loggers are added to AsyncLLM has been changed. Instead of passing stat_loggers directly to the AsyncLLM.from_engine_args constructor, loggers are now added via the new add_logger method post-instantiation.
  • Logger Validation and Duplication Prevention: The add_logger method includes checks to ensure that stat logging is enabled before adding a logger. It also prevents the addition of multiple loggers of the same type, raising a KeyError if a duplicate is detected.
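
For concreteness, a hypothetical usage sketch of the proposed add_logger method as summarized above. The exact signature is assumed here, and this design was later dropped in favor of making custom stat loggers additive (see the discussion below); MyStatLogger is an invented stand-in.

import asyncio

from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.v1.engine.async_llm import AsyncLLM
from vllm.v1.metrics.loggers import StatLoggerBase


class MyStatLogger(StatLoggerBase):  # hypothetical custom logger
    def record(self, scheduler_stats, iteration_stats):
        pass

    def log(self):
        pass


async def main():
    engine = AsyncLLM.from_engine_args(
        AsyncEngineArgs(model="distilbert/distilgpt2"))
    # Proposed API (signature assumed): the defaults stay installed, and
    # adding a second logger of the same type would raise KeyError.
    await engine.add_logger(MyStatLogger)


asyncio.run(main())
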
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in issue comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@mergify mergify bot added the v1 label Jul 14, 2025

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a new add_logger API to AsyncLLM, allowing for dynamic addition of stat loggers after engine initialization. The change is well-contained and includes a corresponding update to the tests to use the new API.

I've found a critical issue in the implementation of the duplicate logger check within the new add_logger method, which I've detailed in a specific comment. Addressing this will ensure the new feature works as intended.

@eicherseiji eicherseiji reopened this Jul 15, 2025

@robertgshaw2-redhat (Collaborator)

Why do we want this? I think we should be careful about adding a new API like this before 1.0

@eicherseiji (Contributor, Author)

> Why do we want this? I think we should be careful about adding a new API like this before 1.0

Hi @robertgshaw2-redhat! Thanks for taking a look. I updated the PR body with a more detailed problem statement. Basically my perspective is that it should be possible to add additional loggers without implicitly removing default ones.

Totally understand if this is not a safe change to make just yet, though. Are there docs/discussion somewhere I can reference for more background on our unstable/alpha API designs pre-1.0? Or maybe just extra context on why this is too early?

@eicherseiji eicherseiji marked this pull request as ready for review July 16, 2025 21:42

@mergify (bot) commented Jul 21, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @eicherseiji.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jul 21, 2025

@ruisearch42 (Collaborator)

@eicherseiji this needs to be revisited considering #21257?

@njhill (Member) commented Jul 21, 2025

> it should be possible to add additional loggers without implicitly removing default ones.

Yes, I think the current interface isn't ideal in this regard: either you default to the built-in ones or you provide a completely separate list.

IMO we should address this via the constructor args, though, instead of a new method. Perhaps even a dedicated flag to toggle whether the default stats loggers should be included.
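
As a hypothetical sketch of that suggestion (the flag name is invented here, and this is not what was ultimately merged):

from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.v1.engine.async_llm import AsyncLLM

# MyStatLogger: the hypothetical custom logger from the sketch in the
# PR description. include_default_stat_loggers is an invented name for
# the dedicated toggle being floated here.
engine = AsyncLLM.from_engine_args(
    AsyncEngineArgs(model="distilbert/distilgpt2"),
    stat_loggers=[MyStatLogger],
    include_default_stat_loggers=True,
)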

Signed-off-by: Seiji Eicher <[email protected]>
Signed-off-by: Seiji Eicher <[email protected]>

@eicherseiji (Contributor, Author)

Thanks all for the feedback so far.

Indeed @ruisearch42, #21257 changed things slightly:

  • PrometheusStatLogger is no longer disabled when a list of custom stat loggers is passed (unless a subclass of it is found in the custom list)
  • As a result, only LoggingStatLogger is still disabled when a list of custom stat loggers is provided (see the sketch after this list)
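
A hedged sketch of that post-#21257 selection behavior (an assumed shape, not a verbatim excerpt from that PR):

from vllm.v1.metrics.loggers import PrometheusStatLogger


def resolve_factories(custom_stat_loggers):
    # Start from the user-supplied factories; LoggingStatLogger is
    # intentionally not re-added when custom loggers are present.
    factories = list(custom_stat_loggers)
    # Keep Prometheus unless a custom factory already is (or subclasses)
    # PrometheusStatLogger.
    if not any(isinstance(f, type) and issubclass(f, PrometheusStatLogger)
               for f in factories):
        factories.append(PrometheusStatLogger)
    return factories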

@njhill I think the suggestion makes sense, the drawback is that we pollute the AsyncLLM constructor somewhat by adding another parameter (that's only conditionally applicable when custom stat loggers are used). To avoid adding to AsyncLLM.__init__, what if a new field was added to VllmConfig, e.g. LoggingConfig?

Otherwise, I see add_logger as a less invasive change that implies retaining the default loggers (while still allowing users to remove the defaults by passing stat_loggers directly). Due to #21257, we need to check whether the logger should be shared in a DP deployment. I pushed a commit along these lines to invite further feedback.
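
For concreteness, a hypothetical shape for that LoggingConfig idea (never merged; all names here are illustrative):

from dataclasses import dataclass, field


# Hypothetical config object: VllmConfig would carry logger settings so
# AsyncLLM.__init__ does not grow another parameter.
@dataclass
class LoggingConfig:
    custom_stat_loggers: list = field(default_factory=list)
    include_default_loggers: bool = True

# e.g. vllm_config.logging_config = LoggingConfig(
#     custom_stat_loggers=[MyStatLogger])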

@mergify mergify bot removed the needs-rebase label Jul 22, 2025

@ruisearch42 (Collaborator) left a comment

Thanks. Please make sure @robertgshaw2-redhat @njhill agree with the design change.

@ptovam (Contributor) commented Jul 28, 2025

Hi @eicherseiji @ruisearch42 @njhill @robertgshaw2-redhat,

I’ve opened this PR to extend custom_stat_loggers support to the CLI, enabling dynamic loading of user-defined loggers. Following @simon-mo’s suggestion, I’m currently updating it to align with the vLLM plugin system.

(Initial plugin-based commit: #22456)

I’m wondering how this should integrate with the current implementation and the changes here. Since I’m planning to introduce a dedicated plugin group that dynamically loads StatLoggerBase classes, do you think we should maintain both approaches (custom_stat_loggers and add_logger) alongside the plugin system? Or would it make sense to unify or deprecate some of the existing mechanisms?

NOTE: This PR fixes the limitation of having to choose between default or custom loggers. That’s a great improvement, and the plugin system will need to align with this change as well. Since add_logger() won’t map well to the plugin system, options like a LoggingConfig field on VllmConfig or a dedicated flag to toggle default loggers might be a better fit there.

Appreciate any thoughts, and am happy to adjust based on the direction you think is best.

@eicherseiji (Contributor, Author)

Thanks @ptovam for helping coordinate!

@njhill seems like the constructor/non-method implementation you proposed may be more compatible with the plugin system than the add_logger method I have here. My leaning is toward a new field on VllmConfig, but I'm open to different directions. Lmk thoughts if you get a chance. cc: @robertgshaw2-redhat

@ptovam (Contributor) commented Jul 28, 2025

Thanks! I like the idea of a new config field, as it could also make it easier to pass parameters (port, path, etc.) to custom loggers in the future, similar to how model_loader_extra_config works.

@njhill (Member) commented Aug 8, 2025

Thanks @eicherseiji @ptovam and sorry for taking so long to get back to this.

For an immediate change, WDYT about the following:

  1. Change the existing custom_stats_loggers arg to be additive: the default stats loggers are used regardless of whether additional custom ones are added.
  2. Change disable_log_stats to disable only the built-in/default stats loggers, so that it can be used in conjunction with custom_stats_loggers.

I think that will be minimally disruptive to existing users and address your original concern without requiring API additions.

Actually, there is another open PR, #22227, which addresses the incompatibility of the logging stats logger with DP and in its current state also addresses (1).

Then as a follow-on we can decide how to extend this to enable plugging in custom loggers in the server case, i.e. #22456.
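
A minimal sketch of the additive behavior this proposal (and, ultimately, this PR) settles on, assuming the v1 AsyncLLM API and reusing the hypothetical MyStatLogger from the earlier sketch:

from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.v1.engine.async_llm import AsyncLLM

# Custom factories now extend the defaults rather than replacing them:
# the Prometheus/Logging defaults stay active alongside MyStatLogger.
engine = AsyncLLM.from_engine_args(
    AsyncEngineArgs(model="distilbert/distilgpt2"),
    stat_loggers=[MyStatLogger],
)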

@njhill (Member) left a comment

Thanks @eicherseiji, sorry for taking a while to check this again; I'll try to turn around quicker...

Comment on lines 651 to 654

    self.dp_shared_loggers = []
    if enable_default_loggers:
        self.dp_shared_loggers.append(
            PrometheusStatLogger(vllm_config, engine_idxs))

@njhill (Member)

Any reason not to keep this as a single prometheus_logger field?

@njhill (Member)

I think the above self.dp_shared_loggers code can also be removed now? It's not used right?

@eicherseiji (Contributor, Author)

Yup, thanks.

eicherseiji and others added 2 commits September 2, 2025 13:22
Co-authored-by: Nick Hill <[email protected]>
Signed-off-by: Seiji Eicher <[email protected]>

@eicherseiji (Contributor, Author) left a comment

Thanks @njhill for the comments; my apologies for the delayed reply (OOO + urgent work last week). Incorporated the feedback, lmk what you think.

    if enable_default_loggers and logger.isEnabledFor(logging.INFO):
        factories.append(LoggingStatLogger)

    # For Prometheus, need to share the metrics between EngineCores.

@eicherseiji (Contributor, Author)

Point taken. I think it makes sense to keep the change minimal/narrow for now and defer a dp_shared_loggers concept until it's clearly needed; keeping the single prometheus_logger field.


@njhill (Member) left a comment

Thanks @eicherseiji, no worries about the delay. I have one remaining comment (see the thread above).


@eicherseiji (Contributor, Author) left a comment

Updated tests.


@njhill (Member) left a comment

Thanks @eicherseiji

@njhill njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) Sep 3, 2025

@njhill (Member) commented Sep 3, 2025

@eicherseiji looks like there is a test failure, presumably test needs updating: https://buildkite.com/vllm/ci/builds/29369#0199110d-18c2-49ab-9820-6c9995a41de8

@eicherseiji (Contributor, Author)

Thanks @njhill! Fixed.

@njhill njhill enabled auto-merge (squash) September 4, 2025 20:08
@njhill njhill changed the title AsyncLLM custom stat logger APIs should extend default logger list [Misc] Have AsyncLLM custom_stat_loggers extend default logger list Sep 4, 2025
@simon-mo simon-mo disabled auto-merge September 4, 2025 21:25
@simon-mo simon-mo merged commit 60b755c into vllm-project:main Sep 4, 2025
38 of 40 checks passed
eicherseiji added a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025

Labels

ready (ONLY add when PR is ready to merge/full CI is needed), v1


6 participants