Conversation

@lengrongfu
Contributor

@lengrongfu lengrongfu commented Mar 29, 2025

FIX #15719

Test result:
[image: test-result screenshot]

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the frontend label Mar 29, 2025
@mergify

mergify bot commented Mar 29, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @lengrongfu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot removed the needs-rebase label Mar 29, 2025
@lengrongfu lengrongfu changed the title [FIX]: vllm v1 version metric num_gpu_blocks is None [V1][FIX]: vllm v1 version metric num_gpu_blocks is None Mar 29, 2025
@lengrongfu lengrongfu changed the title [V1][FIX]: vllm v1 version metric num_gpu_blocks is None [V1][Bugfix]: vllm v1 version metric num_gpu_blocks is None Mar 29, 2025
@lengrongfu lengrongfu force-pushed the fix/v1-metric branch 2 times, most recently from 43214b6 to 0f51eb5, on March 29, 2025 at 16:01
@lengrongfu
Contributor Author

@WoosukKwon can you take a look? thanks ~

@markmc
Member

markmc commented Mar 31, 2025

Great catch! Yes indeed, num_gpu_blocks isn't currently available in the frontend, so we need some way of getting it from the engine

We currently return gpu_cache_usage in SchedulerStats:

@dataclass
class SchedulerStats:
    gpu_cache_usage: float = 0.0

it's tempting to include num_gpu_blocks there, even though it never changes. The overhead is small and the integration is easy!
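
That first option is just a one-field addition, something like (a sketch of the shape, not the final change):

@dataclass
class SchedulerStats:
    gpu_cache_usage: float = 0.0
    num_gpu_blocks: int = 0  # static after startup, but cheap to carry along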

Also, looking at the way BlockPool.get_usage() works, we could easily do:

from dataclasses import dataclass, field

@dataclass
class GPUCacheStats:
    num_blocks: float = 0.0
    num_free_blocks: float = 0.0

@dataclass
class SchedulerStats:
    gpu_cache_stats: GPUCacheStats = field(default_factory=GPUCacheStats)

It's easy to imagine we will want to add more to this over time
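
For that second shape, the scheduler side might be wired up roughly as follows (make_stats and the block-pool accessors here are illustrative guesses, not the actual vLLM API):

def make_stats(block_pool) -> SchedulerStats:
    # Uses the GPUCacheStats/SchedulerStats dataclasses sketched above.
    # Hypothetical accessors: the real BlockPool exposes get_usage(),
    # and the exact attribute/method names may differ.
    return SchedulerStats(gpu_cache_stats=GPUCacheStats(
        num_blocks=block_pool.num_gpu_blocks,
        num_free_blocks=block_pool.get_num_free_blocks(),
    ))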

@lengrongfu
Contributor Author

> Great catch! Yes indeed, num_gpu_blocks isn't currently available in the frontend, so we need some way of getting it from the engine [...] It's easy to imagine we will want to add more to this over time

@WoosukKwon What do you think about this suggestion?

@mergify

mergify bot commented Apr 1, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @lengrongfu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@lengrongfu
Contributor Author

@markmc PTAL

Member

@markmc markmc left a comment


To be clear, I don't love including this unchanging information in SchedulerStats - but it is low overhead

@njhill are there any implications from #15977 we should consider? This is basically information about the engine that is only available to the frontend once the engine has finished initializing

Member

@njhill njhill left a comment


I haven't reviewed closely, just added a few comments about things I noticed.

> @njhill are there any implications from #15977 we should consider? This is basically information about the engine that is only available to the frontend once the engine has finished initializing

@markmc it would probably be better to return this in the "ready" message that the engine sends to the front-end here once startup is complete, if possible.

@markmc
Member

markmc commented Apr 8, 2025

> I haven't reviewed closely, just added a few comments about things I noticed.
>
> > @njhill are there any implications from #15977 we should consider? This is basically information about the engine that is only available to the frontend once the engine has finished initializing
>
> @markmc it would probably be better to return this in the "ready" message that the engine sends to the front-end here once startup is complete, if possible.

Yeah, thanks. Just need to expand that I guess ...

These are the two sides of that:

vllm/v1/engine/core.py (lines 488 to 489 at f6b32ef):

# Send ready message to front-end once input socket is connected.
socket.send(b'READY')

# Front-end side (the receiving end of the ready handshake):
if msg != b'READY':
    raise RuntimeError(f"Engine {eng_id} failed: {msg.decode()}")

Either extend this binary protocol with an int:

import struct

# Engine side: append the block count as a 4-byte big-endian unsigned int.
message = b'READY' + struct.pack('!I', vllm_config.cache_config.num_gpu_blocks)

...

# Front-end side: split the 5-byte header from the 4-byte payload.
header = data[:5]
num_gpu_blocks = struct.unpack('!I', data[5:9])[0]
if header != b'READY':
    ...
vllm_config.cache_config.num_gpu_blocks = num_gpu_blocks

or do it with JSON:

import json

# Engine side: send a structured ready message.
message_dict = {
    'type': 'READY',
    'num_gpu_blocks': vllm_config.cache_config.num_gpu_blocks,
}
message = json.dumps(message_dict).encode('utf-8')

...

# Front-end side: parse it and pick out the block count.
data = socket.recv(1024)
message_dict = json.loads(data.decode('utf-8'))

if message_dict['type'] == 'READY':
    vllm_config.cache_config.num_gpu_blocks = message_dict['num_gpu_blocks']

@njhill
Member

njhill commented Apr 8, 2025

Thanks @markmc, yes I was thinking of something like the JSON approach. I think json dumps/loads can be used without the string encode/decode, though. Also, when there's more than one engine, we may want to record the Prometheus metric with a label for the engine index.
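
A sketch of that simplification, assuming a pyzmq socket (its send_json/recv_json helpers do the serialization, and json.loads also accepts raw bytes directly, so only the dumps side ever needs an explicit encode):

import json

# Engine side: pyzmq serializes the dict itself.
socket.send_json({
    'type': 'READY',
    'num_gpu_blocks': vllm_config.cache_config.num_gpu_blocks,
})

# Front-end side: recv_json parses straight back into a dict.
message_dict = socket.recv_json()

# Or, with a plain recv(): json.loads accepts bytes, so no
# explicit .decode('utf-8') is needed.
message_dict = json.loads(socket.recv())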

@DarkLight1337
Member

DarkLight1337 commented Apr 27, 2025

Can you update this PR with main? Then I can stamp it

@lengrongfu
Contributor Author

OK, I will update it later.

@mergify

mergify bot commented Apr 27, 2025

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @lengrongfu.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) April 28, 2025 02:36
@DarkLight1337
Member

Looks like some tests are failing consistently, PTAL

auto-merge was automatically disabled April 28, 2025 09:50

Head branch was pushed to by a user without write access

@lengrongfu lengrongfu force-pushed the fix/v1-metric branch 3 times, most recently from 218a66e to d67b894, on April 29, 2025 at 14:53
@lengrongfu
Contributor Author

lengrongfu commented Apr 30, 2025

Looks like some tests are failing consistently, PTAL

Hi, the CI pipeline tests are now passing. @DarkLight1337

@lengrongfu
Contributor Author

Please take another look @DarkLight1337

@DarkLight1337 DarkLight1337 merged commit d803786 into vllm-project:main Apr 30, 2025
43 checks passed
@njhill
Member

njhill commented May 1, 2025

I think with these changes, in the data parallel case (when there are multiple engines), the metrics will still only reflect the num_gpu_blocks from one of the engines. Typically the values would be similar across engines, but we may want to consider whether they should be published individually with labels or aggregated in some other way.
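
For illustration, publishing per-engine values with a label might look like this (the metric and label names are hypothetical; only the prometheus_client API itself is real):

from prometheus_client import Gauge

# One gauge, one time series per engine via the label.
gauge_num_gpu_blocks = Gauge(
    'vllm:num_gpu_blocks',  # hypothetical metric name
    'Number of GPU KV-cache blocks available to each engine.',
    labelnames=['engine'])

# engine_block_counts is a hypothetical {engine_index: num_gpu_blocks}
# mapping collected from the per-engine ready messages.
for engine_index, num_blocks in engine_block_counts.items():
    gauge_num_gpu_blocks.labels(engine=str(engine_index)).set(num_blocks)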

radeksm pushed a commit to radeksm/vllm that referenced this pull request May 2, 2025
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025

Labels

frontend, ready (ONLY add when PR is ready to merge/full CI is needed), v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: num_gpu_blocks metric is None in V1

4 participants