[V1][Bugfix]: vllm v1 version metric num_gpu_blocks is None #15755
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add 🚀 ...
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 567ed8b to 03a6d83.
Force-pushed from 43214b6 to 0f51eb5.
@WoosukKwon can you take a look? Thanks!
Great catch! Yes indeed. We currently return [...]; it's tempting to include [...]. Also, looking at the way [...], it's easy to imagine we will want to add more to this over time.
@WoosukKwon What do you think about this suggestion?
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 0f51eb5 to a6110b5.
@markmc PTAL |
To be clear, I don't love including this unchanging information in SchedulerStats, but it is low overhead.
@njhill are there any implications from #15977 we should consider? This is basically information about the engine that is only available to the frontend once the engine has finished initializing.
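To make the trade-off concrete, here is a minimal sketch of what carrying a static capacity value in a per-step stats dataclass looks like. The field and counter names are assumptions for illustration, not the PR's actual diff:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SchedulerStats:
    # Per-step counters (illustrative names).
    num_running_reqs: int = 0
    num_waiting_reqs: int = 0
    # Static engine capacity, repeated on every stats message.
    # Low overhead, but unchanging -- hence the hesitation above.
    num_gpu_blocks: Optional[int] = None


stats = SchedulerStats(num_running_reqs=2, num_gpu_blocks=8192)
print(stats.num_gpu_blocks)  # → 8192
```

The value never changes after engine initialization, which is why passing it once in a startup handshake (discussed below in this thread) is an alternative to resending it every step.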
I haven't reviewed closely, just added a few comments of things that I noticed.
@njhill are there any implications from #15977 we should consider? This is basically information about the engine that is only available to the frontend once the engine has finished initializing.
@markmc it would probably be better to return this in the "ready" message that the engine sends to the front-end here once startup is complete, if possible.
Yeah, thanks. Just need to expand that, I guess. These are the two sides of that:
Lines 488 to 489 in f6b32ef
vllm/vllm/v1/engine/core_client.py, lines 433 to 434 in f6b32ef
Either extend this binary protocol with an int, or do it with JSON:
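A rough sketch of the JSON variant being suggested: the engine serializes its ready message with extra metadata, and the frontend parses it back. The message framing, field names, and function names here are assumptions for illustration, not vLLM's actual handshake code:

```python
import json


def encode_ready(num_gpu_blocks: int) -> bytes:
    """Engine side: serialize the ready message plus engine metadata."""
    msg = {"status": "READY", "num_gpu_blocks": num_gpu_blocks}
    return json.dumps(msg).encode()


def decode_ready(raw: bytes) -> dict:
    """Frontend side: json.loads accepts bytes directly, no .decode() needed."""
    return json.loads(raw)


info = decode_ready(encode_ready(8192))
print(info["num_gpu_blocks"])  # → 8192
```

Compared with extending the binary protocol with a single int, the JSON payload is trivially extensible: new engine metadata can be added later without breaking older readers that ignore unknown keys.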
Thanks @markmc, yes, I was thinking something like the JSON approach. I think json dumps/loads can be used without the string encode/decode, though. Also, when there's more than one engine, we may want to record the Prometheus metric with a label for the engine index.
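On the encode/decode point: since Python 3.6, `json.loads` accepts `bytes` directly, so only the sending side needs an explicit `.encode()` (and a bytes transport like ZMQ requires bytes anyway). A quick stdlib illustration, not vLLM code:

```python
import json

# Sending side: json.dumps returns str, so one encode is needed for a byte transport.
payload = json.dumps({"num_gpu_blocks": 8192}).encode()

# Receiving side: no .decode() needed, json.loads handles bytes directly.
print(json.loads(payload)["num_gpu_blocks"])  # → 8192
```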
Force-pushed from 644e2a2 to 66b350f.
Can you update this PR with main? Then I can stamp it |
OK, I will update it later.
This pull request has merge conflicts that must be resolved before it can be merged.
Force-pushed from 66b350f to 5425e91.
Looks like some tests are failing consistently, PTAL.
Head branch was pushed to by a user without write access
Force-pushed from 218a66e to d67b894.
Hi @DarkLight1337, the CI pipeline tests pass now.
Signed-off-by: rongfu.leng <[email protected]>
Force-pushed from d67b894 to 7effeab.
Please take another look, @DarkLight1337.
I think with these changes, in the data parallel case (when there are multiple engines), the metrics will still only reflect the ...
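The multi-engine concern above connects back to the earlier suggestion of labeling the Prometheus metric with the engine index. A toy, dependency-free sketch of the idea; real code would use prometheus_client's `Gauge` with a label, and all names here are assumptions:

```python
class LabeledGauge:
    """Toy labeled gauge: one value per engine index, keyed by label."""

    def __init__(self, name: str):
        self.name = name
        self.values: dict[str, int] = {}

    def set(self, engine_index: str, value: int) -> None:
        self.values[engine_index] = value


cache_blocks = LabeledGauge("vllm:num_gpu_blocks")
cache_blocks.set("0", 8192)  # engine 0 reports its capacity
cache_blocks.set("1", 8192)  # engine 1 reports separately
print(sorted(cache_blocks.values))  # → ['0', '1']
```

With a per-engine label, each engine's capacity is recorded independently instead of the last reporter silently overwriting a single unlabeled value.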
FIX #15719
Test result:
