[Bugfix] Fix incorrect kv cache metrics in grafana.json #27133

fangpings · 2025-10-17T23:28:42Z

Purpose

The current grafana.json uses an outdated Prometheus metric name, gpu_cache_usage_perc. The latest metric is named kv_cache_usage_perc. See here

Test Plan

Verified in my own Grafana dashboard.
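
A dashboard check like the one above can be complemented by confirming which metric names the server actually exposes. The following is a minimal sketch, not part of this PR: it parses Prometheus exposition text and collects the series names. The sample text, the help string, and the function name `exposed_metric_names` are assumptions made for illustration; real text would come from the server's metrics endpoint.

```python
import re

# Hypothetical sample of Prometheus exposition text (assumption for this
# sketch); a real server exposes many more series than this.
SAMPLE_METRICS = """\
# HELP vllm:kv_cache_usage_perc Hypothetical help text for illustration.
# TYPE vllm:kv_cache_usage_perc gauge
vllm:kv_cache_usage_perc{model_name="demo-model"} 0.25
"""

# A metric name starts the line and may contain letters, digits, "_" and ":".
_NAME_AT_LINE_START = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def exposed_metric_names(exposition_text: str) -> set[str]:
    """Return the set of metric names found on sample (non-comment) lines."""
    names = set()
    for line in exposition_text.splitlines():
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _NAME_AT_LINE_START.match(line)
        if match:
            names.add(match.group(1))
    return names


names = exposed_metric_names(SAMPLE_METRICS)
print("kv metric exposed:", "vllm:kv_cache_usage_perc" in names)
print("old metric exposed:", "vllm:gpu_cache_usage_perc" in names)
```

If the old name is absent from the exposed set, any panel still querying it will show no data, which matches the symptom this PR addresses.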

Test Result

Before

After

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

github-actions · 2025-10-17T23:28:50Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

mergify · 2025-10-17T23:29:25Z

Documentation preview: https://vllm--27133.org.readthedocs.build/en/27133/

gemini-code-assist

Code Review

This pull request updates the grafana.json file to use the correct Prometheus metric name for GPU cache usage. The old metric name gpu_cache_usage_perc is replaced with the new metric name kv_cache_usage_perc. This change ensures that the Grafana dashboard accurately reflects the current metrics being exposed by vLLM.

gemini-code-assist · 2025-10-17T23:29:53Z

examples/online_serving/prometheus_grafana/grafana.json

            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
-          "expr": "vllm:gpu_cache_usage_perc{model_name=\"$model_name\"}",


The metric name vllm:gpu_cache_usage_perc is outdated and should be updated to vllm:kv_cache_usage_perc to align with the latest metrics. This discrepancy could lead to incorrect monitoring and alerting.

It's critical to ensure that monitoring dashboards use the correct metric names to provide accurate insights into system performance and resource utilization.

"expr": "vllm:kv_cache_usage_perc{model_name=\"$model_name\"}"
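
The suggested change can also be applied programmatically. Below is a minimal sketch, not the method used in this PR, that walks a dashboard structure and rewrites the metric name only inside string-valued "expr" fields. The function name `rename_metric_in_exprs` and the sample panel are hypothetical; only the two metric names and the "expr" key come from the diff above.

```python
OLD_METRIC = "vllm:gpu_cache_usage_perc"
NEW_METRIC = "vllm:kv_cache_usage_perc"


def rename_metric_in_exprs(node):
    """Return a copy of node with OLD_METRIC replaced in every string "expr" value."""
    if isinstance(node, dict):
        renamed = {}
        for key, value in node.items():
            if key == "expr" and isinstance(value, str):
                # Only query expressions are rewritten; other strings are untouched.
                renamed[key] = value.replace(OLD_METRIC, NEW_METRIC)
            else:
                renamed[key] = rename_metric_in_exprs(value)
        return renamed
    if isinstance(node, list):
        return [rename_metric_in_exprs(item) for item in node]
    # Scalars (str, int, bool, None) are returned unchanged.
    return node


# Hypothetical panel fragment shaped like the target shown in the diff.
panel = {
    "targets": [
        {
            "editorMode": "code",
            "expr": 'vllm:gpu_cache_usage_perc{model_name="$model_name"}',
        }
    ]
}

updated = rename_metric_in_exprs(panel)
print(updated["targets"][0]["expr"])
```

Restricting the replacement to "expr" keys avoids touching panel titles or descriptions that may mention the old name for other reasons; a plain text search and replace is simpler when that distinction does not matter.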

markmc · 2025-10-20T09:04:59Z

Thank you!

Just need to fix the DCO issue:

always include Signed-off-by: Author Name [email protected] in every commit message

Since you'll be making that update, would you mind also searching and replacing in examples/online_serving/dashboards/perses/ too? No need to test

Signed-off-by: Fangping Shi <[email protected]>

fangpings · 2025-10-20T16:56:10Z

Thank you!

Just need to fix the DCO issue:

always include Signed-off-by: Author Name [email protected] in every commit message

Since you'll be making that update, would you mind also searching and replacing in examples/online_serving/dashboards/perses/ too? No need to test

Thanks. I replaced the occurrences under that folder too.
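
A folder-wide search and replace like the one described above can be sketched as follows. This is an illustration under stated assumptions, not the commands actually run for this PR: the function name `replace_metric_in_tree` and the set of file suffixes are hypothetical, and the caller supplies the directory to scan.

```python
from pathlib import Path

OLD_METRIC = "gpu_cache_usage_perc"
NEW_METRIC = "kv_cache_usage_perc"


def replace_metric_in_tree(root: Path, suffixes=(".json", ".yaml", ".yml")) -> list[Path]:
    """Rewrite OLD_METRIC to NEW_METRIC in matching files under root.

    Returns the list of files that were modified. The suffix filter is an
    assumption about which file types hold dashboard definitions.
    """
    changed = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8")
        if OLD_METRIC in text:
            path.write_text(text.replace(OLD_METRIC, NEW_METRIC), encoding="utf-8")
            changed.append(path)
    return changed
```

Calling `replace_metric_in_tree(Path("examples/online_serving/dashboards/perses"))` from a repository checkout would cover the folder mentioned in the review; reviewing the resulting diff before committing is still advisable.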

mergify bot added the documentation Improvements or additions to documentation label Oct 17, 2025

gemini-code-assist bot reviewed Oct 17, 2025

fix: kv cache panel in grafana.json

f3c5ca8

Signed-off-by: Fangping Shi <[email protected]>

fangpings force-pushed the fix_grafana branch from d98eae4 to f3c5ca8 Compare October 20, 2025 16:44

replace deprecated metrics under pserses

8906e86

Signed-off-by: Fangping Shi <[email protected]>

markmc approved these changes Oct 21, 2025

markmc mentioned this pull request Oct 22, 2025

[Bugfix] grafana.json: rename vllm:gpu_cache_usage_perc to vllm:kv_cache_usage_perc #27341

Closed

markmc requested a review from DarkLight1337 October 22, 2025 18:59

DarkLight1337 approved these changes Oct 23, 2025

vllm-bot merged commit 7e09410 into vllm-project:main Oct 23, 2025
5 checks passed

usberkeley pushed a commit to usberkeley/vllm that referenced this pull request Oct 23, 2025

[Bugfix] Fix incorrect kv cache metrics in grafana.json (vllm-project…

7b301ad

…#27133) Signed-off-by: Fangping Shi <[email protected]> Co-authored-by: Fangping Shi <[email protected]>

kingsmad pushed a commit to kingsmad/vllm that referenced this pull request Oct 25, 2025

[Bugfix] Fix incorrect kv cache metrics in grafana.json (vllm-project…

5f404c6

…#27133) Signed-off-by: Fangping Shi <[email protected]> Co-authored-by: Fangping Shi <[email protected]>

ilmarkov pushed a commit to neuralmagic/vllm that referenced this pull request Nov 7, 2025

[Bugfix] Fix incorrect kv cache metrics in grafana.json (vllm-project…

8100af7

…#27133) Signed-off-by: Fangping Shi <[email protected]> Co-authored-by: Fangping Shi <[email protected]>

rtourgeman pushed a commit to rtourgeman/vllm that referenced this pull request Nov 10, 2025

[Bugfix] Fix incorrect kv cache metrics in grafana.json (vllm-project…

8895720

…#27133) Signed-off-by: Fangping Shi <[email protected]> Co-authored-by: Fangping Shi <[email protected]>
