Conversation


@keivenchang commented Sep 6, 2025

Overview:

This PR implements HTTP queue metrics for frontend request tracking, making it possible to monitor request-processing pipeline performance and to detect engine pool exhaustion.

Details:

  • Add a new HTTP queue gauge metric to track requests waiting in the HTTP processing queue
  • Implement an HttpQueueGuard RAII object that automatically tracks queue time from HTTP handler entry until processing begins (see the sketch after this list)
  • Integrate HTTP queue tracking into the OpenAI chat completions handler
  • Add the HTTP_QUEUE Prometheus metric name constant
  • Track time from request arrival until engine processing begins to identify bottlenecks
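
As a rough illustration of the RAII pattern above (a minimal sketch assuming a prometheus IntGaugeVec labeled by model; names are simplified relative to the real implementation in lib/llm/src/http/service/metrics.rs):

    use std::sync::Arc;
    use prometheus::IntGaugeVec;

    /// Sketch: increments the queue gauge on creation and decrements it on
    /// drop, so every exit path (success, error, early return) is counted.
    pub struct HttpQueueGuard {
        gauge: Arc<IntGaugeVec>,
        model: String,
    }

    impl HttpQueueGuard {
        fn new(gauge: Arc<IntGaugeVec>, model: String) -> Self {
            gauge.with_label_values(&[model.as_str()]).inc(); // request enters the HTTP queue
            Self { gauge, model }
        }
    }

    impl Drop for HttpQueueGuard {
        fn drop(&mut self) {
            // Request leaves the queue: processing begins or the handler bails out.
            self.gauge.with_label_values(&[self.model.as_str()]).dec();
        }
    }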

Where should the reviewer start?

  • lib/llm/src/http/service/metrics.rs - New HTTP queue gauge and HttpQueueGuard implementation
  • lib/llm/src/http/service/openai.rs - Integration of HTTP queue tracking in chat completions
  • lib/runtime/src/metrics/prometheus_names.rs - New metric name constant

Related Issues:

Relates to DIS-564, DIS-563

Summary by CodeRabbit

  • New Features

    • Added an HTTP queue metric to monitor the number of requests waiting before processing, exposed via Prometheus with per-model labeling.
    • Improved timing accuracy for request lifecycle metrics.
    • Instrumentation updates in chat completions ensure queue time is measured until processing begins, including streaming and non-streaming paths.
  • Documentation

    • Clarified the definition of the inflight requests metric to indicate it covers requests sent to the engine (e.g., vLLM, SGLang).

@keivenchang requested a review from a team as a code owner September 6, 2025 03:08
github-actions bot added the feat label Sep 6, 2025

coderabbitai bot commented Sep 6, 2025

Walkthrough

Adds a new HTTP queue gauge metric and RAII guard to track queued time before processing, wires it into Metrics, and instruments chat_completions to create and drop the guard at appropriate points. Introduces a new Prometheus metric name and minor doc tweaks. Derives PartialEq for Status and slightly adjusts inflight duration measurement.

Changes

Cohort / File(s) Summary
HTTP metrics: queue gauge & guard
lib/llm/src/http/service/metrics.rs
Adds http_queue_gauge to Metrics; introduces public HttpQueueGuard with RAII increment/decrement; exposes create_http_queue_guard(…) on Metrics; derives PartialEq for Status; adjusts InflightGuard drop to use a computed duration variable.
OpenAI handler instrumentation
lib/llm/src/http/service/openai.rs
Creates HttpQueueGuard at chat_completions start; drops it immediately before processing begins in both streaming and non‑streaming paths; removes a trace log.
Prometheus metric name
lib/runtime/src/metrics/prometheus_names.rs
Adds frontend_service::HTTP_QUEUE constant and description; clarifies INFLIGHT_REQUESTS docs.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant C as Client
    participant H as HTTP Handler (chat_completions)
    participant M as Metrics (HttpQueueGauge)
    participant E as Engine/Processing

    C->>H: POST /chat/completions
    H->>M: create_http_queue_guard(model)
    Note right of M: Gauge++ (queued request)

    H-->>H: Waits / prepares request

    alt Streaming
        H->>M: drop(HttpQueueGuard)
        Note right of M: Gauge-- (processing begins)
        H->>E: Start streaming generation
        E-->>H: Stream tokens
        H-->>C: Stream response
    else Non-streaming
        H->>M: drop(HttpQueueGuard)
        Note right of M: Gauge-- (processing begins)
        H->>E: Generate completion
        E-->>H: Completion
        H-->>C: JSON response
    end
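
In code, the flow above corresponds roughly to the following handler skeleton (a sketch; only create_http_queue_guard comes from the actual diff, the surrounding shape is illustrative):

    async fn chat_completions(metrics: Arc<Metrics>, model: &str) {
        // Gauge++ : the request counts as queued from handler entry.
        let queue_guard = metrics.clone().create_http_queue_guard(model);

        // ... parse and validate the request, wait for engine capacity ...

        // Gauge-- : queue wait ends just before generation begins,
        // on both the streaming and non-streaming paths.
        drop(queue_guard);

        // ... call the engine and stream or return the response ...
    }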

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I queued with a twitch of a whisker thin,
Counting requests with a gentle grin.
Gauge goes up, then hops back down—
When work begins, I’m out of town.
Metrics nibble, carrots tall,
Streaming or not, I watch them all. 🥕🐇



coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
lib/runtime/src/metrics/prometheus_names.rs (1)

48-53: Clarify HTTP_QUEUE semantics (and consider name)

Current doc suggests “waiting in HTTP queue before receiving a response,” but the guard is dropped right before stream processing begins, after engine.generate() returns. Clarify that it tracks “from handler entry until engine stream is ready.” Optionally consider http_queue_depth for Prometheus naming clarity; otherwise keep as-is but tighten the help text elsewhere.

Apply doc tweak:

-    /// Number of requests waiting in HTTP queue before receiving a response.
-    /// This can measure the engine pool exhaustion if the engine is not able to process requests fast enough.
+    /// Number of requests waiting in the frontend HTTP queue (from handler entry until the engine stream is ready).
+    /// Useful to detect engine pool exhaustion if the engine cannot accept work promptly.
     pub const HTTP_QUEUE: &str = "http_queue";
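
For reference, a scrape of the resulting gauge would look roughly like this (hypothetical output; the exported name may carry a service prefix, and the label value is illustrative):

    # HELP http_queue Number of requests in HTTP processing queue
    # TYPE http_queue gauge
    http_queue{model="my-model"} 3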
lib/llm/src/http/service/metrics.rs (3)

29-34: RAII guard: document invariants and consider Debug derive

Add a brief note that the guard must not be cloned/moved across processes and that model label is lowercased/sanitized upstream. Deriving Debug helps during tracing without impacting behavior.

-pub struct HttpQueueGuard {
+#[derive(Debug)]
+pub struct HttpQueueGuard {

145-153: Gauge help string: align with actual lifecycle

Help says “in HTTP processing queue,” but drop occurs when stream is about to be processed. Suggest tightening wording to avoid confusion with inflight metric.

-                "Number of requests in HTTP processing queue",
+                "Requests waiting from HTTP handler entry until engine stream is ready",

335-339: Public factory is fine; add tiny guard-rail for empty model

If upstream ever passes an empty model, we’ll emit series with model="". Add a debug assert to catch this early.

 pub fn create_http_queue_guard(self: Arc<Self>, model: &str) -> HttpQueueGuard {
+    debug_assert!(!model.is_empty(), "model label should not be empty");
     HttpQueueGuard::new(self, model.to_string().to_lowercase())
 }
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 1477f6e and 152131a.

📒 Files selected for processing (3)
  • lib/llm/src/http/service/metrics.rs (8 hunks)
  • lib/llm/src/http/service/openai.rs (3 hunks)
  • lib/runtime/src/metrics/prometheus_names.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
lib/llm/src/http/service/openai.rs (1)
lib/llm/src/http/service/metrics.rs (3)
  • drop (351-354)
  • drop (387-408)
  • drop (512-518)
lib/llm/src/http/service/metrics.rs (2)
lib/llm/src/http/service/service_v2.rs (1)
  • new (74-86)
lib/runtime/src/pipeline/context.rs (1)
  • registry (257-259)
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: Build and Test - vllm
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (.)
🔇 Additional comments (9)
lib/llm/src/http/service/metrics.rs (7)

21-21: New gauge field looks good

Field added and used consistently.


75-75: Deriving PartialEq on Status is fine

Helpful for tests and comparisons.


285-292: Inc/dec helpers are correct

Label cardinality (per model) matches existing metrics.


293-301: Registered with Registry

Good—no orphaned metric vectors.


341-348: Constructor order is good

Increment first; state lives in guard. No issues.


350-355: Drop semantics correct

Ensures decrement on all early-returns.


388-408: Micro-optimization to store duration

Good change; avoids double Clock reads.
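
The pattern being praised, sketched for clarity (field and method names here are illustrative, not the actual API):

    impl Drop for InflightGuard {
        fn drop(&mut self) {
            // Read the clock once and reuse the value, rather than calling
            // elapsed() separately for each observation.
            let duration = self.timer.elapsed();
            self.metrics
                .request_duration
                .with_label_values(&[self.model.as_str()])
                .observe(duration.as_secs_f64());
        }
    }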

lib/llm/src/http/service/openai.rs (2)

550-550: Drop point for streaming path is appropriate

Ends “queue wait” just before arming SSE. Works with the revised creation point.


566-566: Drop point for non-streaming path is appropriate

Symmetric with streaming branch.

@nnshah1 left a comment

LGTM - a couple of thoughts:

  1. Is there another spot for the response functions? Maybe response_handler.rs; having them in metrics.rs seems a little unintuitive.

  2. Could we make the names shorter? (Not sure what would work here.) Maybe a single method with flags for the behavior? Not sure that makes sense.

- Move http_queue_guard creation to beginning of request handlers
- Move inflight_guard creation after engine.generate() call
- Rename processing functions for clarity (process_streaming_with_*)

Signed-off-by: Keiven Chang <[email protected]>
- Move InflightGuard creation before engine.generate() calls for proper error counting
- Move EventConverter and related functions from openai.rs to metrics.rs
- Rename functions:
  - process_streaming_with_metrics -> process_response_and_observe_metrics
  - process_response_using_streaming_event_converter -> process_response_using_event_converter_and_observe_metrics
- Consolidate get_parsing_options function in metrics.rs
- Update grpc/service/openai.rs to use shared metrics functions
- Add documentation for HTTP queue vs inflight metrics

Signed-off-by: Keiven Chang <[email protected]>
Ensures consistent model labels across all metrics for the same request.

Signed-off-by: Keiven Chang <[email protected]>
- Move get_parsing_options from grpc service to http metrics module
- Rename HTTP_QUEUE to HTTP_QUEUED_REQUESTS for clarity
- Update comment to specify 'first response' timing

Signed-off-by: Keiven Chang <[email protected]>
- Resolved merge conflict in lib/llm/src/http/service/metrics.rs by combining client_disconnect_gauge and http_queue_gauge fields
- Updated metric names to use consistent naming convention (added _total suffix)
- Moved detailed metrics documentation to deploy/metrics/README.md
- Added comprehensive documentation for HTTP queue vs inflight metrics distinction
- Updated prometheus_names.rs with centralized metric name constants

Signed-off-by: Keiven Chang <[email protected]>
- Move get_parsing_options function from lib/llm/src/http/service/metrics.rs to lib/llm/src/discovery/model_manager.rs as a method
- Update all call sites in openai.rs and kserve.rs to use the new ModelManager method
- Remove the old function from metrics.rs
- Improves code organization by placing parsing options functionality with ModelManager where it belongs
- All tests pass, functionality remains unchanged

Signed-off-by: Keiven Chang <[email protected]>
- Add comprehensive naming convention guidelines with prefix/name/suffix structure
- Include common transformations from vague suffixes to specific ones
- Provide clear examples of correct vs incorrect naming patterns
- Document avoidance of vague suffixes like _counter, _gauge, _time, _size, _rate
- Add visual indicators (❌/✅) for easy reference
- Ensure consistency with Prometheus best practices

Signed-off-by: Keiven Chang <[email protected]>
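
As an illustration of the convention described in this commit (hypothetical constants, not taken from the file):

    // ❌ vague suffixes: what does the number actually measure?
    pub const REQUEST_COUNTER: &str = "request_counter";
    pub const QUEUE_SIZE: &str = "queue_size";

    // ✅ specific units and Prometheus-style suffixes
    pub const REQUESTS_TOTAL: &str = "requests_total";
    pub const QUEUED_REQUESTS: &str = "queued_requests";
    pub const REQUEST_DURATION_SECONDS: &str = "request_duration_seconds";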
@keivenchang merged commit 9fa5450 into main Sep 16, 2025
15 of 17 checks passed
@keivenchang deleted the keivenchang/DIS-564__DIS-563__NIM-add-FE-metrics-part1 branch September 16, 2025 17:13
kmkelle-nv pushed a commit that referenced this pull request Sep 17, 2025
