feat: Base metrics: add generic ingress handler metrics #2090

keivenchang · 2025-07-24T05:10:57Z

Overview:

Adds generic ingress handler metrics with automatic profiling for request count, duration, bytes, and errors.

Details:

Implements IngressMetrics struct with automatic collection via Ingress::for_engine_with_metrics()
Adds request counters, duration histograms, byte tracking, and error counters with type labels
Enhances PushWorkHandler to collect metrics during request processing
Refactors system_metrics example to demonstrate simplified API
Updates router and kv_router components to use new metrics API

In Progress (not in this PR):

Add this for LLM Workers

Where should the reviewer start?

lib/runtime/src/pipeline/network.rs - Core IngressMetrics implementation
lib/runtime/src/pipeline/network/ingress/push_handler.rs - Metrics integration
lib/runtime/examples/system_metrics/README.md - Usage examples

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

DIS-299

Summary by CodeRabbit

New Features
- Introduced comprehensive Prometheus metrics and profiling for ingress handlers, enabling automatic tracking of request counts, durations, bytes, and error types.
- Added a new HTTP server information interface, allowing retrieval of server address and status.
- Expanded metrics registry to support new metric types, including IntCounterVec.
Improvements
- Simplified and refactored system metrics example to showcase generic ingress profiling with minimal code changes.
- Centralized configuration for default namespace, component, and endpoint identifiers.
Bug Fixes
- Prometheus scrape configuration updated to use the correct target port.
Documentation
- Rewrote and expanded the system metrics example documentation to provide a comprehensive guide for profiling and metrics usage.
Refactor
- Streamlined endpoint and ingress handler setup for improved modularity and maintainability.
- Consolidated metrics initialization and usage across components for consistency.

copy-pr-bot · 2025-07-24T05:11:01Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2025-07-24T05:17:44Z

Walkthrough

This set of changes introduces a generic Prometheus profiling system for ingress handlers, refactors endpoint and router wiring to enable automatic metrics integration, and adds new metrics types and accessors. The system_metrics example is overhauled to showcase these capabilities, and new HTTP server info accessors are added to the runtime. Documentation and configuration updates accompany the code changes.

Changes

File(s)	Change Summary
components/router/src/main.rs lib/llm/src/kv_router/publisher.rs	Refactored endpoint and router construction to enable automatic Prometheus metrics integration using `Ingress::for_engine_with_metrics`; reordered logic and updated handler creation to utilize endpoints before wrapping routers.
lib/runtime/examples/system_metrics/README.md	Rewrote and expanded documentation to describe the new generic ingress profiling system, automatic Prometheus metrics, usage of `Ingress::for_engine_with_metrics`, and benefits; removed old minimal example and added detailed instructions and examples.
lib/runtime/examples/system_metrics/src/lib.rs	Major refactor: Introduced custom system metrics with Prometheus integration, new handler and metrics structs, constants for default identifiers, and a reusable backend setup function. Implements a full example of metrics-enabled request handling.
lib/runtime/examples/system_metrics/src/bin/system_server.rs	Removed all custom metrics and handler logic; now delegates backend setup to a single function from the crate, drastically simplifying the main application logic.
lib/runtime/examples/system_metrics/src/bin/system_client.rs	Replaced string literals with constants for namespace, component, and endpoint identifiers to centralize configuration.
lib/runtime/examples/system_metrics/Cargo.toml	Added `[features]` and `[dev-dependencies]` sections for integration testing and development dependencies.
deploy/metrics/prometheus.yml	Changed Prometheus scrape target port for `llm-demo` job from 8000 to 8080.
lib/runtime/src/pipeline/network.rs	Added `IngressMetrics` struct and methods; extended `Ingress` with metrics field and new constructors/methods for metrics integration, enabling automatic Prometheus profiling for ingress handlers.
lib/runtime/src/pipeline/network/ingress/push_handler.rs	Instrumented the `handle_payload` method with detailed Prometheus metrics tracking for requests, errors, bytes, and durations, without altering core logic.
lib/runtime/src/metrics.rs	Added support for `IntCounterVec` metric type; extended registry trait and factory methods; updated documentation and parameter handling for vector metrics.
lib/runtime/src/distributed.rs	Added `http_server` field to `DistributedRuntime` to store HTTP server info; new accessor method for HTTP server details.
lib/runtime/src/http_server.rs	Introduced public `HttpServerInfo` struct with address and handle accessors, plus a `Clone` implementation.
lib/runtime/src/lib.rs	Imported and re-exported `HttpServerInfo`; added `http_server` field to `DistributedRuntime` struct.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Ingress
    participant Metrics
    participant Handler

    Client->>Ingress: Send request payload
    Ingress->>Metrics: Increment request counters, start timer
    Ingress->>Handler: Process request
    Handler-->>Ingress: Stream response(s)
    Ingress->>Metrics: Record bytes, errors, durations
    Ingress-->>Client: Send response(s)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~35 minutes

Possibly related PRs

ai-dynamo/dynamo#2008: Introduces hierarchical MetricsRegistry and related infrastructure, directly enabling the automatic Prometheus metrics integration applied in this PR.

Poem

In fields of code where metrics bloom,
A rabbit hops and sweeps the room.
With Prometheus, requests are tracked,
Bytes and errors neatly stacked.
Endpoints wired, handlers sing—
Now every stat has found its spring!
🐇📊

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

deploy/metrics/prometheus.yml (1)
37-41: Update the comment to reflect the new port number.

The comment on line 37 still references port 8000, but the actual target port has been changed to 8080. This inconsistency could confuse users setting up the demo service.
-  # Note that you may need to disable the firewall on your host. On Ubuntu: sudo ufw allow 8000/tcp
+  # Note that you may need to disable the firewall on your host. On Ubuntu: sudo ufw allow 8080/tcp
lib/runtime/src/pipeline/network.rs (1)
379-398: Consider more graceful error handling in with_metrics

The implementation is solid, but the unwrap() on line 381 could panic if metrics are somehow already set. While this is unlikely during normal initialization, consider using expect() with a descriptive message or returning a Result.
-    pub fn with_metrics(metrics: IngressMetrics) -> Arc<Self> {
-        let ingress = Self::new();
-        ingress.metrics.set(Arc::new(metrics)).unwrap();
-        ingress
-    }
+    pub fn with_metrics(metrics: IngressMetrics) -> Arc<Self> {
+        let ingress = Self::new();
+        ingress.metrics.set(Arc::new(metrics))
+            .expect("Metrics should not be already set when creating new Ingress");
+        ingress
+    }
lib/runtime/examples/system_metrics/src/lib.rs (2)
32-35: Consider documenting the purpose of MyStats struct

The MyStats struct appears to be used for the stats handler but its purpose and the meaning of the val field are unclear.
-#[derive(serde::Serialize, serde::Deserialize, Debug, Clone)]
-pub struct MyStats {
-    pub val: i32,
-}
+/// Stats structure returned by the endpoint's stats handler
+#[derive(serde::Serialize, serde::Deserialize, Debug, Clone)]
+pub struct MyStats {
+    /// Example value for demonstration purposes
+    pub val: i32,
+}
117-122: Consider making the stats handler more meaningful

The stats handler currently returns a fixed value which doesn't demonstrate real metrics. Consider returning actual system stats or metrics data to make the example more realistic.
-        .stats_handler(|_stats| {
-            println!("Stats handler called with stats: {:?}", _stats);
-            let stats = MyStats { val: 10 };
-            serde_json::to_value(stats).unwrap()
-        })
+        .stats_handler(|stats| {
+            println!("Stats handler called with stats: {:?}", stats);
+            // Return actual runtime stats or a more meaningful example
+            let my_stats = MyStats { 
+                val: stats.get("request_count").and_then(|v| v.as_i64()).unwrap_or(0) as i32
+            };
+            serde_json::to_value(my_stats).unwrap()
+        })

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 13d3cc1 and ff666f5.

⛔ Files ignored due to path filters (1)

lib/runtime/examples/Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (14)

components/router/src/main.rs (1 hunks)
deploy/metrics/prometheus.yml (1 hunks)
lib/llm/src/kv_router/publisher.rs (2 hunks)
lib/runtime/examples/system_metrics/Cargo.toml (2 hunks)
lib/runtime/examples/system_metrics/README.md (1 hunks)
lib/runtime/examples/system_metrics/src/bin/system_client.rs (2 hunks)
lib/runtime/examples/system_metrics/src/bin/system_server.rs (2 hunks)
lib/runtime/examples/system_metrics/src/lib.rs (1 hunks)
lib/runtime/src/distributed.rs (4 hunks)
lib/runtime/src/http_server.rs (1 hunks)
lib/runtime/src/lib.rs (3 hunks)
lib/runtime/src/metrics.rs (5 hunks)
lib/runtime/src/pipeline/network.rs (3 hunks)
lib/runtime/src/pipeline/network/ingress/push_handler.rs (7 hunks)

🧰 Additional context used

🧠 Learnings (10)

lib/runtime/src/lib.rs (4)

Learnt from: grahamking
PR: #1962
File: lib/runtime/src/component/client.rs:270-273
Timestamp: 2025-07-16T12:41:12.543Z
Learning: In lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_source is temporary while evaluating whether the mutex can be dropped entirely. The code currently has a race condition between try_lock and lock().await, but this is acknowledged as an interim state during the performance optimization process.

Learnt from: ryanolson
PR: #1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from Send + Sync + Debug to Send + Debug because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.

Learnt from: PeaBrane
PR: #1236
File: lib/llm/src/mocker/engine.rs:140-161
Timestamp: 2025-06-17T00:50:44.845Z
Learning: In Rust async code, when an Arc<Mutex<_>> is used solely to transfer ownership of a resource (like a channel receiver) into a spawned task rather than for sharing between multiple tasks, holding the mutex lock across an await is not problematic since there's no actual contention.

Learnt from: ryanolson
PR: #1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.

components/router/src/main.rs (2)

Learnt from: alec-flowers
PR: #1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions create_stored_blocks and create_stored_block_from_parts are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.

Learnt from: PeaBrane
PR: #1392
File: launch/dynamo-run/src/subprocess/vllm_v1_inc.py:71-71
Timestamp: 2025-06-05T01:04:24.775Z
Learning: The create_endpoint method in WorkerMetricsPublisher has backward compatibility maintained through pyo3 signature annotation #[pyo3(signature = (component, dp_rank = None))], making the dp_rank parameter optional with a default value of None.

lib/runtime/src/distributed.rs (5)

Learnt from: ryanolson
PR: #1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from Send + Sync + Debug to Send + Debug because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.

Learnt from: grahamking
PR: #1962
File: lib/runtime/src/component/client.rs:270-273
Timestamp: 2025-07-16T12:41:12.543Z
Learning: In lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_source is temporary while evaluating whether the mutex can be dropped entirely. The code currently has a race condition between try_lock and lock().await, but this is acknowledged as an interim state during the performance optimization process.

Learnt from: PeaBrane
PR: #1236
File: lib/llm/src/mocker/engine.rs:140-161
Timestamp: 2025-06-17T00:50:44.845Z
Learning: In Rust async code, when an Arc<Mutex<_>> is used solely to transfer ownership of a resource (like a channel receiver) into a spawned task rather than for sharing between multiple tasks, holding the mutex lock across an await is not problematic since there's no actual contention.

Learnt from: ryanolson
PR: #1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.

Learnt from: kthui
PR: #1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.

lib/runtime/examples/system_metrics/src/bin/system_server.rs (3)

Learnt from: ryanolson
PR: #1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from Send + Sync + Debug to Send + Debug because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.

Learnt from: grahamking
PR: #1962
File: lib/runtime/src/component/client.rs:270-273
Timestamp: 2025-07-16T12:41:12.543Z
Learning: In lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_source is temporary while evaluating whether the mutex can be dropped entirely. The code currently has a race condition between try_lock and lock().await, but this is acknowledged as an interim state during the performance optimization process.

Learnt from: ryanolson
PR: #1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.

lib/runtime/examples/system_metrics/src/bin/system_client.rs (3)

Learnt from: grahamking
PR: #1962
File: lib/runtime/src/component/client.rs:270-273
Timestamp: 2025-07-16T12:41:12.543Z
Learning: In lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_source is temporary while evaluating whether the mutex can be dropped entirely. The code currently has a race condition between try_lock and lock().await, but this is acknowledged as an interim state during the performance optimization process.

Learnt from: PeaBrane
PR: #1392
File: launch/dynamo-run/src/subprocess/vllm_v1_inc.py:71-71
Timestamp: 2025-06-05T01:04:24.775Z
Learning: The create_endpoint method in WorkerMetricsPublisher has backward compatibility maintained through pyo3 signature annotation #[pyo3(signature = (component, dp_rank = None))], making the dp_rank parameter optional with a default value of None.

Learnt from: PeaBrane
PR: #1285
File: lib/llm/src/kv_router/scoring.rs:58-63
Timestamp: 2025-05-30T06:38:09.630Z
Learning: In lib/llm/src/kv_router/scoring.rs, the user prefers to keep the panic behavior when calculating load_avg and variance with empty endpoints rather than adding guards for division by zero. They want the code to fail fast on this error condition.

lib/runtime/src/http_server.rs (1)

Learnt from: ryanolson
PR: #1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.

lib/runtime/examples/system_metrics/Cargo.toml (1)

Learnt from: kthui
PR: #1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.

lib/llm/src/kv_router/publisher.rs (7)

Learnt from: PeaBrane
PR: #1392
File: launch/dynamo-run/src/subprocess/vllm_v1_inc.py:71-71
Timestamp: 2025-06-05T01:04:24.775Z
Learning: The create_endpoint method in WorkerMetricsPublisher has backward compatibility maintained through pyo3 signature annotation #[pyo3(signature = (component, dp_rank = None))], making the dp_rank parameter optional with a default value of None.

Learnt from: alec-flowers
PR: #1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions create_stored_blocks and create_stored_block_from_parts are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.

Learnt from: PeaBrane
PR: #1285
File: lib/llm/src/kv_router/scoring.rs:58-63
Timestamp: 2025-05-30T06:38:09.630Z
Learning: In lib/llm/src/kv_router/scoring.rs, the user prefers to keep the panic behavior when calculating load_avg and variance with empty endpoints rather than adding guards for division by zero. They want the code to fail fast on this error condition.

Learnt from: oandreeva-nv
PR: #1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets None. The start_batching_publisher function in lib/llm/tests/block_manager.rs demonstrates this pattern: when the KVBMDynamoRuntimeComponent is dropped, its batch_tx sender is dropped, causing rx.recv() to return None, which triggers cleanup and task termination.

Learnt from: ryanolson
PR: #1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from Send + Sync + Debug to Send + Debug because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.

Learnt from: PeaBrane
PR: #1392
File: lib/llm/src/kv_router/scoring.rs:35-46
Timestamp: 2025-06-05T01:02:15.318Z
Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.

Learnt from: grahamking
PR: #1962
File: lib/runtime/src/component/client.rs:270-273
Timestamp: 2025-07-16T12:41:12.543Z
Learning: In lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_source is temporary while evaluating whether the mutex can be dropped entirely. The code currently has a race condition between try_lock and lock().await, but this is acknowledged as an interim state during the performance optimization process.

lib/runtime/examples/system_metrics/src/lib.rs (3)

Learnt from: ryanolson
PR: #1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from Send + Sync + Debug to Send + Debug because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.

Learnt from: grahamking
PR: #1962
File: lib/runtime/src/component/client.rs:270-273
Timestamp: 2025-07-16T12:41:12.543Z
Learning: In lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_source is temporary while evaluating whether the mutex can be dropped entirely. The code currently has a race condition between try_lock and lock().await, but this is acknowledged as an interim state during the performance optimization process.

Learnt from: kthui
PR: #1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.

lib/runtime/src/pipeline/network/ingress/push_handler.rs (3)

Learnt from: kthui
PR: #1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:32:05.022Z
Learning: In async-nats, the "no responders" error is represented as async_nats::error::RequestErrorKind::NoResponders. Use err.downcast_ref::<async_nats::error::RequestError>() and then check req_err.kind() against RequestErrorKind::NoResponders to handle this error properly.

Learnt from: t-ob
PR: #1290
File: launch/dynamo-run/src/subprocess/sglang_inc.py:80-110
Timestamp: 2025-06-03T10:17:51.711Z
Learning: The sglang async_encode method does not support streaming options, so collecting all embeddings before yielding is the correct approach for embedding requests.

Learnt from: oandreeva-nv
PR: #1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets None. The start_batching_publisher function in lib/llm/tests/block_manager.rs demonstrates this pattern: when the KVBMDynamoRuntimeComponent is dropped, its batch_tx sender is dropped, causing rx.recv() to return None, which triggers cleanup and task termination.

🧬 Code Graph Analysis (4)

lib/runtime/examples/system_metrics/src/bin/system_server.rs (2)

lib/bindings/python/src/dynamo/_core.pyi (1)

DistributedRuntime (30-53)

lib/runtime/examples/system_metrics/src/lib.rs (1)

backend (95-128)

lib/runtime/examples/system_metrics/src/bin/system_client.rs (2)

lib/runtime/src/component.rs (5)

component (346-348)

component (489-495)

namespace (209-211)

namespace (498-505)

client (423-429)

lib/runtime/src/distributed.rs (1)

namespace (173-175)

lib/runtime/src/http_server.rs (1)

lib/runtime/src/distributed.rs (1)

new (46-141)

lib/runtime/src/pipeline/network.rs (5)

lib/runtime/src/distributed.rs (1)

new (46-141)

lib/runtime/examples/system_metrics/src/lib.rs (2)

from_endpoint (44-56)

with_metrics (65-69)

lib/bindings/python/src/dynamo/_core.pyi (1)

Endpoint (212-236)

lib/runtime/src/pipeline/nodes/sinks/segment.rs (1)

attach (24-28)

lib/runtime/tests/common/mock.rs (1)

segment (305-309)

🔇 Additional comments (31)

lib/runtime/examples/system_metrics/src/bin/system_client.rs (2)

17-17: Good refactoring to use centralized constants.

The addition of DEFAULT_COMPONENT and DEFAULT_ENDPOINT to the imports improves maintainability by avoiding magic strings.

34-36: LGTM: Consistent use of configuration constants.

Replacing hardcoded strings with the imported constants (DEFAULT_COMPONENT and DEFAULT_ENDPOINT) improves maintainability and reduces the risk of typos.

lib/runtime/examples/system_metrics/Cargo.toml (2)

25-27: LGTM: Well-documented feature flags.

The feature separation allows for running tests without external dependencies, while the integration feature clearly indicates NATS requirement.

39-42: Appropriate dev dependencies for testing.

The selected dev dependencies (reqwest with JSON support, tokio-test, and rand) are well-suited for testing HTTP endpoints and async code.

lib/runtime/src/lib.rs (3)

23-23: LGTM: Adding OnceLock for lazy initialization.

The addition of OnceLock supports the new HTTP server info storage pattern.

40-40: Good API design: Re-exporting HttpServerInfo.

Making HttpServerInfo available at the crate root improves the public API ergonomics.

90-90: Consistent pattern: HTTP server field mirrors TCP server design.

The http_server field follows the same Arc<OnceLock<Arc<...>>> pattern as the existing tcp_server field, ensuring consistency in the codebase architecture.

components/router/src/main.rs (1)

69-82: Excellent integration of ingress metrics functionality.

The refactoring correctly integrates the new metrics capabilities by:

Creating the endpoint first

Using Ingress::for_engine_with_metrics instead of the basic for_engine method

Passing the endpoint reference to enable automatic metrics collection

This change aligns perfectly with the PR objectives of adding generic ingress handler metrics.

lib/runtime/src/distributed.rs (4)

27-27: LGTM: Standard library import

The import of std::sync::OnceLock is correctly added to support the new HTTP server information storage functionality.

95-95: LGTM: Proper initialization of thread-safe storage

The field initialization using Arc::new(OnceLock::new()) follows the same pattern as the existing tcp_server field and provides thread-safe, single-initialization semantics.

119-130: LGTM: Correct HTTP server information storage

The server information storage logic properly:

Captures both the socket address and task handle from spawn_http_server

Creates an HttpServerInfo instance with the captured data

Uses OnceLock::set() with appropriate error handling for single initialization

Maintains consistency with the existing server management patterns

218-221: LGTM: Clean accessor method implementation

The http_server_info() method provides a clean API to access the stored HTTP server information, returning Option<Arc<HttpServerInfo>> which properly handles the case where the HTTP server may not be initialized.

lib/runtime/src/pipeline/network/ingress/push_handler.rs (4)

26-32: LGTM: Proper request metrics initialization

The metrics collection is correctly initialized at the start of request processing, capturing:

Request start time for duration calculation

Request counter increment

Concurrent request tracking

Request byte size measurement

The conditional check ensures metrics are only collected when available.

52-56: Excellent error categorization for observability

The error metrics implementation provides detailed categorization:

deserialization: JSON parsing failures

invalid_message: Unexpected message format

response_stream: TCP connection issues

generate: Engine generation failures

This granular labeling will enable precise monitoring and debugging of different failure modes.

Also applies to: 66-70, 89-93, 105-108

138-140: LGTM: Response metrics and error tracking

The response handling correctly tracks:

Response byte sizes for both regular responses and completion markers

Publish failures with appropriate error labels (publish_response, publish_final)

This provides complete visibility into the data flow and potential transmission issues.

Also applies to: 145-149, 160-162, 168-170

174-178: LGTM: Proper request completion metrics

The request completion logic correctly:

Records the total request duration using the captured start time

Decrements the concurrent request counter

Maintains proper cleanup regardless of success/failure

This ensures accurate duration tracking and prevents metric leaks.

lib/runtime/src/http_server.rs (2)

22-22: LGTM: Required import for task handle

The import of tokio::task::JoinHandle is correctly added to support the optional task handle field in HttpServerInfo.

26-61: Well-designed HTTP server information struct

The HttpServerInfo implementation is clean and comprehensive:

Strengths:

Encapsulates both socket address and optional task handle

Constructor properly wraps JoinHandle in Arc for shared ownership

Provides convenient accessor methods for address components

Manual Clone implementation correctly handles the Arc<JoinHandle<()>> field

Design considerations:

The optional handle allows flexibility for different server management scenarios

The struct provides a clean abstraction for HTTP server runtime information

Consistent with the broader server management patterns in the codebase

lib/llm/src/kv_router/publisher.rs (3)

502-504: LGTM: Efficient endpoint reuse and metrics integration

The implementation correctly:

Obtains the endpoint once and reuses it for both handler creation and ingress setup

Integrates with the new metrics system using Ingress::for_engine_with_metrics

Passes the endpoint to both the handler constructor and the ingress factory

This is more efficient than the previous approach of calling component.endpoint() multiple times.

506-506: LGTM: Consistent endpoint usage

The endpoint builder correctly uses the pre-obtained endpoint, maintaining consistency with the metrics-enabled ingress handler.

520-521: LGTM: Future-proofed handler design

The addition of the endpoint field to KvLoadEndpoingHander is well-designed:

The #[allow(dead_code)] annotation acknowledges the field isn't used yet

The constructor properly accepts and stores the endpoint

This prepares the handler for potential future endpoint-based functionality

Also applies to: 525-530

lib/runtime/examples/system_metrics/src/bin/system_server.rs (2)

16-17: LGTM: Simplified imports reflect refactored architecture

The imports are now minimal and focused:

Standard runtime components are imported directly

The backend function is imported from the system_metrics crate library

Removes all the custom metrics and handler imports that were moved to the library

27-27: Excellent demonstration of the simplified API

The single call to backend(distributed, None) beautifully demonstrates the new generic profiling system:

No manual metrics setup required

No custom handler implementation needed

The complexity is properly abstracted away in the library

Shows how easy it is to add comprehensive metrics to an ingress handler

This exemplifies the PR's goal of providing "automatic profiling capabilities" for ingress handlers.

lib/runtime/examples/system_metrics/README.md (1)

1-147: Excellent documentation for the new ingress metrics system!

The README provides comprehensive coverage of the new IngressMetrics feature with clear examples, benefits, and usage instructions. The structure is logical and the examples effectively demonstrate both automatic profiling and custom metrics integration.

lib/runtime/src/pipeline/network.rs (2)

48-128: Well-structured metrics implementation with comprehensive coverage!

The IngressMetrics struct provides a clean abstraction for ingress profiling with all essential metrics (requests, duration, concurrency, bytes, errors). The from_endpoint factory method elegantly leverages the endpoint's built-in labeling system.

425-437: Clean and intuitive API for metrics-enabled ingress creation!

The for_engine_with_metrics method provides an excellent developer experience by combining engine setup with automatic metrics configuration in a single call. The helper metrics() accessor is also well-designed for optional metrics access.

lib/runtime/src/metrics.rs (3)

123-136: Consistent implementation of IntCounterVec support!

The PrometheusMetric trait implementation for IntCounterVec correctly follows the established pattern used by other vector metric types.

285-299: Proper integration of IntCounterVec in the metric creation flow!

The handling of IntCounterVec in create_metric correctly validates parameters and follows the established pattern for vector metric types.

402-417: Breaking change: create_countervec signature updated

The create_countervec method now requires two separate parameters—

const_labels: &[&str]

const_label_values: &[(&str, &str)]

This improves consistency with other vector metrics but is a breaking API change for any external callers. Our search found only the definition and an internal example in lib/runtime/src/metrics.rs, so there are no remaining internal call sites to update:

Definition: lib/runtime/src/metrics.rs (lines 402–417)

Internal example call: lib/runtime/src/metrics.rs (example in docs/tests)

Please review and update any external consumers of this crate to match the new signature.

lib/runtime/examples/system_metrics/src/lib.rs (2)

37-57: Well-designed custom metrics struct!

The MySystemStatsMetrics implementation effectively demonstrates how to create custom metrics using the endpoint's built-in metric registry. The use of Arc<IntCounter> ensures thread-safe access.

72-91: Excellent example of integrating custom metrics into a handler!

The RequestHandler implementation clearly demonstrates how to incorporate custom metrics tracking alongside the automatic ingress metrics. The optional metrics pattern allows the handler to work with or without metrics.

lib/runtime/examples/system_metrics/README.md

tedzhouhk · 2025-07-24T18:51:58Z

great work! hope the examples can have it soon!

deploy/metrics/prometheus.yml

lib/llm/src/kv_router/publisher.rs

lib/runtime/examples/system_metrics/src/lib.rs

lib/runtime/src/pipeline/network.rs

lib/runtime/examples/system_metrics/README.md

lib/runtime/src/pipeline/network/ingress/push_handler.rs

rmccorm4

_

rmccorm4

LGTM at a high level - would benefit from a more detailed review on the code changes. CC @paulhendricks @ryanolson @kthui @oandreeva-nv

lib/runtime/src/http_server.rs

lib/runtime/src/pipeline/network/ingress/push_handler.rs

ryanolson

overall - great addition! this will be really useful. thank you!

let's continue to discuss the default naming conventions and how users can select different variants.

it would be a good place to really start solidifying our configuration details using figment to parse, merge and join files/envs/cli-args, etc.

…sh Prometheus names

addressed

whoisj

I think some test code survived and should be removed before this is merged.

lib/runtime/src/http_server.rs

biswapanda · 2025-07-28T19:31:15Z

Lgtm

keivenchang requested review from a team, biswapanda, hhzhang16, hutm, ishandhanani, julienmancuso, mohammedabdulwahhab and nnshah1 as code owners July 24, 2025 05:10

pull-request-size bot added the size/XL label Jul 24, 2025

github-actions bot added the feat label Jul 24, 2025

coderabbitai bot reviewed Jul 24, 2025

View reviewed changes

keivenchang force-pushed the keivenkchang/Observability_DIS-299__base-metrics-add-generic-requesthandler-metrics branch from ff666f5 to d9d9420 Compare July 24, 2025 15:40

tedzhouhk reviewed Jul 24, 2025

View reviewed changes

lib/runtime/examples/system_metrics/README.md Outdated Show resolved Hide resolved

whoisj previously requested changes Jul 24, 2025

View reviewed changes

deploy/metrics/prometheus.yml Show resolved Hide resolved

lib/llm/src/kv_router/publisher.rs Outdated Show resolved Hide resolved

lib/runtime/examples/system_metrics/src/lib.rs Outdated Show resolved Hide resolved

rmccorm4 reviewed Jul 24, 2025

View reviewed changes

lib/runtime/src/pipeline/network.rs Outdated Show resolved Hide resolved

pull-request-size bot added size/XXL and removed size/XL labels Jul 24, 2025

keivenchang requested review from indrajit96 and krishung5 as code owners July 25, 2025 04:44

keivenchang requested a review from whoisj July 25, 2025 17:47

keivenchang mentioned this pull request Jul 25, 2025

fix: move docker-compose.yml to deploy/, and update frontend port #2121

Merged

rmccorm4 reviewed Jul 25, 2025

View reviewed changes

lib/runtime/examples/system_metrics/README.md Show resolved Hide resolved

rmccorm4 reviewed Jul 25, 2025

View reviewed changes

lib/runtime/src/pipeline/network/ingress/push_handler.rs Show resolved Hide resolved

rmccorm4 reviewed Jul 25, 2025

View reviewed changes

rmccorm4 approved these changes Jul 25, 2025

View reviewed changes

keivenchang added 2 commits July 25, 2025 18:37

feat: Base metrics: add generic ingress handler metrics

f49488c

resolve CI workspace dependency version error

b64962f

keivenchang added 9 commits July 25, 2025 18:37

Update dependencies, and run cargo fmt

e620ab1

Fix with cargo check --manifest-path ...

e64a977

Add add_metrics in PushWorkHandler

90ee2cf

Refactor, remove ingress_* prefix.

f8eafe0

Update README.md

b14b3a4

More clean up of code and nitpicks

d137db7

Update ports, use 8081 for the backend, updated README.md

1dbd65b

Update Mermaid graph to include Dynamo backend

c550e55

Update README.md for more clarity

3b3b874

keivenchang force-pushed the keivenkchang/Observability_DIS-299__base-metrics-add-generic-requesthandler-metrics branch from 23909fb to 3b3b874 Compare July 26, 2025 01:44

ryanolson reviewed Jul 26, 2025

View reviewed changes

lib/runtime/src/http_server.rs Show resolved Hide resolved

ryanolson reviewed Jul 26, 2025

View reviewed changes

lib/runtime/src/pipeline/network/ingress/push_handler.rs Show resolved Hide resolved

ryanolson approved these changes Jul 26, 2025

View reviewed changes

keivenchang added 3 commits July 25, 2025 22:08

Add a namespace_ prefix (dynamo_) to metric names to better distingui…

ac7559b

…sh Prometheus names

Add a lint_prometheus_name because some tests use invalid namespaces.

17b9366

Fix the hierarchy labeling for namespace

66b69de

nnshah1 approved these changes Jul 28, 2025

View reviewed changes

whoisj requested changes Jul 28, 2025

View reviewed changes

lib/runtime/src/http_server.rs Show resolved Hide resolved

keivenchang requested a review from whoisj July 28, 2025 17:20

whoisj approved these changes Jul 28, 2025

View reviewed changes

keivenchang merged commit 615580d into main Jul 28, 2025
10 checks passed

keivenchang deleted the keivenkchang/Observability_DIS-299__base-metrics-add-generic-requesthandler-metrics branch July 28, 2025 17:35

This was referenced Aug 15, 2025

feat: router-level request rejection #2465

Merged

feat: Add model label for vllm backend metrics #2474

Merged

fix: guard inflight_requests and request_duration from early returns. #2576

Merged

feat: enable dynamo metrics on KVBM #2626

Merged

This was referenced Sep 18, 2025

feat: Sglang canary health check #3103

Merged

feat: local client #3052

Open

feat: Base metrics: add generic ingress handler metrics #2090

feat: Base metrics: add generic ingress handler metrics #2090

Uh oh!

Conversation

keivenchang commented Jul 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

Uh oh!

copy-pr-bot bot commented Jul 24, 2025

Uh oh!

coderabbitai bot commented Jul 24, 2025

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tedzhouhk commented Jul 24, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

rmccorm4 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rmccorm4 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

ryanolson left a comment

Choose a reason for hiding this comment

Uh oh!

whoisj left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

biswapanda commented Jul 28, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

keivenchang commented Jul 24, 2025 •

edited

Loading

rmccorm4 left a comment •

edited

Loading

rmccorm4 left a comment •

edited

Loading