Skip to content

Conversation

@keivenchang
Copy link
Contributor

@keivenchang keivenchang commented Jul 24, 2025

Overview:

Adds generic ingress handler metrics with automatic profiling for request count, duration, bytes, and errors.

Details:

  • Implements IngressMetrics struct with automatic collection via Ingress::for_engine_with_metrics()
  • Adds request counters, duration histograms, byte tracking, and error counters with type labels
  • Enhances PushWorkHandler to collect metrics during request processing
  • Refactors system_metrics example to demonstrate simplified API
  • Updates router and kv_router components to use new metrics API

In Progress (not in this PR):

  • Add this for LLM Workers

Where should the reviewer start?

  • lib/runtime/src/pipeline/network.rs - Core IngressMetrics implementation
  • lib/runtime/src/pipeline/network/ingress/push_handler.rs - Metrics integration
  • lib/runtime/examples/system_metrics/README.md - Usage examples

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

DIS-299

Summary by CodeRabbit

  • New Features

    • Introduced comprehensive Prometheus metrics and profiling for ingress handlers, enabling automatic tracking of request counts, durations, bytes, and error types.
    • Added a new HTTP server information interface, allowing retrieval of server address and status.
    • Expanded metrics registry to support new metric types, including IntCounterVec.
  • Improvements

    • Simplified and refactored system metrics example to showcase generic ingress profiling with minimal code changes.
    • Centralized configuration for default namespace, component, and endpoint identifiers.
  • Bug Fixes

    • Prometheus scrape configuration updated to use the correct target port.
  • Documentation

    • Rewrote and expanded the system metrics example documentation to provide a comprehensive guide for profiling and metrics usage.
  • Refactor

    • Streamlined endpoint and ingress handler setup for improved modularity and maintainability.
    • Consolidated metrics initialization and usage across components for consistency.

@copy-pr-bot
Copy link

copy-pr-bot bot commented Jul 24, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jul 24, 2025

Walkthrough

This set of changes introduces a generic Prometheus profiling system for ingress handlers, refactors endpoint and router wiring to enable automatic metrics integration, and adds new metrics types and accessors. The system_metrics example is overhauled to showcase these capabilities, and new HTTP server info accessors are added to the runtime. Documentation and configuration updates accompany the code changes.

Changes

File(s) Change Summary
components/router/src/main.rs
lib/llm/src/kv_router/publisher.rs
Refactored endpoint and router construction to enable automatic Prometheus metrics integration using Ingress::for_engine_with_metrics; reordered logic and updated handler creation to utilize endpoints before wrapping routers.
lib/runtime/examples/system_metrics/README.md Rewrote and expanded documentation to describe the new generic ingress profiling system, automatic Prometheus metrics, usage of Ingress::for_engine_with_metrics, and benefits; removed old minimal example and added detailed instructions and examples.
lib/runtime/examples/system_metrics/src/lib.rs Major refactor: Introduced custom system metrics with Prometheus integration, new handler and metrics structs, constants for default identifiers, and a reusable backend setup function. Implements a full example of metrics-enabled request handling.
lib/runtime/examples/system_metrics/src/bin/system_server.rs Removed all custom metrics and handler logic; now delegates backend setup to a single function from the crate, drastically simplifying the main application logic.
lib/runtime/examples/system_metrics/src/bin/system_client.rs Replaced string literals with constants for namespace, component, and endpoint identifiers to centralize configuration.
lib/runtime/examples/system_metrics/Cargo.toml Added [features] and [dev-dependencies] sections for integration testing and development dependencies.
deploy/metrics/prometheus.yml Changed Prometheus scrape target port for llm-demo job from 8000 to 8080.
lib/runtime/src/pipeline/network.rs Added IngressMetrics struct and methods; extended Ingress with metrics field and new constructors/methods for metrics integration, enabling automatic Prometheus profiling for ingress handlers.
lib/runtime/src/pipeline/network/ingress/push_handler.rs Instrumented the handle_payload method with detailed Prometheus metrics tracking for requests, errors, bytes, and durations, without altering core logic.
lib/runtime/src/metrics.rs Added support for IntCounterVec metric type; extended registry trait and factory methods; updated documentation and parameter handling for vector metrics.
lib/runtime/src/distributed.rs Added http_server field to DistributedRuntime to store HTTP server info; new accessor method for HTTP server details.
lib/runtime/src/http_server.rs Introduced public HttpServerInfo struct with address and handle accessors, plus a Clone implementation.
lib/runtime/src/lib.rs Imported and re-exported HttpServerInfo; added http_server field to DistributedRuntime struct.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Ingress
    participant Metrics
    participant Handler

    Client->>Ingress: Send request payload
    Ingress->>Metrics: Increment request counters, start timer
    Ingress->>Handler: Process request
    Handler-->>Ingress: Stream response(s)
    Ingress->>Metrics: Record bytes, errors, durations
    Ingress-->>Client: Send response(s)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~35 minutes

Possibly related PRs

  • ai-dynamo/dynamo#2008: Introduces hierarchical MetricsRegistry and related infrastructure, directly enabling the automatic Prometheus metrics integration applied in this PR.

Poem

In fields of code where metrics bloom,
A rabbit hops and sweeps the room.
With Prometheus, requests are tracked,
Bytes and errors neatly stacked.
Endpoints wired, handlers sing—
Now every stat has found its spring!
🐇📊

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
deploy/metrics/prometheus.yml (1)

37-41: Update the comment to reflect the new port number.

The comment on line 37 still references port 8000, but the actual target port has been changed to 8080. This inconsistency could confuse users setting up the demo service.

-  # Note that you may need to disable the firewall on your host. On Ubuntu: sudo ufw allow 8000/tcp
+  # Note that you may need to disable the firewall on your host. On Ubuntu: sudo ufw allow 8080/tcp
lib/runtime/src/pipeline/network.rs (1)

379-398: Consider more graceful error handling in with_metrics

The implementation is solid, but the unwrap() on line 381 could panic if metrics are somehow already set. While this is unlikely during normal initialization, consider using expect() with a descriptive message or returning a Result.

-    pub fn with_metrics(metrics: IngressMetrics) -> Arc<Self> {
-        let ingress = Self::new();
-        ingress.metrics.set(Arc::new(metrics)).unwrap();
-        ingress
-    }
+    pub fn with_metrics(metrics: IngressMetrics) -> Arc<Self> {
+        let ingress = Self::new();
+        ingress.metrics.set(Arc::new(metrics))
+            .expect("Metrics should not be already set when creating new Ingress");
+        ingress
+    }
lib/runtime/examples/system_metrics/src/lib.rs (2)

32-35: Consider documenting the purpose of MyStats struct

The MyStats struct appears to be used for the stats handler but its purpose and the meaning of the val field are unclear.

-#[derive(serde::Serialize, serde::Deserialize, Debug, Clone)]
-pub struct MyStats {
-    pub val: i32,
-}
+/// Stats structure returned by the endpoint's stats handler
+#[derive(serde::Serialize, serde::Deserialize, Debug, Clone)]
+pub struct MyStats {
+    /// Example value for demonstration purposes
+    pub val: i32,
+}

117-122: Consider making the stats handler more meaningful

The stats handler currently returns a fixed value which doesn't demonstrate real metrics. Consider returning actual system stats or metrics data to make the example more realistic.

-        .stats_handler(|_stats| {
-            println!("Stats handler called with stats: {:?}", _stats);
-            let stats = MyStats { val: 10 };
-            serde_json::to_value(stats).unwrap()
-        })
+        .stats_handler(|stats| {
+            println!("Stats handler called with stats: {:?}", stats);
+            // Return actual runtime stats or a more meaningful example
+            let my_stats = MyStats { 
+                val: stats.get("request_count").and_then(|v| v.as_i64()).unwrap_or(0) as i32
+            };
+            serde_json::to_value(my_stats).unwrap()
+        })
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 13d3cc1 and ff666f5.

⛔ Files ignored due to path filters (1)
  • lib/runtime/examples/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • components/router/src/main.rs (1 hunks)
  • deploy/metrics/prometheus.yml (1 hunks)
  • lib/llm/src/kv_router/publisher.rs (2 hunks)
  • lib/runtime/examples/system_metrics/Cargo.toml (2 hunks)
  • lib/runtime/examples/system_metrics/README.md (1 hunks)
  • lib/runtime/examples/system_metrics/src/bin/system_client.rs (2 hunks)
  • lib/runtime/examples/system_metrics/src/bin/system_server.rs (2 hunks)
  • lib/runtime/examples/system_metrics/src/lib.rs (1 hunks)
  • lib/runtime/src/distributed.rs (4 hunks)
  • lib/runtime/src/http_server.rs (1 hunks)
  • lib/runtime/src/lib.rs (3 hunks)
  • lib/runtime/src/metrics.rs (5 hunks)
  • lib/runtime/src/pipeline/network.rs (3 hunks)
  • lib/runtime/src/pipeline/network/ingress/push_handler.rs (7 hunks)
🧰 Additional context used
🧠 Learnings (10)
lib/runtime/src/lib.rs (4)

Learnt from: grahamking
PR: #1962
File: lib/runtime/src/component/client.rs:270-273
Timestamp: 2025-07-16T12:41:12.543Z
Learning: In lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_source is temporary while evaluating whether the mutex can be dropped entirely. The code currently has a race condition between try_lock and lock().await, but this is acknowledged as an interim state during the performance optimization process.

Learnt from: ryanolson
PR: #1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from Send + Sync + Debug to Send + Debug because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.

Learnt from: PeaBrane
PR: #1236
File: lib/llm/src/mocker/engine.rs:140-161
Timestamp: 2025-06-17T00:50:44.845Z
Learning: In Rust async code, when an Arc<Mutex<_>> is used solely to transfer ownership of a resource (like a channel receiver) into a spawned task rather than for sharing between multiple tasks, holding the mutex lock across an await is not problematic since there's no actual contention.

Learnt from: ryanolson
PR: #1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.

components/router/src/main.rs (2)

Learnt from: alec-flowers
PR: #1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions create_stored_blocks and create_stored_block_from_parts are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.

Learnt from: PeaBrane
PR: #1392
File: launch/dynamo-run/src/subprocess/vllm_v1_inc.py:71-71
Timestamp: 2025-06-05T01:04:24.775Z
Learning: The create_endpoint method in WorkerMetricsPublisher has backward compatibility maintained through pyo3 signature annotation #[pyo3(signature = (component, dp_rank = None))], making the dp_rank parameter optional with a default value of None.

lib/runtime/src/distributed.rs (5)

Learnt from: ryanolson
PR: #1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from Send + Sync + Debug to Send + Debug because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.

Learnt from: grahamking
PR: #1962
File: lib/runtime/src/component/client.rs:270-273
Timestamp: 2025-07-16T12:41:12.543Z
Learning: In lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_source is temporary while evaluating whether the mutex can be dropped entirely. The code currently has a race condition between try_lock and lock().await, but this is acknowledged as an interim state during the performance optimization process.

Learnt from: PeaBrane
PR: #1236
File: lib/llm/src/mocker/engine.rs:140-161
Timestamp: 2025-06-17T00:50:44.845Z
Learning: In Rust async code, when an Arc<Mutex<_>> is used solely to transfer ownership of a resource (like a channel receiver) into a spawned task rather than for sharing between multiple tasks, holding the mutex lock across an await is not problematic since there's no actual contention.

Learnt from: ryanolson
PR: #1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.

Learnt from: kthui
PR: #1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.

lib/runtime/examples/system_metrics/src/bin/system_server.rs (3)

Learnt from: ryanolson
PR: #1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from Send + Sync + Debug to Send + Debug because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.

Learnt from: grahamking
PR: #1962
File: lib/runtime/src/component/client.rs:270-273
Timestamp: 2025-07-16T12:41:12.543Z
Learning: In lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_source is temporary while evaluating whether the mutex can be dropped entirely. The code currently has a race condition between try_lock and lock().await, but this is acknowledged as an interim state during the performance optimization process.

Learnt from: ryanolson
PR: #1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.

lib/runtime/examples/system_metrics/src/bin/system_client.rs (3)

Learnt from: grahamking
PR: #1962
File: lib/runtime/src/component/client.rs:270-273
Timestamp: 2025-07-16T12:41:12.543Z
Learning: In lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_source is temporary while evaluating whether the mutex can be dropped entirely. The code currently has a race condition between try_lock and lock().await, but this is acknowledged as an interim state during the performance optimization process.

Learnt from: PeaBrane
PR: #1392
File: launch/dynamo-run/src/subprocess/vllm_v1_inc.py:71-71
Timestamp: 2025-06-05T01:04:24.775Z
Learning: The create_endpoint method in WorkerMetricsPublisher has backward compatibility maintained through pyo3 signature annotation #[pyo3(signature = (component, dp_rank = None))], making the dp_rank parameter optional with a default value of None.

Learnt from: PeaBrane
PR: #1285
File: lib/llm/src/kv_router/scoring.rs:58-63
Timestamp: 2025-05-30T06:38:09.630Z
Learning: In lib/llm/src/kv_router/scoring.rs, the user prefers to keep the panic behavior when calculating load_avg and variance with empty endpoints rather than adding guards for division by zero. They want the code to fail fast on this error condition.

lib/runtime/src/http_server.rs (1)

Learnt from: ryanolson
PR: #1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.

lib/runtime/examples/system_metrics/Cargo.toml (1)

Learnt from: kthui
PR: #1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.

lib/llm/src/kv_router/publisher.rs (7)

Learnt from: PeaBrane
PR: #1392
File: launch/dynamo-run/src/subprocess/vllm_v1_inc.py:71-71
Timestamp: 2025-06-05T01:04:24.775Z
Learning: The create_endpoint method in WorkerMetricsPublisher has backward compatibility maintained through pyo3 signature annotation #[pyo3(signature = (component, dp_rank = None))], making the dp_rank parameter optional with a default value of None.

Learnt from: alec-flowers
PR: #1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions create_stored_blocks and create_stored_block_from_parts are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.

Learnt from: PeaBrane
PR: #1285
File: lib/llm/src/kv_router/scoring.rs:58-63
Timestamp: 2025-05-30T06:38:09.630Z
Learning: In lib/llm/src/kv_router/scoring.rs, the user prefers to keep the panic behavior when calculating load_avg and variance with empty endpoints rather than adding guards for division by zero. They want the code to fail fast on this error condition.

Learnt from: oandreeva-nv
PR: #1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets None. The start_batching_publisher function in lib/llm/tests/block_manager.rs demonstrates this pattern: when the KVBMDynamoRuntimeComponent is dropped, its batch_tx sender is dropped, causing rx.recv() to return None, which triggers cleanup and task termination.

Learnt from: ryanolson
PR: #1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from Send + Sync + Debug to Send + Debug because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.

Learnt from: PeaBrane
PR: #1392
File: lib/llm/src/kv_router/scoring.rs:35-46
Timestamp: 2025-06-05T01:02:15.318Z
Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.

Learnt from: grahamking
PR: #1962
File: lib/runtime/src/component/client.rs:270-273
Timestamp: 2025-07-16T12:41:12.543Z
Learning: In lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_source is temporary while evaluating whether the mutex can be dropped entirely. The code currently has a race condition between try_lock and lock().await, but this is acknowledged as an interim state during the performance optimization process.

lib/runtime/examples/system_metrics/src/lib.rs (3)

Learnt from: ryanolson
PR: #1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from Send + Sync + Debug to Send + Debug because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.

Learnt from: grahamking
PR: #1962
File: lib/runtime/src/component/client.rs:270-273
Timestamp: 2025-07-16T12:41:12.543Z
Learning: In lib/runtime/src/component/client.rs, the current mutex usage in get_or_create_dynamic_instance_source is temporary while evaluating whether the mutex can be dropped entirely. The code currently has a race condition between try_lock and lock().await, but this is acknowledged as an interim state during the performance optimization process.

Learnt from: kthui
PR: #1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.

lib/runtime/src/pipeline/network/ingress/push_handler.rs (3)

Learnt from: kthui
PR: #1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:32:05.022Z
Learning: In async-nats, the "no responders" error is represented as async_nats::error::RequestErrorKind::NoResponders. Use err.downcast_ref::<async_nats::error::RequestError>() and then check req_err.kind() against RequestErrorKind::NoResponders to handle this error properly.

Learnt from: t-ob
PR: #1290
File: launch/dynamo-run/src/subprocess/sglang_inc.py:80-110
Timestamp: 2025-06-03T10:17:51.711Z
Learning: The sglang async_encode method does not support streaming options, so collecting all embeddings before yielding is the correct approach for embedding requests.

Learnt from: oandreeva-nv
PR: #1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets None. The start_batching_publisher function in lib/llm/tests/block_manager.rs demonstrates this pattern: when the KVBMDynamoRuntimeComponent is dropped, its batch_tx sender is dropped, causing rx.recv() to return None, which triggers cleanup and task termination.

🧬 Code Graph Analysis (4)
lib/runtime/examples/system_metrics/src/bin/system_server.rs (2)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • DistributedRuntime (30-53)
lib/runtime/examples/system_metrics/src/lib.rs (1)
  • backend (95-128)
lib/runtime/examples/system_metrics/src/bin/system_client.rs (2)
lib/runtime/src/component.rs (5)
  • component (346-348)
  • component (489-495)
  • namespace (209-211)
  • namespace (498-505)
  • client (423-429)
lib/runtime/src/distributed.rs (1)
  • namespace (173-175)
lib/runtime/src/http_server.rs (1)
lib/runtime/src/distributed.rs (1)
  • new (46-141)
lib/runtime/src/pipeline/network.rs (5)
lib/runtime/src/distributed.rs (1)
  • new (46-141)
lib/runtime/examples/system_metrics/src/lib.rs (2)
  • from_endpoint (44-56)
  • with_metrics (65-69)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • Endpoint (212-236)
lib/runtime/src/pipeline/nodes/sinks/segment.rs (1)
  • attach (24-28)
lib/runtime/tests/common/mock.rs (1)
  • segment (305-309)
🔇 Additional comments (31)
lib/runtime/examples/system_metrics/src/bin/system_client.rs (2)

17-17: Good refactoring to use centralized constants.

The addition of DEFAULT_COMPONENT and DEFAULT_ENDPOINT to the imports improves maintainability by avoiding magic strings.


34-36: LGTM: Consistent use of configuration constants.

Replacing hardcoded strings with the imported constants (DEFAULT_COMPONENT and DEFAULT_ENDPOINT) improves maintainability and reduces the risk of typos.

lib/runtime/examples/system_metrics/Cargo.toml (2)

25-27: LGTM: Well-documented feature flags.

The feature separation allows for running tests without external dependencies, while the integration feature clearly indicates NATS requirement.


39-42: Appropriate dev dependencies for testing.

The selected dev dependencies (reqwest with JSON support, tokio-test, and rand) are well-suited for testing HTTP endpoints and async code.

lib/runtime/src/lib.rs (3)

23-23: LGTM: Adding OnceLock for lazy initialization.

The addition of OnceLock supports the new HTTP server info storage pattern.


40-40: Good API design: Re-exporting HttpServerInfo.

Making HttpServerInfo available at the crate root improves the public API ergonomics.


90-90: Consistent pattern: HTTP server field mirrors TCP server design.

The http_server field follows the same Arc<OnceLock<Arc<...>>> pattern as the existing tcp_server field, ensuring consistency in the codebase architecture.

components/router/src/main.rs (1)

69-82: Excellent integration of ingress metrics functionality.

The refactoring correctly integrates the new metrics capabilities by:

  1. Creating the endpoint first
  2. Using Ingress::for_engine_with_metrics instead of the basic for_engine method
  3. Passing the endpoint reference to enable automatic metrics collection

This change aligns perfectly with the PR objectives of adding generic ingress handler metrics.

lib/runtime/src/distributed.rs (4)

27-27: LGTM: Standard library import

The import of std::sync::OnceLock is correctly added to support the new HTTP server information storage functionality.


95-95: LGTM: Proper initialization of thread-safe storage

The field initialization using Arc::new(OnceLock::new()) follows the same pattern as the existing tcp_server field and provides thread-safe, single-initialization semantics.


119-130: LGTM: Correct HTTP server information storage

The server information storage logic properly:

  • Captures both the socket address and task handle from spawn_http_server
  • Creates an HttpServerInfo instance with the captured data
  • Uses OnceLock::set() with appropriate error handling for single initialization
  • Maintains consistency with the existing server management patterns

218-221: LGTM: Clean accessor method implementation

The http_server_info() method provides a clean API to access the stored HTTP server information, returning Option<Arc<HttpServerInfo>> which properly handles the case where the HTTP server may not be initialized.

lib/runtime/src/pipeline/network/ingress/push_handler.rs (4)

26-32: LGTM: Proper request metrics initialization

The metrics collection is correctly initialized at the start of request processing, capturing:

  • Request start time for duration calculation
  • Request counter increment
  • Concurrent request tracking
  • Request byte size measurement

The conditional check ensures metrics are only collected when available.


52-56: Excellent error categorization for observability

The error metrics implementation provides detailed categorization:

  • deserialization: JSON parsing failures
  • invalid_message: Unexpected message format
  • response_stream: TCP connection issues
  • generate: Engine generation failures

This granular labeling will enable precise monitoring and debugging of different failure modes.

Also applies to: 66-70, 89-93, 105-108


138-140: LGTM: Response metrics and error tracking

The response handling correctly tracks:

  • Response byte sizes for both regular responses and completion markers
  • Publish failures with appropriate error labels (publish_response, publish_final)

This provides complete visibility into the data flow and potential transmission issues.

Also applies to: 145-149, 160-162, 168-170


174-178: LGTM: Proper request completion metrics

The request completion logic correctly:

  • Records the total request duration using the captured start time
  • Decrements the concurrent request counter
  • Maintains proper cleanup regardless of success/failure

This ensures accurate duration tracking and prevents metric leaks.

lib/runtime/src/http_server.rs (2)

22-22: LGTM: Required import for task handle

The import of tokio::task::JoinHandle is correctly added to support the optional task handle field in HttpServerInfo.


26-61: Well-designed HTTP server information struct

The HttpServerInfo implementation is clean and comprehensive:

Strengths:

  • Encapsulates both socket address and optional task handle
  • Constructor properly wraps JoinHandle in Arc for shared ownership
  • Provides convenient accessor methods for address components
  • Manual Clone implementation correctly handles the Arc<JoinHandle<()>> field

Design considerations:

  • The optional handle allows flexibility for different server management scenarios
  • The struct provides a clean abstraction for HTTP server runtime information
  • Consistent with the broader server management patterns in the codebase
lib/llm/src/kv_router/publisher.rs (3)

502-504: LGTM: Efficient endpoint reuse and metrics integration

The implementation correctly:

  • Obtains the endpoint once and reuses it for both handler creation and ingress setup
  • Integrates with the new metrics system using Ingress::for_engine_with_metrics
  • Passes the endpoint to both the handler constructor and the ingress factory

This is more efficient than the previous approach of calling component.endpoint() multiple times.


506-506: LGTM: Consistent endpoint usage

The endpoint builder correctly uses the pre-obtained endpoint, maintaining consistency with the metrics-enabled ingress handler.


520-521: LGTM: Future-proofed handler design

The addition of the endpoint field to KvLoadEndpoingHander is well-designed:

  • The #[allow(dead_code)] annotation acknowledges the field isn't used yet
  • The constructor properly accepts and stores the endpoint
  • This prepares the handler for potential future endpoint-based functionality

Also applies to: 525-530

lib/runtime/examples/system_metrics/src/bin/system_server.rs (2)

16-17: LGTM: Simplified imports reflect refactored architecture

The imports are now minimal and focused:

  • Standard runtime components are imported directly
  • The backend function is imported from the system_metrics crate library
  • Removes all the custom metrics and handler imports that were moved to the library

27-27: Excellent demonstration of the simplified API

The single call to backend(distributed, None) beautifully demonstrates the new generic profiling system:

  • No manual metrics setup required
  • No custom handler implementation needed
  • The complexity is properly abstracted away in the library
  • Shows how easy it is to add comprehensive metrics to an ingress handler

This exemplifies the PR's goal of providing "automatic profiling capabilities" for ingress handlers.

lib/runtime/examples/system_metrics/README.md (1)

1-147: Excellent documentation for the new ingress metrics system!

The README provides comprehensive coverage of the new IngressMetrics feature with clear examples, benefits, and usage instructions. The structure is logical and the examples effectively demonstrate both automatic profiling and custom metrics integration.

lib/runtime/src/pipeline/network.rs (2)

48-128: Well-structured metrics implementation with comprehensive coverage!

The IngressMetrics struct provides a clean abstraction for ingress profiling with all essential metrics (requests, duration, concurrency, bytes, errors). The from_endpoint factory method elegantly leverages the endpoint's built-in labeling system.


425-437: Clean and intuitive API for metrics-enabled ingress creation!

The for_engine_with_metrics method provides an excellent developer experience by combining engine setup with automatic metrics configuration in a single call. The helper metrics() accessor is also well-designed for optional metrics access.

lib/runtime/src/metrics.rs (3)

123-136: Consistent implementation of IntCounterVec support!

The PrometheusMetric trait implementation for IntCounterVec correctly follows the established pattern used by other vector metric types.


285-299: Proper integration of IntCounterVec in the metric creation flow!

The handling of IntCounterVec in create_metric correctly validates parameters and follows the established pattern for vector metric types.


402-417: Breaking change: create_countervec signature updated

The create_countervec method now requires two separate parameters—

  • const_labels: &[&str]
  • const_label_values: &[(&str, &str)]

This improves consistency with other vector metrics but is a breaking API change for any external callers. Our search found only the definition and an internal example in lib/runtime/src/metrics.rs, so there are no remaining internal call sites to update:

  • Definition: lib/runtime/src/metrics.rs (lines 402–417)
  • Internal example call: lib/runtime/src/metrics.rs (example in docs/tests)

Please review and update any external consumers of this crate to match the new signature.

lib/runtime/examples/system_metrics/src/lib.rs (2)

37-57: Well-designed custom metrics struct!

The MySystemStatsMetrics implementation effectively demonstrates how to create custom metrics using the endpoint's built-in metric registry. The use of Arc<IntCounter> ensures thread-safe access.


72-91: Excellent example of integrating custom metrics into a handler!

The RequestHandler implementation clearly demonstrates how to incorporate custom metrics tracking alongside the automatic ingress metrics. The optional metrics pattern allows the handler to work with or without metrics.

@keivenchang keivenchang force-pushed the keivenkchang/Observability_DIS-299__base-metrics-add-generic-requesthandler-metrics branch from ff666f5 to d9d9420 Compare July 24, 2025 15:40
@tedzhouhk
Copy link
Contributor

great work! hope the examples can have it soon!

whoisj
whoisj previously requested changes Jul 24, 2025
Copy link
Contributor

@rmccorm4 rmccorm4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_

Copy link
Contributor

@rmccorm4 rmccorm4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM at a high level - would benefit from a more detailed review on the code changes. CC @paulhendricks @ryanolson @kthui @oandreeva-nv

@keivenchang keivenchang force-pushed the keivenkchang/Observability_DIS-299__base-metrics-add-generic-requesthandler-metrics branch from 23909fb to 3b3b874 Compare July 26, 2025 01:44
Copy link
Contributor

@ryanolson ryanolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall - great addition! this will be really useful. thank you!

let's continue to discuss the default naming conventions and how users can select different variants.

it would be a good place to really start solidifying our configuration details using figment to parse, merge and join files/envs/cli-args, etc.

Copy link
Collaborator

@whoisj whoisj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think some test code survived and should be removed before this is merged.

@keivenchang keivenchang requested a review from whoisj July 28, 2025 17:20
@keivenchang keivenchang merged commit 615580d into main Jul 28, 2025
10 checks passed
@keivenchang keivenchang deleted the keivenkchang/Observability_DIS-299__base-metrics-add-generic-requesthandler-metrics branch July 28, 2025 17:35
@biswapanda
Copy link
Contributor

Lgtm

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants